The Second Wave


thammond – 2007 September 11

In Metadata

You might have been wondering why I’ve been banging on about XMP here. Why the emphasis on one vendor technology on a blog focussed on an industry linking solution? Well, this post is an attempt to answer that.

Four years ago we at Nature Publishing Group, along with a select few early adopters, started up our RSS news feeds. We chose RSS 1.0 as our platform, which allowed us to embed a rich metadata term set using multiple schemas, especially Dublin Core and PRISM. We evangelized this a good deal at the time and published documents on (Jul. ’03) and in D-Lib Magazine (Dec. ’04), as well as speaking about this at various meetings and blogging about it. Since that time many more publishers have come on board and now provide RSS routinely, many of them choosing to enrich their feeds with metadata.

Well, RSS can be seen in hindsight as being the First Wave of projecting a web presence beyond the content platform using standard markup formats. With this embedded metadata a publisher can expand their web footprint and allow users to link back to their content server.
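To make the idea of embedded feed metadata a little more concrete, here is a minimal sketch in Python of how a third-party application might read Dublin Core and PRISM terms out of an RSS 1.0 item. The sample feed, its URLs, the DOI and the journal name are all made up for illustration, and the PRISM namespace URI shown is assumed to be the PRISM 1.2 basic namespace; a real feed may declare a different one.

```python
import xml.etree.ElementTree as ET

# Namespace URIs; the PRISM one is an assumption (1.2 basic vocabulary).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",
}

# Hypothetical feed fragment: every value below is invented for illustration.
SAMPLE_FEED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/">
  <item rdf:about="http://example.org/articles/1">
    <title>An Example Article</title>
    <link>http://example.org/articles/1</link>
    <dc:creator>A. Author</dc:creator>
    <dc:identifier>doi:10.1000/example.1</dc:identifier>
    <prism:publicationName>Example Journal</prism:publicationName>
    <prism:volume>1</prism:volume>
    <prism:startingPage>100</prism:startingPage>
  </item>
</rdf:RDF>
"""


def item_metadata(feed_xml):
    """Return one dict of selected DC and PRISM terms per RSS 1.0 item."""
    root = ET.fromstring(feed_xml)
    records = []
    for item in root.findall("rss:item", NS):
        records.append({
            "link": item.findtext("rss:link", default="", namespaces=NS),
            "creator": item.findtext("dc:creator", default="", namespaces=NS),
            "identifier": item.findtext("dc:identifier", default="", namespaces=NS),
            "publication": item.findtext("prism:publicationName", default="", namespaces=NS),
            "volume": item.findtext("prism:volume", default="", namespaces=NS),
            "starting_page": item.findtext("prism:startingPage", default="", namespaces=NS),
        })
    return records


records = item_metadata(SAMPLE_FEED)
```

With the identifier and link in hand, an application has what it needs to link back to the publisher's content server.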

Now, XMP with its potential for embedding metadata in rich media can be seen as a Second Wave. Media assets distributed over the network can now carry along their own metadata and identity which can be leveraged by third-party applications to provide interesting new functionalities and link-back capability. Again a projection of web presence.


XMP - Some Other Gripes


thammond – 2007 September 10


Following on from the missing XMP Specification version number discussed in the previous post, listed below are some miscellaneous gripes I’ve got with XMP (which is otherwise a very promising technology). I would be more than happy to be proved wrong on any of these points.

connecting things: bioGUID, iSpiders and DOI


Ed Pentz – 2007 September 07

In Interoperability

David Shorthouse and Rod Page have developed some great tools for linking references by tying together a number of services and using the Crossref OpenURL interface amongst other things. See David’s post - Gimme That Scientific Paper Part III and Rod’s post on OpenURL and using ParaTools - “OpenURL and Spiders”.

Unfortunately our planned changes to the Crossref OpenURL interface (the 100 queries per day limit in particular) caused some concern for David (“Crossref Takes a Step Back”) - but make sure you read the comments to see my response!

We decided to drop the 100 per day query limit for the OpenURL interface and there will be no charges for non-commercial use of the interface -

Stop Press


thammond – 2007 August 28

In Metadata

Boy, was I ever so wrong! Contrary to what I said in yesterday’s post, the new PRISM 2.0 spec does support XMP value type mappings for its terms. See the table below which lists the PRISM basic vocabulary terms and the XMP value types.

Many thanks to Dianne Kennedy and the rest of the PRISM Working Group for having added this support to PRISM 2.0.


thammond – 2007 August 23


Following on from yesterday’s post I just came across this very useful source of information on PDF/A: the PDF/A Conformance Center. This provides links to resources such as this whitepaper PDF/A - A new Standard for Long-Term Archiving, and a number of technical notes, especially Metadata and PDF/A-1 (also available as a PDF). (This latter corrects some errors in the ISO standard which are to be redressed in a forthcoming Technical Corrigendum later this year.)

Weird Scenes Inside the Gold Mine


thammond – 2007 August 22

In Metadata

So, following up on my recent posts here on Metadata in PDFs (Strategies, Use Cases, Deployment), I finally came across PDF/A and PDF/X, two ISO standardized subsets of PDF: the former (ISO 19005-1:2005) for archiving and the latter (ISO 15929:2002, ISO 15930-1:2001, etc.) for prepress digital data exchange.

Both formats share some common ground such as minimizing surprises between producer and consumer and keeping things open and predictable. But my interest here is specifically in metadata and to see what guidance these standards might provide us. Not surprisingly, metadata is a key issue for PDF/A, less so for PDF/X. I’ll discuss PDF/X briefly but the bulk of this post is focussed on PDF/A. See below.

New SRU (1.2) Website


thammond – 2007 August 08

In Search

From Ray Denenberg’s post to the SRU Listserv yesterday: “The new SRU web site is now up: It is completely reorganized and reflects the version 1.2 specifications. (It also includes version 1.1 specifications, but is oriented to version 1.2.) … There is an official 1.1 archive under the new site, And note also, that the new spec incorporates both version 1.1 and 1.2 (anything specific to version 1.1 is annotated as such).
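Since the two versions now live side by side in one spec, it may help to see where the version actually surfaces in a request. Below is a minimal sketch in Python that builds an SRU searchRetrieve URL. The parameter names (operation, version, query, maximumRecords) are standard SRU request parameters, but the base URL, the CQL query and the helper function name are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, version="1.2", maximum_records=10):
    """Build a searchRetrieve request URL for an SRU server (illustrative helper)."""
    params = {
        "operation": "searchRetrieve",   # the SRU operation being invoked
        "version": version,              # "1.1" or "1.2"
        "query": cql_query,              # a CQL query string
        "maximumRecords": str(maximum_records),
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and query, for illustration only.
url = build_sru_url("http://sru.example.org/catalog", 'dc.title = "metadata"')
```

Switching a client between the two versions is then a matter of changing the version argument, subject to whatever the target server supports.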

Handle Plugin: Some Notes


thammond – 2007 August 02

In Linking

The first thing to note is that this demo (the Acrobat plugin) is an application. And that comes with its own baggage, i.e. this is a Windows-only plugin and is targeted at Acrobat Reader 8. More broadly, the application merely bridges an identifier embedded in the media file and the handle record filed against that identifier, and delivers some relevant functionality. The data (or metadata) declared in the PDF and in the associated handle, if rich enough and structured openly, can also be used by other applications. I think this is a key point worth bearing in mind: the demo, besides showing off new functionalities, is also demonstrating how data (or metadata) can be embedded at the respective endpoints (PDF, handle).

Some initial observations follow below.

Metadata in PDF: 3. Deployment


thammond – 2007 August 02

In Metadata

So, assuming we know the form of the metadata we wish to add to our PDFs (or else to comply with if there is already a set of guidelines, or some industry initiative in effect) how can we realize this? And, on the flip side, how can we make it easier for consumers to extract metadata we have embedded in our PDFs?

Below are some considerations on deploying metadata in PDFs and consumer access.
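As one illustration of the consumer side, here is a minimal sketch in Python of packet scanning: looking through a file's raw bytes for an XMP packet, which is delimited by `<?xpacket begin=...?>` and `<?xpacket end=...?>` processing instructions. It assumes the packet is stored as uncompressed, unencrypted UTF-8, which is not guaranteed for every PDF, and the sample bytes are a made-up stand-in rather than a real PDF file.

```python
import re

# Match an XMP packet wrapper and capture the serialized metadata inside it.
# Assumption: the packet is UTF-8 and was not compressed or encrypted.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>",
    re.DOTALL,
)


def find_xmp_packets(data):
    """Return the body of every XMP packet found in a bytes object."""
    return [match.group(1).strip() for match in PACKET_RE.finditer(data)]


# Made-up stand-in for file contents; not a valid PDF.
SAMPLE = (
    b"%PDF-1.4\n...other objects...\n"
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"><!-- RDF goes here --></x:xmpmeta>\n'
    b'<?xpacket end="w"?>\n'
    b"%%EOF\n"
)

packets = find_xmp_packets(SAMPLE)
```

The appeal of this approach is that the consumer needs no PDF library at all; the cost is that it silently finds nothing when the metadata stream has been compressed.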



thammond – 2007 August 02

In Metadata

Only just caught up with this but the PRISM 2.0 draft is now available (since July 12) for public comment. See this posted by Dianne Kennedy: “Just a note to let you know that PRISM 2.0 has just been posted at . This is the first major revision to PRISM. We have incorporated new elements to support online content and have expanded and revised our controlled vocabularies. In addition we have added a profile to support PRISM in an XMP environment.