Flies in your metadata (ointment)

Isaac Farley, Amanda Bartell, Shayn Smulyan, Paul Davis, Arley Soto – 2022 July 25

In Metadata, Content Registration, Research Nexus

Quality metadata is foundational to the research nexus and all Crossref services. When inaccuracies creep in, they create problems that compound down the line. It is no wonder that reports of metadata errors from authors, members, and other metadata users are some of the most common messages our technical support team receives (we encourage you to continue to report these metadata errors).

We make members’ metadata openly available via our APIs, which means people and machines can incorporate it into their research tools and services - thus, we all want it to be accurate. Manuscript tracking services, search services, bibliographic management software, library systems, author profiling tools, specialist subject databases, scholarly sharing networks - all of these (and more) incorporate scholarly metadata into their software and services. They use our APIs to help them get the most complete, up-to-date set of metadata from all of our publisher members. And of course, members themselves are able to use our free APIs too (and often do; our members account for the vast majority of overall metadata usage).

Metadata users and uses: metadata from Crossref APIs is used for a variety of purposes by many tools and services

We know many organizations use Crossref metadata. We highlighted several different examples in our API case study blog series and user stories. Now, consider how errors could be (and often are) amplified throughout the whole research ecosystem.

While many inaccuracies in the metadata have clear consequences (e.g., if an author’s name is misspelled or their ORCID iD is registered with a typo, the ability to credit the author with their work can be compromised), others, like this example of typos in the publication date, may seem subtle but also have repercussions. When we receive reports of metadata inaccuracies, we review the claims and work to connect metadata users with our members to investigate and then correct those inaccuracies.

Thus, while Crossref does not update, edit, or correct publisher-provided metadata directly, we do work to enrich and improve the scholarly record, a goal we’re always striving for. Let’s look at a few common examples and how to avoid them.

Pagination faux pas

First page marked as 1

In the XML registered

<pages>
<first_page>1</first_page>
<last_page>1</last_page>
</pages>

https://0-api-crossref-org.pugwash.lib.warwick.ac.uk/works?filter=type:journal-article&select=DOI,title,issue,page&sample=100

Other pagination errors

In the XML registered

<item_number item_number_type="article-number">1</item_number>

In the XML registered

<pages>
<first_page>121-123</first_page>
<last_page>129</last_page>
</pages>
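
A check along these lines could catch both pagination patterns before a deposit is submitted. This is only a minimal sketch: `check_pages` is a hypothetical helper (not a Crossref tool), and it assumes an un-namespaced `<pages>` fragment like the ones shown above. A first and last page of 1 can be perfectly legitimate, so that rule is a prompt to double-check rather than a verdict.

```python
import re
import xml.etree.ElementTree as ET
from typing import List


def check_pages(pages_xml: str) -> List[str]:
    """Return warnings for one <pages> element (hypothetical helper)."""
    pages = ET.fromstring(pages_xml)
    first = (pages.findtext("first_page") or "").strip()
    last = (pages.findtext("last_page") or "").strip()
    problems = []

    # A page range should be split across the two fields, not packed into one.
    for name, value in (("first_page", first), ("last_page", last)):
        if re.search(r"\d\s*[-–]\s*\d", value):
            problems.append(f"{name} holds a range: {value!r}")

    # Genuine one-page items exist, so this only asks for a second look.
    if first == "1" and last == "1":
        problems.append("first_page and last_page are both 1")

    return problems


sample = "<pages><first_page>121-123</first_page><last_page>129</last_page></pages>"
print(check_pages(sample))
```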

Author naming lapses

Examples: titles (Dr., Prof., etc.) in the given_name field; suffixes (Jr., III, etc.) in the surname field; a superscript number, asterisk, or dagger after an author name (usually carried over from website formatting that references affiliations); the full name in the surname field.

In the XML registered

<contributors>
<person_name sequence="first" contributor_role="author">
<given_name>DOCTOR KATHRYN</given_name>
<surname>RAILLY</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>DOCTOR JOSIAH S.</given_name>
<surname>CARBERRY</surname>
</person_name>
</contributors>

<contributors>
<person_name contributor_role="author" sequence="first">
<surname>Mahmoud Rizk</surname>
</person_name>
<person_name contributor_role="author" sequence="additional">
<surname>Asta L Andersen(</surname>
</person_name>
</contributors>
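
The naming lapses listed above lend themselves to simple heuristics. The sketch below is illustrative only: `check_person_name`, the `TITLES` and `SUFFIXES` sets, and the list of marker characters are all assumptions made for this example, and a rule like "several words in surname with no given name" will also flag some legitimate names, so results need human review.

```python
import re
from typing import List, Optional

# Assumed word lists for illustration; extend them for your own data.
TITLES = {"dr", "dr.", "doctor", "prof", "prof.", "professor"}
SUFFIXES = {"jr", "jr.", "sr", "sr.", "ii", "iii", "iv"}
# Digits, asterisks, daggers, and brackets often leak in from affiliation markers.
MARKERS = re.compile(r"[\d*†‡()]")


def check_person_name(given_name: Optional[str], surname: str) -> List[str]:
    """Return warnings for one person_name entry (hypothetical helper)."""
    problems = []
    given = given_name or ""

    given_tokens = given.lower().split()
    if given_tokens and given_tokens[0] in TITLES:
        problems.append("title in given_name")

    surname_tokens = surname.lower().replace(",", " ").split()
    if surname_tokens and surname_tokens[-1] in SUFFIXES:
        problems.append("suffix in surname")

    if MARKERS.search(given) or MARKERS.search(surname):
        problems.append("stray marker character in name")

    # Heuristic only: compound surnames without a given name do exist.
    if not given and len(surname.split()) > 1:
        problems.append("possible full name in surname")

    return problems


print(check_person_name("DOCTOR KATHRYN", "RAILLY"))
print(check_person_name(None, "Asta L Andersen("))
```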

Organizations as authors slip-ups

Examples: the person_name contributor element is meant for people, not organizations, but we see this violated from time to time. Unfortunately, no persons are credited with contributing to content that has these errors in its metadata record.

In the XML registered

<contributors>
<person_name sequence="first" contributor_role="author">
<surname>Society</surname>
</person_name>
</contributors>

<contributors>
<person_name contributor_role="author" sequence="first">
<given_name>University of Melbourne</given_name>
<surname>University of Melbourne</surname>
</person_name>
</contributors>
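
Organization names hiding in person_name entries can often be spotted with a keyword check. The following is a rough sketch under stated assumptions: `find_org_like_persons` and the `ORG_WORDS` list are invented for this example, the input is an un-namespaced `<contributors>` fragment like those above, and the keyword list will miss many organizations while occasionally flagging a real person.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed keyword list for illustration only.
ORG_WORDS = {"society", "university", "institute", "association", "committee", "consortium"}


def find_org_like_persons(contributors_xml: str) -> List[str]:
    """Return surnames of person_name entries that look like organizations."""
    root = ET.fromstring(contributors_xml)
    flagged = []
    for person in root.iter("person_name"):
        given = (person.findtext("given_name") or "").strip()
        surname = (person.findtext("surname") or "").strip()
        words = set(f"{given} {surname}".lower().split())
        # An identical given name and surname is another common sign.
        same_in_both = bool(given) and given.lower() == surname.lower()
        if words & ORG_WORDS or same_in_both:
            flagged.append(surname)
    return flagged


sample = (
    "<contributors>"
    '<person_name sequence="first" contributor_role="author">'
    "<given_name>University of Melbourne</given_name>"
    "<surname>University of Melbourne</surname>"
    "</person_name>"
    "</contributors>"
)
print(find_org_like_persons(sample))
```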

Null no-nos

Examples: too often we see “N/A”, “null”, or “none” in various fields (pages, authors, volume/issue numbers, titles, etc.). If you don’t have or know the metadata for an optional element, it’s better to omit the element than to include inaccuracies in the metadata record.

In the XML registered

<journal_volume>
<volume>null</volume>
</journal_volume>

<pages>
<first_page>null</first_page>
<last_page>null</last_page>
</pages>

<person_name sequence="first" contributor_role="author">
<given_name>Not Available</given_name>
<surname>Not Available</surname>
</person_name>
<person_name sequence="additional" contributor_role="author">
<given_name>Not Available</given_name>
<surname>Not Available</surname>
</person_name>
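
Placeholder values like these are easy to scan for before registering content. Here is a minimal sketch, assuming a well-formed XML fragment; `find_placeholders`, the `PLACEHOLDERS` set, and the `<record>` wrapper in the sample are made up for illustration. With namespaced deposit XML, ElementTree would report namespace-qualified tag names rather than the bare ones shown here.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Values taken from the examples in this post; extend as needed.
PLACEHOLDERS = {"n/a", "null", "none", "not available"}


def find_placeholders(xml_fragment: str) -> List[Tuple[str, str]]:
    """Return (tag, text) pairs for elements whose text is a placeholder."""
    root = ET.fromstring(xml_fragment)
    hits = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text.lower() in PLACEHOLDERS:
            hits.append((element.tag, text))
    return hits


# <record> is a hypothetical wrapper so the fragment has a single root.
sample = (
    "<record>"
    "<journal_volume><volume>null</volume></journal_volume>"
    "<pages><first_page>null</first_page><last_page>N/A</last_page></pages>"
    "</record>"
)
for tag, text in find_placeholders(sample):
    print(f"{tag}: {text}")
```

Any element reported this way is a candidate for removal if it is optional, or for correction if it is required.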

Where to go from here?

One thing we’ve said throughout this post bears repeating: accurate metadata is important. It’s important in itself, and because the metadata registered with us is heavily used by many systems and services, its effects reach well beyond Crossref. With that wider perspective in mind, there are practical steps members and metadata users can take to help us:

As a member registering metadata with us:

make sure we have a current metadata quality contact for your account and update us if there’s a change
if you receive an email request from us to investigate a potential metadata error, please help us look into it
if you do not know what to enter into a metadata element or helper tool field, please leave it blank; some of the errors shown in this post may have been placeholders that members intended to come back to and correct later, and that’s also a practice to avoid
if you find a record in need of an update, update it - updates to existing records are always free (we do this to encourage updates and the resulting accurate, rich metadata, so take advantage of it).

As a metadata user:

if you spot a metadata record that doesn’t seem right, let us know with an email to support@crossref.org and/or report it to the member responsible for maintaining the metadata record (if you have a good contact there)
if you want to confirm when a metadata record was last updated, our REST API is a great resource; here’s a handy query to use as a starting point, which returns records on our Crossref prefix 10.5555 that were updated on or after 2022-01-01 and have a publication date no later than 2022-12-31: https://0-api-crossref-org.pugwash.lib.warwick.ac.uk/prefixes/10.5555/works?rows=500&filter=from-update-date:2022-01-01,until-pub-date:2022-12-31&mailto=support@crossref.org
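
For anyone who wants to build this kind of query programmatically, here is a small sketch. The helper name `updated_works_url` is made up for illustration. It assumes the public API host api.crossref.org rather than the proxied hostname in the URL above, and it bounds the window with `from-update-date` and `until-update-date` at both ends, which differs from the `until-pub-date` upper bound used in the query above. It only assembles the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumption: the public Crossref REST API host.
API_ROOT = "https://api.crossref.org"


def updated_works_url(prefix: str, from_date: str, until_date: str,
                      mailto: str, rows: int = 500) -> str:
    """Build a works query for records on a prefix updated within a date window."""
    filters = f"from-update-date:{from_date},until-update-date:{until_date}"
    # Keep ':', ',' and '@' unescaped so the URL stays readable.
    query = urlencode(
        {"rows": rows, "filter": filters, "mailto": mailto},
        safe=":,@",
    )
    return f"{API_ROOT}/prefixes/{prefix}/works?{query}"


print(updated_works_url("10.5555", "2022-01-01", "2022-12-31", "support@crossref.org"))
```

Swap in your own prefix and contact address for `mailto` when adapting it.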

Making connections between research objects is critical, and inaccurate metadata complicates that process. We’re continually working to better understand this, too. That’s why we’re currently researching the reach and effects of metadata. Our technical support team is always eager to assist in correcting errors. We’re also keen on avoiding those mistakes altogether, so if you are uncertain about a metadata element or have questions about anything included in this blog post, please do contact us at support@crossref.org. Or, better yet, post your question in the community forum so all members and users can benefit from the exchange. If you have a question, chances are others do as well.
