Important changes to Similarity Check



Madeleine Watson – 2016 October 21

In: Full-Text Links, Member Briefing, Metadata, Similarity Check

New features, new indexing, new name - oh my!

TL;DR The indexing of Similarity Check users’ content into the shared full-text database is about to get a lot faster. Now we need members’ help so that Turnitin (the company that owns and operates the iThenticate plagiarism checking tool) can transition to a new method of indexing content.

For existing Similarity Check users: please check that your metadata includes full-text URLs so that Turnitin can quickly and easily locate and index your content. Full-text URLs need to be included in 90% of journal article metadata by 31st December 2016.

2016 has seen some exciting new developments

(And there are plenty more in store as we head into 2017.) But first: in April we renamed the service from CrossCheck to Similarity Check, and we now have a new service logo, which you can reference via our logo CDN using the following code:

<img src="https://assets.crossref.org/logo/crossref-similarity-check-logo-200.svg" width="200" height="98" alt="Crossref Similarity Check logo">

Earlier this year Crossref also signed a new contract with Turnitin. As part of this, we negotiated the inclusion of dedicated development time each year from Turnitin’s engineering and product teams to focus on developments in the iThenticate tool that will specifically support Similarity Check users and their needs. Many of our members will have been contacted recently by Turnitin and asked to complete a survey regarding how they use the tool and what improvements they would like to see made in the future. The results of this survey are currently being analyzed and will be used by Turnitin to inform a development plan.

Finally, throughout 2016 we have also been working with Turnitin to help them develop a new Content Intake System that provides a faster, more reliable and more robust method for collecting data from Crossref and indexing users’ content into the Similarity Check full-text database. Previously, Turnitin could only collect prefix data from Crossref’s system monthly; today, with the new Content Intake System up and running, they can pull full-text content links from deposited metadata daily. This means that if you are a Similarity Check user already depositing full-text URLs with Crossref, your content is being indexed by Turnitin faster than ever before.

The new method offers plenty of other benefits, which is why we have agreed with Turnitin that from 1st January 2017 onwards, indexing via full-text URLs will be the only method supported for Similarity Check.

Not convinced? Let me share my top four reasons for advocating Turnitin’s exclusive use of the full-text URL indexing method for Similarity Check:

1. Reduced traffic to publisher servers. Indexing via full-text URLs means that the crawl is targeted specifically at the location of the full-text PDF or HTML content, reducing the amount of traffic Turnitin puts through publishers’ servers.

2. Lower margin for error and simplified issue recovery. Turnitin will no longer need to make multiple fetches for any content item, so there are fewer steps in the process. That means fewer places for indexing errors to occur and less reliance on users setting meta tags or span tags correctly in their markup. Furthermore, if problems do arise, using a single indexing method for all users means that Turnitin can pinpoint the issue faster and work with members to resolve it quickly.

3. Quicker turnaround on indexing with fewer delays. Turnitin will no longer need to investigate and set up bespoke indexing methods for different Similarity Check users, and they will be able to find the location of full-text content in one place (i.e. within the specific resource tag in members’ metadata deposits). More accurate data from a single location will result in a quicker turnaround on indexing, meaning newly published content will be added to the Similarity Check content database sooner for all members to check new manuscripts against.

4. Daily ingest is better than monthly! Full-text links can be collected from Crossref daily, rather than monthly as with other methods, meaning content is ingested far more regularly.

The presence of full-text URLs within the metadata is critical to the functioning of Turnitin’s new indexing system. All new Similarity Check participants are now asked to ensure these links are in place within their deposited metadata before they start using the service.

Already a user of Similarity Check?

If you’re an existing Similarity Check participant who joined the service before 2016, your content is probably being indexed via a different method, such as following links contained in your page meta tags. If you’re not currently depositing full-text links with Crossref for Similarity Check, you will have received an email from us about this in August. If you’re unsure, you can check your XML to see whether you have included the full-text link in the field, or email us at similaritycheck@crossref.org and we’d be happy to check for you.
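If you’d like to check a deposit file yourself before getting in touch, a short script can list which DOIs carry a full-text link. The sketch below is a minimal Python example, not a Crossref or Turnitin tool. It assumes full-text links are deposited inside doi_data as a collection with property="crawler-based" holding an item with crawler="iParadigms" and a resource URL. That is how Crossref’s schema documentation describes Similarity Check URLs, but please confirm the exact element on our help site. The helper names, DOIs and URLs below are all hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree adds to qualified tag names."""
    return tag.rsplit("}", 1)[-1]


def children(elem: ET.Element, name: str) -> list[ET.Element]:
    """Direct children of elem whose local tag name equals name."""
    return [c for c in elem if local_name(c.tag) == name]


def fulltext_links(deposit_xml: str) -> dict[str, list[str]]:
    """Map each DOI in a deposit file to the full-text URLs recorded for it.

    ASSUMPTION (check against the help site): full-text links live at
    doi_data/collection[@property="crawler-based"]/item[@crawler="iParadigms"]/resource.
    The landing-page <resource> directly under <doi_data> is deliberately ignored.
    """
    root = ET.fromstring(deposit_xml)
    links: dict[str, list[str]] = {}
    for doi_data in root.iter():
        if local_name(doi_data.tag) != "doi_data":
            continue
        dois = children(doi_data, "doi")
        if not dois or not (dois[0].text or "").strip():
            continue  # no usable DOI in this record
        urls = []
        for coll in children(doi_data, "collection"):
            if coll.get("property") != "crawler-based":
                continue
            for item in children(coll, "item"):
                if item.get("crawler") != "iParadigms":
                    continue
                urls += [r.text.strip() for r in children(item, "resource") if r.text]
        links[dois[0].text.strip()] = urls
    return links


# Hypothetical deposit: the first article has a full-text link, the second does not.
sample = """<doi_batch xmlns="http://www.crossref.org/schema/4.4.0">
  <body><journal>
    <journal_article><doi_data>
      <doi>10.5555/example-1</doi>
      <resource>https://example.org/article/1</resource>
      <collection property="crawler-based">
        <item crawler="iParadigms">
          <resource>https://example.org/article/1.pdf</resource>
        </item>
      </collection>
    </doi_data></journal_article>
    <journal_article><doi_data>
      <doi>10.5555/example-2</doi>
      <resource>https://example.org/article/2</resource>
    </doi_data></journal_article>
  </journal></body>
</doi_batch>"""

for doi, urls in fulltext_links(sample).items():
    print(doi, "OK" if urls else "MISSING full-text link")
# 10.5555/example-1 OK
# 10.5555/example-2 MISSING full-text link
```

Matching on local tag names keeps the check working whichever schema version namespace your deposit declares.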

Help, don’t leave me behind!

Us? Never! We’re here to help. But we really do need those full-text links… Everything existing Similarity Check publishers need to know about adding full-text links into new or existing metadata can be found on our help site. These URLs should be included as part of all standard metadata deposits going forward and can be easily added into existing files in bulk. So there’s no need to redeposit the full metadata, unless of course you would prefer to do so!

That’s a wrap

Looking back, it really has been a busy year for Similarity Check and it will continue to be so as we persevere in laying the groundwork for a more streamlined, robust and scalable service for 2017 and beyond. Remember, we need Similarity Check users to ensure they have full-text URLs in at least 90% of their journal article metadata by 31st December 2016 in order to continue using Similarity Check from 2017 onwards.
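To see where you stand against the 90% target, it is enough to divide the number of journal article records that carry at least one full-text URL by the total number of journal article records. The Python sketch below illustrates that arithmetic under stated assumptions: its input is a hypothetical mapping from DOI to full-text URLs that you have already filtered down to journal articles, and the function names are ours, not part of any Crossref API.

```python
def fulltext_coverage(links_by_doi: dict[str, list[str]]) -> float:
    """Fraction of records that carry at least one full-text URL."""
    if not links_by_doi:
        return 0.0
    with_links = sum(1 for urls in links_by_doi.values() if urls)
    return with_links / len(links_by_doi)


def meets_target(links_by_doi: dict[str, list[str]], percent: int = 90) -> bool:
    """True if at least `percent`% of records have a full-text URL.

    Compares whole numbers (with_links * 100 >= total * percent) so that
    floating-point rounding cannot flip the result right at the boundary.
    """
    total = len(links_by_doi)
    with_links = sum(1 for urls in links_by_doi.values() if urls)
    return total > 0 and with_links * 100 >= total * percent


# Hypothetical example: 10 journal articles, 9 of them with a full-text link.
sample = {
    f"10.5555/example-{i}": ([f"https://example.org/{i}.pdf"] if i < 9 else [])
    for i in range(10)
}
print(fulltext_coverage(sample))  # 0.9
print(meets_target(sample))       # True
```

With one more article missing its link (8 out of 10), coverage drops to 80% and the check returns False.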

And please keep us updated! With over 1,200 publishers using Similarity Check, we’ll need a little nudge to know when metadata has been updated to include these links. So once updates have been deposited, please email similaritycheck@crossref.org to confirm. And of course, as always, if there are any questions or if some advice would help, we’re just an email away.


2024 April 03

Testing times

2024 March 18

Mending Chesterton's Fence: Open Source Decision-making

2024 March 15

Credential Checking at Crossref

2024 March 13

Subject codes, incomplete and unreliable, have got to go

Blog