Leaving the house - where preprints go

Jennifer Lin, Karthik Ram – 2018 August 21

In Preprints, Metadata, Content Registration, API, Research Nexus

“Pre-prints” are sometimes neither Pre nor Print (cf. https://0-doi-org.pugwash.lib.warwick.ac.uk/10.12688/f1000research.11408.1), but they do go on to be published in journals. While researchers may have different motivations for posting a preprint, such as establishing a record of priority or seeking rapid feedback, the primary motivation appears to be timely sharing of results prior to journal publication.

So where in fact do preprints get published?

Although this is a simple question, we have not had an easy way to see how the answer varies across disciplines, preprint repositories, and journals. Until now. Crossref metadata provides not only an open and easy way to do so, but also up-to-date data for the latest results.

rOpenSci makin’ it sweet & easy

Crossref asks preprint repositories to update their metadata once a preprint has been published by adding the article link to its record via the “is-preprint-of” relation. As the record is processed, we make the link available in both directions, while preserving the provenance of the statement in the metadata output (“asserted-by”: “subject” or “asserted-by”: “object”). This results in bidirectional assertions in the Crossref REST API, where search engines, analytics providers, indexes, etc. can get from the preprint to the article (“is-preprint-of”) as well as vice versa (“has-preprint”), making the research easier to find, cite, link, assess, and reuse.
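To make the shape of these assertions concrete, here is a minimal sketch of how a consumer might read them from a single work record. The record is a made-up sample with placeholder DOIs, and the field layout (`id`, `id-type`, `asserted-by` under `relation`) is an assumption about the REST API output, not a formal schema.

```python
# Hypothetical work record, trimmed to the fields used here.
# The DOIs are placeholders, not real identifiers.
SAMPLE_ARTICLE = {
    "DOI": "10.5555/article.1",
    "relation": {
        "has-preprint": [
            {"id-type": "doi", "id": "10.5555/preprint.1", "asserted-by": "object"},
        ]
    },
}


def related_dois(work, relation_type):
    """Return (doi, asserted_by) pairs for one relation type on a work record."""
    links = work.get("relation", {}).get(relation_type, [])
    return [
        (link["id"], link.get("asserted-by"))
        for link in links
        if link.get("id-type") == "doi"  # ignore non-DOI identifiers
    ]


print(related_dois(SAMPLE_ARTICLE, "has-preprint"))
```

Calling the same helper with "is-preprint-of" on a preprint record would follow the link in the other direction.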

Using rOpenSci’s R library for the Crossref REST API (rcrossref), we pulled all articles connected to a previous preprint (https://0-api-crossref-org.pugwash.lib.warwick.ac.uk/works?filter=relation.type:has-preprint&facet=publisher-name:&rows=0) and then aggregated them based on journal via their ISSNs (https://0-api-crossref-org.pugwash.lib.warwick.ac.uk/works?filter=relation.type:has-preprint&facet=issn:), tallying the results in a tidy table with the journal name (ex: PeerJ (https://0-api-crossref-org.pugwash.lib.warwick.ac.uk/journals/2167-8359)).
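The analysis itself was done in R with rcrossref. For readers who prefer Python, the same facet query can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code we ran: the host `https://api.crossref.org`, the `*` facet wildcard, and the response layout (`message.facets.<name>.values` as a mapping of value to count) are all assumptions, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.crossref.org"  # assumption: public REST API host


def facet_url(facet, base=API_BASE):
    """Build a /works query that returns facet counts only (rows=0)."""
    query = urlencode(
        {
            "filter": "relation.type:has-preprint",
            "facet": f"{facet}:*",  # assumption: '*' requests all facet values
            "rows": 0,
        },
        safe=":*",  # keep ':' and '*' readable instead of percent-encoding them
    )
    return f"{base}/works?{query}"


def top_facet_counts(response, facet, n=20):
    """Rank facet values by count, highest first, and keep the top n."""
    values = response["message"]["facets"][facet]["values"]
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


def fetch_top_issns(n=20):
    """Perform the live request; needs network access, so it is not called here."""
    with urlopen(facet_url("issn")) as resp:
        return top_facet_counts(json.load(resp), "issn", n)
```

Each ISSN returned this way can then be looked up through the `/journals/{issn}` route shown above to recover the journal name.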

The big reveal

So without further delay, let’s look at the 20 journals with the highest number of preprints associated with their articles (data from August 21, 2018):

Publisher	Journal	Count
PeerJ	PeerJ	1184
Springer Nature	Scientific Reports	394
eLife	eLife	375
PLOS	PLOS ONE	338
Proceedings of the National Academy of Sciences	PNAS	205
PLOS	PLOS Computational Biology	196
Springer Nature	Nature Communications	187
PLOS	PLOS Genetics	169
The Genetics Society of America	Genetics	168
Oxford University Press	Nucleic Acids Research	148
Oxford University Press	Bioinformatics	138
The Genetics Society of America	Genetics	120
The Genetics Society of America	G3: Genes, Genomes, Genetics	104
Cold Spring Harbor Laboratory	Genome Research	104
Oxford University Press	Molecular Biology and Evolution	100
MDPI AG	Energies	98
MDPI AG	Sensors	96
Springer Nature	BMC Genomics	92
MDPI AG	International Journal of Molecular Sciences	86
JMIR Publications	Journal of Medical Internet Research	83

This list has not been normalized or weighted based on the size of the journal. The following observations are informed speculations, as we can only infer so much from the raw data:

Disciplinary practice: Preprints being an established part of disciplinary practice accounts for about half of the journals on the list. Certain communities, such as genetics and computational fields, have been early adopters of preprints. As such, we see higher rates of preprint-to-article publication in the journals that publish their work.
Partnerships: Partnerships that facilitate submission from the preprint repository directly to a publisher or peer review service (ex: bioRxiv B2J program) make it easier for researchers to move seamlessly from sharing a preprint to submitting their journal article manuscript.
Tie-ins: A quarter of the journals on the list are run by publishers with a preprint service, and have been able to tie together both arms of publishing. This removes barriers to journal article submission in the same manner as integrations between repositories and publishers, but does so as a single party.
Publisher support and treatment: We also see that strong proponents and early partners of preprint repositories tend to have higher counts. Some publishers have been more outspoken in their welcome of preprints, such as PNAS. Sometimes this support also comes in the form of special treatment. In the process of crafting editorial policy on publishing results previously posted in a preprint, some journals have carved out particular affordances in their publication workflow and content delivery streams that may contribute to the higher counts of articles. For example, Nature Research displays the preprints of submitted articles under consideration: https://0-nature--research--under--consideration-nature-com.pugwash.lib.warwick.ac.uk/.
Mega-journals: Mega-journals such as Scientific Reports and PLOS ONE have not discouraged preprints. As such, and due to the size of their publication output, they have easily found a place among the higher counts on the list.

Taking a closer look

One major consideration in these results concerns what’s missing from the data. The gaps fall into two camps: incomplete member data and incomplete membership coverage.

We have been working with our members to deposit preprints using the proper record type, and to provide links to published articles in their metadata. However, not all have yet done so (ex: SSRN), leading to holes in our research nexus graph, which subsequently detracts from the completeness of the data.

We celebrate the preprint repositories that update their metadata, as they are required to, when an article is published from a preprint, thereby populating the map with critical bridges between preprints and articles. Crossref participation benefits not only the content owner, but the membership at large and all the systems across the research ecosystem powered by Crossref metadata.

Lastly, this data depends on the coverage of preprint repositories that register content with us. We are thrilled that the Center for Open Science, our newest preprints addition, which represents 21 community repositories, has recently filled in swaths of the map. But there remain dead zones in the research graph from repositories that are not Crossref members (ex: arXiv). Their disciplines, as a result, are underrepresented in these results.

Everyone dive in!

As to the question of “where do preprints get published?”, anyone can in fact answer it based on the metadata Crossref collects and provides to the community as an open infrastructure provider. We encourage the community to explore and analyze the data further, together with other available datasets, to glean more insights into how scholarly communication is changing with the growth of preprints. For example, the results across all journals represented can be weighted by the number of articles each journal publishes.
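As one way to approach that weighting, the sketch below divides each journal's preprint-linked article count by its total published output. The function name is hypothetical, and the numbers are placeholders for two made-up journals, not real data. Total article counts would need to come from another source.

```python
def preprint_share(preprint_linked, total_published):
    """Return {journal: fraction of articles with a linked preprint}, highest first."""
    shares = {}
    for journal, linked in preprint_linked.items():
        total = total_published.get(journal)
        if not total:  # skip journals with unknown or zero output
            continue
        shares[journal] = linked / total
    # Sort by share so the ranking no longer simply reflects journal size.
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


# Placeholder numbers for two hypothetical journals, not real data.
linked = {"Journal A": 200, "Journal B": 50}
totals = {"Journal A": 10000, "Journal B": 500}
print(preprint_share(linked, totals))
```

In this made-up case the smaller journal ranks first once output is taken into account, which is the kind of shift a raw count would hide.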

Crossref data is open for all to examine and reuse through our REST API. Please dive in and share your findings with us!
