Leaving the house - where preprints go
“Pre-prints” are sometimes neither Pre nor Print (cf. https://0-doi-org.pugwash.lib.warwick.ac.uk/10.12688/f1000research.11408.1), but they do go on and get published in journals. While researchers may have different motivations for posting a preprint, such as establishing a record of priority or seeking rapid feedback, the primary motivation appears to be timely sharing of results prior to journal publication.
So where in fact do preprints get published?
Although this is a simple question, we have not had an easy way to answer how this varies across disciplines, preprint repositories and journals. Until now. Crossref metadata provides not only an open and easy way to do so, but up-to-date data to get the latest results.
rOpenSci makin’ it sweet & easy
Crossref asks preprint repositories to update their metadata once a preprint has been published by adding the article link into its record via the “is-preprint-of” relation. As the record is processed, we make the link available going both directions, while preserving the provenance of the statement in the metadata output (“asserted-by”: “subject” or “asserted-by”: “object”). This results in bidirectional assertions in the Crossref REST API where search engines, analytics providers, indexes, etc. can get from the preprint to the article (“is-preprint-of”) as well as vice versa (“has-preprint”), making it easier to find, cite, link, assess, and reuse.
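To make the two directions concrete, here is a minimal Python sketch of reading these relations out of a work record. It assumes the record carries a "relation" object keyed by relation type, with "id" and "asserted-by" on each entry, as described above. The function name related_works and both sample records (including their DOIs) are hypothetical placeholders, not real works.

```python
from typing import Dict, List, Tuple


def related_works(work: Dict, relation_type: str) -> List[Tuple[str, str]]:
    """Return (identifier, asserted_by) pairs for one relation type on a work record."""
    relations = work.get("relation", {})
    return [
        (entry.get("id", ""), entry.get("asserted-by", ""))
        for entry in relations.get(relation_type, [])
    ]


# Hypothetical preprint record: the preprint itself (the subject) asserts the link.
preprint = {
    "DOI": "10.5555/preprint.1",
    "relation": {
        "is-preprint-of": [
            {"id-type": "doi", "id": "10.5555/article.1", "asserted-by": "subject"}
        ]
    },
}

# Hypothetical article record: the inverse link, still asserted by the preprint
# (the object of this relation), so provenance is preserved.
article = {
    "DOI": "10.5555/article.1",
    "relation": {
        "has-preprint": [
            {"id-type": "doi", "id": "10.5555/preprint.1", "asserted-by": "object"}
        ]
    },
}

print(related_works(preprint, "is-preprint-of"))
print(related_works(article, "has-preprint"))
```

A record with no relations simply yields an empty list, so the same helper can be run over any set of works.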
Using rOpenSci’s R library for the Crossref REST API (rcrossref), we pulled all articles connected to a previous preprint (https://0-api-crossref-org.pugwash.lib.warwick.ac.uk/works?filter=relation.type:has-preprint&facet=publisher-name:&rows=0) and then aggregated them based on journal via their ISSNs (https://0-api-crossref-org.pugwash.lib.warwick.ac.uk/works?filter=relation.type:has-preprint&facet=issn:), tallying the results in a tidy table with the journal name (ex: PLOS Biology (https://0-api-crossref-org.pugwash.lib.warwick.ac.uk/journals/2167-8359)).
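We used rcrossref in R, but the same tally can be sketched in Python with the standard library alone. This is an illustration rather than our actual script. It assumes the public endpoint https://api.crossref.org/works, assumes "*" as the facet value to request all facet entries, and assumes the facet counts come back under message.facets.issn.values, possibly keyed by an ISSN URI. The helper names and the sample response (placeholder ISSNs and counts) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public REST API endpoint, not the proxied URL quoted above.
API_BASE = "https://api.crossref.org/works"


def facet_url(facet: str, rows: int = 0) -> str:
    """Build a query for works that have a preprint, faceted by the given field."""
    params = {
        "filter": "relation.type:has-preprint",
        "facet": f"{facet}:*",  # assumption: "*" asks for all facet values
        "rows": rows,           # rows=0 returns the facet counts without the works
    }
    return f"{API_BASE}?{urlencode(params, safe=':*')}"


def fetch(url: str) -> dict:
    """Fetch and decode a JSON response (network call, not run in this sketch)."""
    with urlopen(url) as resp:
        return json.load(resp)


def top_issns(response: dict, n: int = 20) -> list:
    """Rank ISSNs by the number of articles linked to a preprint."""
    values = response["message"]["facets"]["issn"]["values"]
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    # Keys may be full ISSN URIs; keep only the trailing ISSN.
    return [(key.rsplit("/", 1)[-1], count) for key, count in ranked[:n]]


# Hypothetical response fragment with placeholder ISSNs and counts.
sample = {
    "message": {
        "facets": {
            "issn": {
                "value-count": 3,
                "values": {
                    "http://id.crossref.org/issn/0000-0001": 3,
                    "http://id.crossref.org/issn/0000-0002": 7,
                    "0000-0003": 5,
                },
            }
        }
    }
}

print(facet_url("issn"))
print(top_issns(sample, n=2))
```

To run it against live data, pass fetch(facet_url("issn")) to top_issns, then look up each ISSN at the journals endpoint to get the journal name.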
The big reveal
So without further delay, let’s look at the results of the 20 journals with the highest number of preprints associated with their articles (data from August 21, 2018):
|Publisher||Journal||Articles with a linked preprint|
|Springer Nature||Scientific Reports||394|
|Proceedings of the National Academy of Sciences||PNAS||205|
|PLOS||PLOS Computational Biology||196|
|Springer Nature||Nature Communications||187|
|The Genetics Society of America||Genetics||168|
|Oxford University Press||Nucleic Acids Research||148|
|Oxford University Press||Bioinformatics||138|
|The Genetics Society of America||Genetics||120|
|The Genetics Society of America||G3: Genes, Genomes, Genetics||104|
|Cold Spring Harbor Laboratory||Genome Research||104|
|Oxford University Press||Molecular Biology and Evolution||100|
|Springer Nature||BMC Genomics||92|
|MDPI AG||International Journal of Molecular Sciences||86|
|JMIR Publications||Journal of Medical Internet Research||83|
This list has not been normalized or weighted based on the size of the journal. The following observations are informed speculations, as we can only infer so much from the raw data:
- Disciplinary practice: About half of the journals on the list serve communities where preprints are already part of disciplinary practice. Certain communities, such as genetics and computational fields, have been early adopters of preprints. As such, we see more preprint-to-article publication in the journals that publish their work.
- Partnerships: Partnerships that facilitate submission from the preprint repository directly to a publisher or peer review service (ex: bioRxiv B2J program) make it easier for researchers to move seamlessly from sharing a preprint to submitting their journal article manuscript.
- Tie-ins: A quarter of the journals on the list are run by publishers with a preprint service, and have been able to tie together both arms of publishing. This removes barriers to journal article submission in the same manner as integrations between repositories and publishers, but does so as a single party.
- Publisher support and treatment: We also see that strong proponents and early partners of preprint repositories tend to have higher counts. Some publishers have been more outspoken in their welcome of preprints, such as PNAS. Sometimes this support also comes in the form of special treatment. In the process of crafting editorial policy on publishing results previously posted in a preprint, some journals have carved out particular affordances in their publication workflow and content delivery streams that may contribute to the higher counts of articles. For example, Nature Research displays the preprints of submitted articles under consideration: https://0-nature--research--under--consideration-nature-com.pugwash.lib.warwick.ac.uk/.
- Mega-journals: Mega-journals such as Scientific Reports and PLOS ONE have not discouraged preprints. As such, and due to the size of their publication output, they have easily found a place among the higher counts on the list.
Taking a closer look
One major consideration in these results concerns what’s missing in the data. The gaps fall into two camps: incomplete member data, and incomplete membership coverage.
We have been working with our members to deposit preprints using the proper content type, and to provide links to published articles in their metadata. However, not all have yet done so (ex: SSRN), leading to holes in our research nexus graph, which subsequently detracts from the completeness of the data.
We celebrate the preprint repositories that update their metadata, as required, when an article is published from a preprint, thereby populating the map with critical bridges between preprints and articles. Crossref participation benefits not only the content owner, but the membership at large and all the systems across the research ecosystem powered by Crossref metadata.
Lastly, this data is dependent on the coverage of preprint repositories that register content with us. We are thrilled that Center for Open Science, our newest preprints addition, which represents 21 community repositories, has recently filled in swaths of the map. But there remain dead zones in the research graph from repositories that are not Crossref members (ex: arXiv). Their disciplines, as a result, are underrepresented in these results.
Everyone dive in!
As to the question of “where do preprints get published?”, anyone can in fact answer it based on the metadata Crossref collects and provides to the community as an open infrastructure provider. We encourage the community to explore and analyze the data further with other available datasets to glean more insights on how scholarly communication is changing with the increasing growth of preprints. For example, the results across all journals represented can be weighted by the number of articles each journal publishes.
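As one possible starting point for that weighting, the Python sketch below divides each journal's count of preprint-linked articles by its total article output. The function name preprint_share, the journal names, and all numbers are hypothetical placeholders; real totals would have to come from Crossref journal metadata or another dataset.

```python
from typing import Dict, List, Tuple


def preprint_share(linked: Dict[str, int], totals: Dict[str, int]) -> List[Tuple[str, float]]:
    """Rank journals by the share of their articles that are linked to a preprint."""
    shares = {
        journal: linked[journal] / totals[journal]
        for journal in linked
        if totals.get(journal)  # skip journals with a missing or zero total
    }
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)


# Placeholder counts for illustration only, not real journal data.
linked_counts = {"Journal A": 50, "Journal B": 20, "Journal C": 5}
total_articles = {"Journal A": 1000, "Journal B": 100}

for journal, share in preprint_share(linked_counts, total_articles):
    print(f"{journal}: {share:.1%}")
```

In this toy example the smaller journal ranks first once size is taken into account, which is exactly the kind of reordering the raw counts above cannot show.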
Crossref data is open for all to examine and reuse through our REST API. Please dive in and share your findings with us!