
Evolving our support for text-and-data mining


Bryan Vickery – 2020 August 21

In Text and Data Mining

Many researchers want to analyze and extract information from large sets of data, such as journal articles and other scholarly content. Methods such as screen-scraping are error-prone, put excessive strain on content sites, and may be unrepeatable or break when site layouts change. Giving researchers automated access to full-text content via DOIs and Crossref metadata reduces these problems, making it easy to deduplicate content and reproduce analyses. Supporting text and data mining echoes our mission to make research outputs easy to find, cite, link, assess, and reuse.
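
As a rough illustration of the DOI-based route (our own sketch, not an official Crossref client): the public Crossref REST API returns a work's metadata as JSON at `https://api.crossref.org/works/{DOI}`, and publishers can deposit full-text links there, each tagged with an `intended-application` such as `text-mining`. The helper names below (`normalize_doi`, `dedupe_dois`, `text_mining_links`, `fetch_work`) are hypothetical. Whether a particular record actually carries text-mining links depends on what its publisher deposited.

```python
import json
import urllib.parse
import urllib.request

# Crossref REST API endpoint for a single work's metadata.
CROSSREF_WORKS = "https://api.crossref.org/works/"

# Resolver prefixes commonly pasted in front of a DOI.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Strip a resolver prefix and lowercase the DOI.

    DOI names are case-insensitive, so lowercasing yields one key per work.
    """
    doi = doi.strip()
    lowered = doi.lower()
    for prefix in _DOI_PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def dedupe_dois(dois):
    """Return unique normalized DOIs, keeping first-seen order."""
    return list(dict.fromkeys(normalize_doi(d) for d in dois))


def text_mining_links(work: dict) -> list:
    """Full-text URLs in a Crossref /works response tagged for text mining."""
    links = work.get("message", {}).get("link", [])
    return [
        link["URL"]
        for link in links
        if link.get("intended-application") == "text-mining" and link.get("URL")
    ]


def fetch_work(doi: str) -> dict:
    """Fetch one work's metadata from the Crossref REST API (needs network)."""
    url = CROSSREF_WORKS + urllib.parse.quote(normalize_doi(doi))
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Typical use would be `text_mining_links(fetch_work(doi))` for each DOI in `dedupe_dois(...)`. Deduplicating on the normalized DOI, rather than on titles or scraped URLs, is what keeps repeated runs stable. Crossref asks heavier users to identify themselves (for example with a `mailto` contact), which this sketch omits.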

OTMI - An Update

We’ve just posted an update about OTMI (the Open Text Mining Interface) on our Web Publishing blog, Nascent. The post details the following changes:

- Contact email: otmi@nature.com
- Wiki: http://opentextmining.org/
- Repository: https://web.archive.org/web/20090706181310/http://0-www-nature-com.pugwash.lib.warwick.ac.uk/otmi/journals.opml

The OTMI content repository currently provides two years’ worth of full text across five of our titles:

- Nature
- Nature Genetics
- Nature Reviews Drug Discovery
- Nature Structural & Molecular Biology
- The Pharmacogenomics Journal

See the wiki for draft technical specs and for a sample script to generate the OTMI files.
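
The repository entry point is an OPML file, a generic XML outline format often used to list feeds. A minimal sketch for enumerating its entries, assuming `journals.opml` follows the common subscription-list convention of `<outline>` elements carrying an `xmlUrl` attribute plus a `title` or `text` label; we have not checked the archived file's actual attributes, and `list_feeds` is a hypothetical helper. It does not parse the OTMI files themselves, whose format is described in the draft specs on the wiki.

```python
import xml.etree.ElementTree as ET


def list_feeds(opml_text):
    """Return (label, feed URL) pairs for every <outline> that has an xmlUrl.

    Accepts the OPML document as str or bytes. Nested outlines are walked in
    document order; grouping outlines without an xmlUrl are skipped.
    """
    root = ET.fromstring(opml_text)
    feeds = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            label = outline.get("title") or outline.get("text") or ""
            feeds.append((label, url))
    return feeds
```

You would download the file first (for example with `urllib.request.urlopen(...).read()`) and pass the bytes to `list_feeds`.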