Suppdata

Suppdata: Downloading Supplementary Data from Published Manuscripts - Published in JOSS (2018)

https://github.com/ropensci/suppdata

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found codemeta.json file
✓ .zenodo.json file: found .zenodo.json file
✓ DOI references: found 9 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: links to biorxiv.org, wiley.com, plos.org, mdpi.com, joss.theoj.org
✓ Committers with academic emails: 2 of 7 committers (28.6%) from academic institutions
○ Institutional organization owner
✓ JOSS paper metadata: published in Journal of Open Source Software

Keywords

peer-reviewed r r-package rstats

Keywords from Contributors

reproducibility

Scientific Fields

Earth and Environmental Sciences Physical Sciences - 40% confidence

Last synced: 6 months ago

Repository

Grabbing SUPPlementary DATA in R

Basic Info

Host: GitHub
Owner: ropensci
License: other
Language: R
Default Branch: master
Homepage: https://docs.ropensci.org/suppdata
Size: 291 KB

Statistics

Stars: 35
Watchers: 4
Forks: 7
Open Issues: 10
Releases: 2

Topics

peer-reviewed r r-package rstats

Created over 10 years ago · Last pushed over 2 years ago

Metadata Files

Readme Changelog Contributing License Code of conduct

Loading SUPPlementary DATA into R

William D. Pearse, Daniel Nuest, and Scott Chamberlain

Overview

The aim of this package is to aid downloading data from published papers. To download the supplementary data from a PLoS paper, for example, you would simply type:

```{R}
library(suppdata)
suppdata("10.1371/journal.pone.0127900", 1)
```

...and this would download the first supplementary information (SI) from the paper.

This sort of thing is very useful if you're doing meta-analyses, or if you simply want to know where all your data came from and keep a completely reproducible "audit trail" of what you've done. suppdata uses rcrossref to look up which journal the article is in.
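How the looked-up journal is turned into a download route is not spelled out here, so the sketch below only illustrates the general idea under stated assumptions: a publisher name (in practice this would come from a Crossref query) is matched against patterns to pick one of the source codes listed under "Supported publishers and repositories". The helper `publisher_to_source()` and its pattern table are hypothetical, not suppdata's internal code, and the publisher strings are assumed for illustration.

```r
# Hypothetical sketch: map a publisher name, as might be returned by a
# Crossref lookup, to a source code such as "plos" or "wiley".
# The patterns are illustrative assumptions, NOT suppdata's actual table.
publisher_patterns <- c(
  plos  = "Public Library of Science",
  wiley = "Wiley",
  peerj = "PeerJ",
  mdpi  = "MDPI"
)

publisher_to_source <- function(publisher) {
  # Test every pattern against the publisher name (case-insensitive)
  hits <- vapply(publisher_patterns, grepl, logical(1),
                 x = publisher, ignore.case = TRUE)
  if (!any(hits)) {
    stop("No supported source matches publisher: ", publisher)
  }
  # Return the code of the first matching pattern
  names(publisher_patterns)[which(hits)[1]]
}

publisher_to_source("Public Library of Science (PLoS)")  # "plos"
publisher_to_source("Wiley-Blackwell")                   # "wiley"
```

Matching on patterns rather than exact strings is a design choice made for the sketch: publisher names tend to vary slightly (for example "Wiley" versus "Wiley-Blackwell").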

How to install and load the package

The version on CRAN is the most stable version. You can install and load it like this:

```{R}
install.packages("suppdata")
library(suppdata)
```

If you want the development version, which probably contains more features but is not always guaranteed to work, install it from the master branch of this repository like this:

```{R}
library(devtools)
install_github("ropensci/suppdata")
library(suppdata)
```

This package depends on the packages httr, xml2, jsonlite, and rcrossref.

Supported publishers and repositories

- bioRxiv (biorxiv)
- Copernicus Publications (copernicus)
- DRYAD (dryad)
- Ecological Society of America - Ecological Archives (esa_archives and esa_data_archives)
- Europe PMC (epmc; multiple life-sciences publishers supported, including BMJ Journals, eLife, F1000Research, Wellcome Open Research, and Gates Open Research)
- figshare (figshare)
- Journal of Statistical Software (jstatsoft)
- MDPI (mdpi)
- PeerJ (peerj)
- PLOS | Public Library of Science (plos)
- Proceedings of the Royal Society B: Biological Sciences (RSPB) (proceedings)
- Science (science)
- Wiley (wiley)

See issue #2 for a list of potential sources; requests are welcome!

Contributing

For more details on how to contribute to the package, check out the guide in CONTRIBUTING.md.

A more detailed set of motivations for `suppdata`

suppdata is an R package that provides easy, reproducible access to supplemental materials within R. Thus suppdata facilitates open, reproducible research workflows: scientists re-analyzing published datasets can work with them as easily as if they were stored on their own computer, and others can follow their analysis workflow painlessly.

For example, imagine you were conducting an analysis of the evolution of body mass in mammals. Without suppdata, such an analysis would require manually downloading body mass and phylogenetic data from published manuscripts. This is time-consuming, difficult (if not impossible) to make truly reproducible without re-distributing the data, and hard to follow. With suppdata, such an analysis is straightforward, reproducible, and the sources of the data are clear because their DOIs are embedded within the code:

```{R}
# Load phylogenetics packages
library(ape)
library(caper)
library(phytools)

# Load suppdata
library(suppdata)

# Load two published datasets (PanTHERIA codes missing values as -999)
tree <- read.nexus(suppdata("10.1111/j.1461-0248.2009.01307.x", 1))[[1]]
traits <- read.delim(suppdata("E090-184", "PanTHERIA_1-0_WR05_Aug2008.txt", "esa_archives"),
                     na.strings = "-999")

# Merge datasets (tree tip labels use underscores instead of spaces)
traits <- with(traits, data.frame(body.mass = log10(X5.1_AdultBodyMass_g),
                                  species = gsub(" ", "_", MSW05_Binomial)))
c.data <- comparative.data(tree, traits, species)

# Calculate phylogenetic signal (phylosig() matches values to tips by name)
phylosig(c.data$phy, setNames(c.data$data$body.mass, rownames(c.data$data)))
```

A guided walk through `suppdata`

The aim of suppdata is to make it as easy as possible for you to write reproducible analysis scripts that make use of published data. So let's start with that first, simplest case: how to make use of published data in an analysis.

Learning by example

Below is an example of an analysis run using suppdata. Read through it first, and then we'll go through what all the parts mean.

```{R}
# Load phylogenetics packages
library(ape)
library(caper)
library(phytools)

# LOAD TWO PUBLISHED DATASETS
# USING SUPPDATA
library(suppdata)
tree <- read.nexus(suppdata("10.1111/j.1461-0248.2009.01307.x", 1))[[1]]
traits <- read.delim(suppdata("E090-184", "PanTHERIA_1-0_WR05_Aug2008.txt", "esa_archives"),
                     na.strings = "-999")

# Merge datasets
traits <- with(traits, data.frame(body.mass = log10(X5.1_AdultBodyMass_g),
                                  species = gsub(" ", "_", MSW05_Binomial)))
c.data <- comparative.data(tree, traits, species)

# Calculate phylogenetic signal
phylosig(c.data$phy, setNames(c.data$data$body.mass, rownames(c.data$data)))
```

This short script loads some R packages focused on modelling the evolution of species' traits, and then gets to the "good stuff": using suppdata. First, we load the suppdata package using library(suppdata). The next line uses a function called read.nexus, which loads something called a phylogeny (you might be familiar with this if you're a biologist). Normally, this function would take the location of a file on our hard drive as its single argument, but here we're giving it the output of a call to the suppdata function.

suppdata goes to the website of the article whose DOI is 10.1111/j.1461-0248.2009.01307.x (a paper by Fritz et al.), and takes the first (1) supplement from that article. It saves that supplement to a temporary location on your hard drive, and then gives that location to read.nexus. This works with any function that expects a file at a location on your hard drive. What is particularly neat is that suppdata remembers that it has already downloaded that file (see below for more details), so you only have to download something once per R session.
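The behaviour described above (hand back a file path, and skip the download if the file is already there) can be sketched in base R. This is a minimal illustration under assumptions, not suppdata's implementation: `cached_fetch()` and `fake_download()` are hypothetical names, the cache-key format is invented, and the "download" just writes a toy file. Because `tempdir()` is specific to each R session, files cached there only last for that session.

```r
# Hypothetical sketch of per-session caching; NOT suppdata's own code.
cached_fetch <- function(key, fetch_fun, dir = tempdir()) {
  # Turn the key (e.g. DOI plus supplement number) into a safe file name
  destination <- file.path(dir, gsub("[^A-Za-z0-9._-]", "_", key))
  # Only fetch if the file is not already on disk
  if (!file.exists(destination)) {
    fetch_fun(destination)
  }
  # Return the path, so any file-reading function can use it
  destination
}

# Stand-in for a real download: writes a toy CSV and counts its calls
n_downloads <- 0
fake_download <- function(path) {
  n_downloads <<- n_downloads + 1
  writeLines(c("species,value", "sp_A,1"), path)
}

key <- "10.1371/journal.pone.0127900 si 1"
first <- read.csv(cached_fetch(key, fake_download))
second <- read.csv(cached_fetch(key, fake_download))
n_downloads  # 1: the second call re-used the cached file
```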

The second call to suppdata, which makes use of read.delim, shows two of the potential complexities of suppdata. First of all, because some journal publishers store their supplementary materials using numbers and others using specific file-names, suppdata takes either a number (like in the first example), or a name (like in the second example) depending on the journal publisher you're taking data from. If you look in the help file for suppdata, there is a table outlining those options. Sorry, you've just got to read up on it :-( Secondly, if you're an ecologist you might be familiar with the Ecological Society of America's data archives section. While they've moved over to a new way of storing data more recently, if you're hoping to load an older dataset from that journal you need to give the ESA data archive reference and specify that you're downloading from ESA (as in this example). If you're not an ecologist, don't worry about it, as this doesn't apply to you :D
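To make the number-versus-name distinction concrete, here is a hypothetical check in base R. Only the three sources used in this README are included (PLOS and Wiley with a supplement number, the older ESA archives with a file name), following the examples above; the full table lives in the suppdata help file, and `check_si()` is an illustrative helper, not part of the package.

```r
# Hypothetical helper: how each source expects the supplement to be given.
# Only the sources used in this README are listed; see ?suppdata for the rest.
si_style <- c(plos = "number", wiley = "number", esa_archives = "name")

check_si <- function(si, from) {
  if (!from %in% names(si_style)) stop("Unknown source: ", from)
  style <- si_style[[from]]
  ok <- if (style == "number") {
    is.numeric(si) && length(si) == 1 && si >= 1
  } else {
    is.character(si) && length(si) == 1 && nzchar(si)
  }
  if (!ok) stop(sprintf("source '%s' expects the supplement as a %s", from, style))
  invisible(TRUE)
}

check_si(1, "plos")                                          # passes silently
check_si("PanTHERIA_1-0_WR05_Aug2008.txt", "esa_archives")   # passes silently
# A character string where a number is expected returns the error message
tryCatch(check_si("1", "wiley"), error = conditionMessage)
```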

That's it! You now know everything you need to use suppdata! The rest of the code merges these datasets together, and then calculates something called phylogenetic signal. If you're an evolutionary biologist, those lines might be interesting to you. If you're not, then don't worry about them.

Caching and saving to a specific directory

Sometimes, you will want to use suppdata to build a store of files on your hard drive. If so, you should know that suppdata takes three optional arguments: cache, dir, and save.name. If you specify cache=FALSE, suppdata will not use its cached copy of a file, forcing it to download your data again. This is mostly useful if a download was interrupted (for example, you pressed Ctrl-C or Stop part-way through): suppdata may then have only half-downloaded a file, yet think it has cached it. If you get an error when using suppdata, setting cache=FALSE is a good thing to try.
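The failure mode described above comes from treating "the file exists" as "the file is complete". The hypothetical sketch below (base R only; `fetch_with_cache()`, `full_download()` and the simulated interruption are illustrative assumptions, not suppdata internals) shows how a truncated file fools an existence-based cache, and how turning caching off forces a fresh copy.

```r
# Hypothetical sketch: an existence-based cache cannot tell a complete
# file from one left behind by an interrupted download.
fetch_with_cache <- function(destination, fetch_fun, cache = TRUE) {
  if (!cache || !file.exists(destination)) {
    fetch_fun(destination)
  }
  destination
}

# Stand-in for a complete download of a toy file
full_download <- function(path) {
  writeLines(c("species,value", "sp_A,1", "sp_B,2"), path)
}

path <- file.path(tempdir(), "interrupted_example.csv")
# Simulate an interrupted download: only the header line was written
writeLines("species,value", path)

nrow(read.csv(fetch_with_cache(path, full_download)))                 # 0: partial file re-used
nrow(read.csv(fetch_with_cache(path, full_download, cache = FALSE)))  # 2: fresh copy
```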

dir specifies a directory where suppdata should store files, and save.name specifies the name under which the file should be saved. This is useful if you want to make a folder on your computer that contains files you use a lot: suppdata will cache from this folder if you tell it to, so you can build up a reproducible selection of data to use between R sessions.
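A rough picture of how a dir and save.name pair could determine where a file ends up is sketched below. `resolve_destination()` is a hypothetical helper, and its fallbacks (use `tempdir()` when no directory is given, and derive a file name from the key when no name is given) are assumptions for illustration; check the suppdata help file for the package's actual behaviour. The file name `fritz_tree.nex` is likewise made up.

```r
# Hypothetical sketch: choose where a downloaded file is stored.
# NA stands for "not supplied" in this sketch.
resolve_destination <- function(key, dir = NA, save.name = NA) {
  if (is.na(dir)) dir <- tempdir()  # session-only location by default
  if (is.na(save.name)) save.name <- gsub("[^A-Za-z0-9._-]", "_", key)
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  file.path(dir, save.name)
}

# A persistent project folder would keep files between R sessions
# (placed under tempdir() here only so the example cleans up after itself)
project_store <- file.path(tempdir(), "my-data-store")
resolve_destination("10.1111/j.1461-0248.2009.01307.x si 1",
                    dir = project_store, save.name = "fritz_tree.nex")
```

Pointing dir at a folder inside your project (rather than the session's temporary directory) is what lets a store of files outlive a single R session.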

Owner

Name: rOpenSci
Login: ropensci
Kind: organization
Email: info@ropensci.org
Location: Berkeley, CA

Website: https://ropensci.org/
Twitter: rOpenSci
Repositories: 307
Profile: https://github.com/ropensci

JOSS Publication

Suppdata: Downloading Supplementary Data from Published Manuscripts

Published

May 03, 2018

DOI

10.21105/joss.00721

Volume 3, Issue 25, Page 721

Authors

William D. Pearse

Department of Biology & Ecology Center, Utah State University, Logan, Utah, USA

Scott A. Chamberlain

rOpenSci

Editor

Arfon Smith


Committers

Last synced: 7 months ago

All Time

Total Commits: 142
Total Committers: 7
Avg Commits per committer: 20.286
Development Distribution Score (DDS): 0.43

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Will Pearse	w**e@g**m	81
nuest	d**t@u**e	51
Noam Ross	n**s@g**m	5
Will Pearse	w**e@m**a	2
Katrin Leinweber	9****r	1
ChrisMuir	c**A@g**m	1
rOpenSci Bot	m**t@g**m	1

Committer Domains (Top 20 + Academic)

mcgill.ca: 1 uni-muenster.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 37
Total pull requests: 19
Average time to close issues: 6 months
Average time to close pull requests: 11 days
Total issue authors: 6
Total pull request authors: 4
Average comments per issue: 2.41
Average comments per pull request: 1.26
Merged pull requests: 17
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

rossmounce (12)
willpearse (10)
nuest (10)
sckott (2)
AlbanSagouis (2)
CidaJiang (1)

Pull Request Authors

nuest (9)
willpearse (8)
katrinleinweber (1)
ChrisMuir (1)

Top Labels

Issue Labels

enhancement (5) help wanted (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 229 last-month

Total dependent packages: 1
Total dependent repositories: 2
Total versions: 8
Total maintainers: 1

cran.r-project.org: suppdata

Downloading Supplementary Data from Published Manuscripts

Homepage: https://docs.ropensci.org/suppdata/
Documentation: http://cran.r-project.org/web/packages/suppdata/suppdata.pdf
License: MIT + file LICENSE
Status: removed
Latest release: 1.1-9
published over 2 years ago

Versions: 8
Dependent Packages: 1
Dependent Repositories: 2
Downloads: 229 Last month

Rankings

Forks count: 8.7%

Stargazers count: 9.0%

Average: 17.9%

Dependent packages count: 18.1%

Dependent repos count: 19.2%

Downloads: 34.6%

Maintainers (1)

will.pearse@gmail.com

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

httr >= 1.0.0 imports
jsonlite >= 1.5 imports
rcrossref >= 0.8.0 imports
xml2 >= 1.2.0 imports
covr >= 3.0.1 suggests
knitr >= 1.6 suggests
testthat >= 2.0.0 suggests
