restez

restez: Create and Query a Local Copy of GenBank in R - Published in JOSS (2018)

https://github.com/ropensci/restez

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 11 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

dna entrez genbank r r-package rstats sequence

Keywords from Contributors

reproducibility make r-targetopia targets codemeta json-ld
Last synced: 4 months ago

Repository

😴 📂 Create and Query a Local Copy of GenBank in R

Statistics
  • Stars: 27
  • Watchers: 7
  • Forks: 5
  • Open Issues: 9
  • Releases: 9
Topics
dna entrez genbank r r-package rstats sequence
Created over 7 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Codemeta

README.Rmd

---
output: github_document
---





```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# Locally query GenBank 

[![R-CMD-check](https://github.com/ropensci/restez/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci/restez/actions)
[![Coverage Status](https://coveralls.io/repos/github/ropensci/restez/badge.svg?branch=master)](https://coveralls.io/github/ropensci/restez?branch=master)
[![ROpenSci status](https://badges.ropensci.org/232_status.svg)](https://github.com/ropensci/software-review/issues/232)
[![CRAN downloads](http://cranlogs.r-pkg.org/badges/grand-total/restez)](https://CRAN.R-project.org/package=restez)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6806029.svg)](https://doi.org/10.5281/zenodo.6806029)
[![status](https://joss.theoj.org/papers/10.21105/joss.01102/status.svg)](https://joss.theoj.org/papers/10.21105/joss.01102)

> NOTE: Starting with v2.0.0, the database backend changed from [MonetDBLite](https://github.com/MonetDB/MonetDBLite-R) to [duckdb](https://github.com/duckdb/duckdb). Because of this change, restez v2.0.0 or higher **is not compatible with databases built with previous versions of restez**.
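
The compatibility rule in this note comes down to a version comparison. Below is a minimal base-R sketch, assuming you already know which restez version a database was built with. The helper `is_duckdb_compatible()` is hypothetical and is not part of restez.

```r
# Hypothetical helper (not part of restez): databases built with restez
# versions before 2.0.0 used MonetDBLite and cannot be opened by
# restez >= 2.0.0, which uses duckdb.
is_duckdb_compatible <- function(build_version) {
  numeric_version(build_version) >= numeric_version("2.0.0")
}

is_duckdb_compatible("1.1.0")
#> [1] FALSE
is_duckdb_compatible("2.1.3")
#> [1] TRUE

# The currently installed version can be checked with utils::packageVersion("restez").
```

If the helper returns `FALSE`, the database needs to be rebuilt with the current restez version.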

Download parts of [NCBI's GenBank](https://www.ncbi.nlm.nih.gov/nuccore) to a local folder and create a simple SQL-based database. Use the 'get' tools (`gb_sequence_get()`, `gb_definition_get()`, `gb_organism_get()`, `gb_record_get()`) to query the database by accession ID. [rentrez](https://github.com/ropensci/rentrez) wrappers are also available, so that sequences not present locally can be retrieved online through [Entrez](https://www.ncbi.nlm.nih.gov/books/NBK25500/).

See the [detailed tutorials](https://docs.ropensci.org/restez/articles/restez.html) for more information.

## Introduction

*Vous entrez, vous rentrez et, maintenant, vous... restez!* (You enter, you re-enter and, now, you... stay!)

Sequences and sequence information from GenBank and related NCBI taxonomic databases are often downloaded via Entrez, the NCBI API. Entrez, however, limits the rate at which requests can be made, so downloading large amounts of sequence data this way can be inefficient. In programmatic workflows that make many Entrez calls, downloading may take days, weeks or even months.
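
To make the scale concrete: NCBI's E-utilities guidelines allow three requests per second without an API key and ten per second with one. The sketch below estimates the wall-clock time for a hypothetical job of one million records, assuming one record per request and ignoring transfer and parsing time. rentrez can fetch records in batches, which cuts the number of requests, so treat this as the worst case. The function `entrez_hours()` is illustrative only.

```r
# Illustrative helper (hypothetical): hours needed to issue n_requests
# sequential Entrez requests at a fixed rate limit.
entrez_hours <- function(n_requests, requests_per_second = 3) {
  seconds <- n_requests / requests_per_second
  seconds / 3600
}

# Assumption: one record per request, no download or parsing overhead.
n_records <- 1e6
no_key   <- entrez_hours(n_records, requests_per_second = 3)
with_key <- entrez_hours(n_records, requests_per_second = 10)

round(no_key / 24, 1)    # days without an API key
#> [1] 3.9
round(with_key / 24, 1)  # days with an API key
#> [1] 1.2
```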

This package aims to make sequence retrieval more efficient by letting users download large sections of the GenBank database to their local machine and query this local database through either package-specific functions or Entrez wrappers. This is faster because GenBank is downloaded in bulk, as compressed sequence files, from NCBI's FTP server. With a good internet connection and a middle-of-the-road computer, a database comprising 20 GB of sequence information can be generated in less than 10 minutes.



## Installation

Install from CRAN:

```{r cran-installation, include=TRUE, echo=TRUE, eval=FALSE}
install.packages("restez")
```

Or install the development version from r-universe:

```{r r-univ-installation, include=TRUE, echo=TRUE, eval=FALSE}
install.packages("restez", repos = "https://ropensci.r-universe.dev")
```

Or install the development version from GitHub (requires installing the `remotes` package first):

```{r gh-installation, include=TRUE, echo=TRUE, eval=FALSE}
# install.packages("remotes")
remotes::install_github("ropensci/restez")
```

## Quick Examples

> For more information on the package's functions, and step-by-step guides to downloading, constructing and querying a database, see the [detailed tutorials](https://docs.ropensci.org/restez/articles/restez.html).

### Setup

```{r presetup, include=FALSE}
rstz_pth <- tempdir()
restez::restez_path_set(filepath = rstz_pth)
restez::db_delete(everything = TRUE)
```
```{r restez-setup, echo=TRUE, eval=TRUE, results='hide'}
# Warning: running these examples may take a few minutes
library(restez)
# choose a location to store GenBank files
restez_path_set(rstz_pth)
```
```{r reassigndownload, include=FALSE, eval=TRUE}
db_download <- function() {
  restez::db_download(preselection = '20')
}
```
```{r gb-download, echo=TRUE, eval=TRUE, results='hide'}
# Run the download function
db_download()
# after download, create the local database
db_create()
```

### Query

```{r query, echo=TRUE, eval=TRUE}
# for reproducibility
set.seed(12345)
# get a random accession ID from the database
id <- sample(list_db_ids(), 1)
# you can extract:
# sequences
seq <- gb_sequence_get(id)[[1]]
str(seq)
# definitions
def <- gb_definition_get(id)[[1]]
print(def)
# organisms
org <- gb_organism_get(id)[[1]]
print(org)
# or whole records
rec <- gb_record_get(id)[[1]]
cat(rec)
```

### Entrez wrappers

```{r entrez, echo=TRUE, eval=TRUE}
# use the entrez_* wrappers to access GB data
res <- entrez_fetch(db = 'nucleotide', id = id, rettype = 'fasta')
cat(res)
# if the id is not in the local database
# these wrappers will search online via the rentrez package
res <- entrez_fetch(db = 'nucleotide', id = c('S71333.1', id),
                    rettype = 'fasta')
cat(res)
```
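
The fallback behaviour described in the comments above ("if the id is not in the local database these wrappers will search online") can be pictured as a local-first lookup. The following is a hypothetical sketch of that idea, not restez's actual implementation. `split_ids()`, `fetch_local_first()` and the `fake_*` fetchers are made-up names. In real use, `local_ids` would come from `list_db_ids()` and the online step would be something like `rentrez::entrez_fetch()`. The toy IDs `TOY0000x.1` are invented so the sketch runs without a database or network access.

```r
# Partition requested IDs into those present locally and those that are not.
split_ids <- function(ids, local_ids) {
  in_db <- ids %in% local_ids
  list(local = ids[in_db], online = ids[!in_db])
}

# Fetch local IDs with fetch_local() and the rest with fetch_online().
fetch_local_first <- function(ids, local_ids, fetch_local, fetch_online) {
  parts <- split_ids(ids, local_ids)
  results <- character(0)
  if (length(parts$local) > 0) {
    results <- c(results, fetch_local(parts$local))
  }
  if (length(parts$online) > 0) {
    results <- c(results, fetch_online(parts$online))
  }
  results
}

# Toy stand-ins for the local database and the online Entrez call.
local_ids   <- c("TOY00001.1", "TOY00002.1")
fake_local  <- function(ids) paste0(">", ids, " [local]")
fake_online <- function(ids) paste0(">", ids, " [online]")

fetch_local_first(c("S71333.1", "TOY00001.1"), local_ids,
                  fake_local, fake_online)
#> [1] ">TOY00001.1 [local]" ">S71333.1 [online]"
```

Because this sketch returns local hits before online ones, its output order can differ from the order of the requested IDs.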

## Contributing

Want to contribute? Check the [contributing page](https://docs.ropensci.org/restez/CONTRIBUTING.html).

## Licence

MIT

## Citation

Bennett et al. (2018). restez: Create and Query a Local Copy of GenBank in R.
*Journal of Open Source Software*, 3(31), 1102. https://doi.org/10.21105/joss.01102

## References

Benson, D. A., Karsch-Mizrachi, I., Clark, K., Lipman, D. J., Ostell, J., &
Sayers, E. W. (2012). GenBank. *Nucleic Acids Research*, 40(Database issue),
D48–D53. https://doi.org/10.1093/nar/gkr1202

Winter, D. J. (2017). rentrez: An R package for the NCBI eUtils API.
*PeerJ Preprints*, 5, e3179v2. https://doi.org/10.7287/peerj.preprints.3179v2

## Maintainer

[Joel Nitta](https://github.com/joelnitta)

This package was previously developed and maintained by Dom Bennett.

-----

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

Owner

  • Name: rOpenSci
  • Login: ropensci
  • Kind: organization
  • Email: info@ropensci.org
  • Location: Berkeley, CA

JOSS Publication

restez: Create and Query a Local Copy of GenBank in R
Published
November 27, 2018
Volume 3, Issue 31, Page 1102
Authors
Dominic J. Bennett
Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Gothenburg, Sweden, Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Gothenburg, Sweden
Hannes Hettling
Naturalis Biodiversity Center, P.O. Box 9517, 2300 RA Leiden, The Netherlands
Daniele Silvestro
Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Gothenburg, Sweden, Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Gothenburg, Sweden
Rutger Vos
Naturalis Biodiversity Center, P.O. Box 9517, 2300 RA Leiden, The Netherlands
Alexandre Antonelli
Gothenburg Global Biodiversity Centre, Box 461, SE-405 30 Gothenburg, Sweden, Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-405 30 Gothenburg, Sweden, Gothenburg Botanical Garden, SE 41319 Gothenburg, Sweden
Editor
Karthik Ram
Tags
GenBank nucleotides DNA sequence NCBI rstats

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "identifier": "restez",
  "description": "Download large sections of 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and generate a local SQL-based database. A user can then query this database using 'restez' functions or through 'rentrez' <https://CRAN.R-project.org/package=rentrez> wrappers.",
  "name": "restez: Create and Query a Local Copy of 'GenBank' in R",
  "relatedLink": [
    "https://docs.ropensci.org/restez/",
    "https://CRAN.R-project.org/package=restez"
  ],
  "codeRepository": "https://github.com/ropensci/restez",
  "issueTracker": "https://github.com/ropensci/restez/issues",
  "license": "https://spdx.org/licenses/MIT",
  "version": "2.1.3.9000",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R version 4.2.1 (2022-06-23)",
  "provider": {
    "@id": "https://cran.r-project.org",
    "@type": "Organization",
    "name": "Comprehensive R Archive Network (CRAN)",
    "url": "https://cran.r-project.org"
  },
  "author": [
    {
      "@type": "Person",
      "givenName": "Joel H.",
      "familyName": "Nitta",
      "email": "joelnitta@gmail.com",
      "@id": "https://orcid.org/0000-0003-4719-7472"
    },
    {
      "@type": "Person",
      "givenName": "Dom",
      "familyName": "Bennett",
      "email": "dominic.john.bennett@gmail.com",
      "@id": "https://orcid.org/0000-0003-2722-1359"
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Joel H.",
      "familyName": "Nitta",
      "email": "joelnitta@gmail.com",
      "@id": "https://orcid.org/0000-0003-4719-7472"
    }
  ],
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "sessioninfo",
      "name": "sessioninfo",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=sessioninfo"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "testthat",
      "name": "testthat",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=testthat"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "knitr",
      "name": "knitr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=knitr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "R.utils",
      "name": "R.utils",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=R.utils"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rmarkdown",
      "name": "rmarkdown",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rmarkdown"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "mockery",
      "name": "mockery",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=mockery"
    }
  ],
  "softwareRequirements": {
    "1": {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": ">= 3.3.0"
    },
    "2": {
      "@type": "SoftwareApplication",
      "identifier": "utils",
      "name": "utils"
    },
    "3": {
      "@type": "SoftwareApplication",
      "identifier": "rentrez",
      "name": "rentrez",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rentrez"
    },
    "4": {
      "@type": "SoftwareApplication",
      "identifier": "DBI",
      "name": "DBI",
      "version": ">= 1.0.0",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=DBI"
    },
    "5": {
      "@type": "SoftwareApplication",
      "identifier": "curl",
      "name": "curl",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=curl"
    },
    "6": {
      "@type": "SoftwareApplication",
      "identifier": "cli",
      "name": "cli",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=cli"
    },
    "7": {
      "@type": "SoftwareApplication",
      "identifier": "crayon",
      "name": "crayon",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=crayon"
    },
    "8": {
      "@type": "SoftwareApplication",
      "identifier": "stringi",
      "name": "stringi",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=stringi"
    },
    "9": {
      "@type": "SoftwareApplication",
      "identifier": "duckdb",
      "name": "duckdb",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=duckdb"
    },
    "10": {
      "@type": "SoftwareApplication",
      "identifier": "fs",
      "name": "fs",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=fs"
    },
    "11": {
      "@type": "SoftwareApplication",
      "identifier": "assertthat",
      "name": "assertthat",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=assertthat"
    },
    "12": {
      "@type": "SoftwareApplication",
      "identifier": "ape",
      "name": "ape",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=ape"
    },
    "SystemRequirements": null
  },
  "fileSize": "1676.932KB",
  "citation": [
    {
      "@type": "ScholarlyArticle",
      "datePublished": "2018",
      "author": [
        {
          "@type": "Person",
          "givenName": "D.J.",
          "familyName": "Bennett"
        },
        {
          "@type": "Person",
          "givenName": "H.",
          "familyName": "Hettling"
        },
        {
          "@type": "Person",
          "givenName": "D.",
          "familyName": "Silvestro"
        },
        {
          "@type": "Person",
          "givenName": "R.",
          "familyName": "Vos"
        },
        {
          "@type": "Person",
          "givenName": "A.",
          "familyName": "Antonelli"
        }
      ],
      "name": "restez: Create and Query a Local Copy of GenBank in R",
      "identifier": "10.21105/joss.01102",
      "pagination": "1102",
      "@id": "https://doi.org/10.21105/joss.01102",
      "sameAs": "https://doi.org/10.21105/joss.01102",
      "isPartOf": {
        "@type": "PublicationIssue",
        "issueNumber": "31",
        "datePublished": "2018",
        "isPartOf": {
          "@type": [
            "PublicationVolume",
            "Periodical"
          ],
          "volumeNumber": "3",
          "name": "The Journal of Open Source Software"
        }
      }
    }
  ],
  "releaseNotes": "https://github.com/ropensci/restez/blob/master/NEWS.md",
  "readme": "https://github.com/ropensci/restez/blob/main/README.md",
  "contIntegration": [
    "https://github.com/ropensci/restez/actions",
    "https://coveralls.io/github/ropensci/restez?branch=master"
  ],
  "review": {
    "@type": "Review",
    "url": "https://github.com/ropensci/software-review/issues/232",
    "provider": "https://ropensci.org"
  },
  "keywords": [
    "genbank",
    "dna",
    "sequence",
    "entrez",
    "r",
    "r-package",
    "rstats"
  ]
}

GitHub Events

Total
  • Issues event: 7
  • Watch event: 2
  • Delete event: 2
  • Issue comment event: 3
  • Push event: 7
  • Pull request event: 3
  • Create event: 2
Last Year
  • Issues event: 7
  • Watch event: 2
  • Delete event: 2
  • Issue comment event: 3
  • Push event: 7
  • Pull request event: 3
  • Create event: 2

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 348
  • Total Committers: 6
  • Avg Commits per committer: 58.0
  • Development Distribution Score (DDS): 0.368
Past Year
  • Commits: 12
  • Committers: 1
  • Avg Commits per committer: 12.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Dominic Bennett (d****t@g****m): 220 commits
  • Joel Nitta (j****a@g****m): 112 commits
  • Jeroen Ooms (j****s@g****m): 11 commits
  • Rutger Vos (r****o@g****m): 2 commits
  • Maëlle Salmon (m****n@y****e): 2 commits
  • Ben Tupper (b****r@b****g): 1 commit

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 47
  • Total pull requests: 23
  • Average time to close issues: 3 months
  • Average time to close pull requests: 2 days
  • Total issue authors: 15
  • Total pull request authors: 3
  • Average comments per issue: 1.49
  • Average comments per pull request: 0.13
  • Merged pull requests: 21
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 4
  • Average time to close issues: 10 days
  • Average time to close pull requests: 1 day
  • Issue authors: 3
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • joelnitta (15)
  • DomBennett (13)
  • maelle (4)
  • krlmlr (2)
  • bioxenia (2)
  • btupper (2)
  • cbird808 (1)
  • hadley (1)
  • sckott (1)
  • terrimporter (1)
  • JanPerret (1)
  • chow42 (1)
  • Maxim-Karpov (1)
  • RinLinux (1)
  • jeroen (1)
Pull Request Authors
  • joelnitta (21)
  • btupper (2)
  • maelle (1)
Top Labels
Issue Labels
enhancement (16) review (5) bug (2)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 280 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 5
  • Total maintainers: 1
cran.r-project.org: restez

Create and Query a Local Copy of 'GenBank' in R

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 280 Last month
Rankings
  • Stargazers count: 10.6%
  • Forks count: 12.2%
  • Average: 22.4%
  • Dependent repos count: 23.9%
  • Dependent packages count: 28.7%
  • Downloads: 36.6%
Maintainers (1)
Last synced: 4 months ago