Retriever

Retriever: Data Retrieval Tool - Published in JOSS (2017)

https://github.com/weecology/retriever

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    5 of 73 committers (6.8%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

data data-retrieval data-science dataset datasets hacktobefest python

Keywords from Contributors

ecology shiny community-ecology small-mammal-trapping portal pde
Last synced: 4 months ago

Repository

Quickly download, clean up, and install public datasets into a database management system

Basic Info
  • Host: GitHub
  • Owner: weecology
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage: http://data-retriever.org
  • Size: 77.4 MB
Statistics
  • Stars: 318
  • Watchers: 31
  • Forks: 141
  • Open Issues: 54
  • Releases: 12
Topics
data data-retrieval data-science dataset datasets hacktobefest python
Created over 14 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md


Finding data is one thing. Getting it ready for analysis is another. Acquiring, cleaning, standardizing, and importing publicly available data is time-consuming because many datasets lack machine-readable metadata and do not conform to established data structures and formats. The Data Retriever automates the first steps in the data analysis pipeline by downloading, cleaning, and standardizing datasets, and importing them into relational databases, flat files, or programming languages. Automating this process reduces the time it takes a user to get most large datasets up and running by hours, and in some cases days.

Installing the Current Release

If you have Python installed you can install the current release using either pip:

```bash
pip install retriever
```

or conda after adding the conda-forge channel (conda config --add channels conda-forge):

```bash
conda install retriever
```

Depending on your system configuration this may require sudo for pip:

```bash
sudo pip install retriever
```

Precompiled binary installers are also available for Windows, OS X, and Ubuntu/Debian on the releases page. These do not require a Python installation.

List of Available Datasets

Installing From Source

To install the Data Retriever from source, you'll need Python 3.6.8+ with the following packages installed:

  • xlrd

The following packages are optionally needed to interact with associated database management systems:

  • PyMySQL (for MySQL)
  • sqlite3 (for SQLite)
  • psycopg2-binary (for PostgreSQL), previously psycopg2.
  • pyodbc (for MS Access - this option is only available on Windows)
  • Microsoft Access Driver (ODBC for Windows)
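
Before installing from source, it can help to check which of the prerequisites listed above are already present. The following is a minimal sketch, not part of the Data Retriever: the function name check_prerequisites is hypothetical, and the import names (pymysql for PyMySQL, psycopg2 for psycopg2-binary) are the modules those packages are generally imported as. It only looks the modules up and does not import them.

```python
import importlib.util
import sys

# Package name as listed above -> module name used to import it.
REQUIRED = {"xlrd": "xlrd"}
OPTIONAL = {
    "PyMySQL": "pymysql",
    "sqlite3": "sqlite3",
    "psycopg2-binary": "psycopg2",
    "pyodbc": "pyodbc",
}


def check_prerequisites():
    """Return a dict mapping each prerequisite to True/False (hypothetical helper)."""
    report = {"python>=3.6.8": sys.version_info >= (3, 6, 8)}
    for package, module in {**REQUIRED, **OPTIONAL}.items():
        # find_spec locates a top-level module without importing it.
        report[package] = importlib.util.find_spec(module) is not None
    return report


report = check_prerequisites()
for name, available in report.items():
    print(f"{name}: {'found' if available else 'missing'}")
```

A missing optional package only matters if you plan to use the corresponding database engine.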

To install from source

Either use pip to install directly from GitHub:

```shell
pip install git+https://git@github.com/weecology/retriever.git
```

or:

  1. Clone the repository
  2. From the directory containing setup.py, run pip install . (note the trailing dot). Depending on your system, you may need to prefix the command with sudo (i.e., sudo pip install .).

More extensive documentation for those interested in developing the Data Retriever is available in the project documentation.

Using the Command Line

After installing, run retriever update to download all of the available dataset scripts. To see the full list of command line options and datasets run retriever --help. The output will look like this:

```shell
usage: retriever [-h] [-v] [-q]
                 {download,install,defaults,update,new,new_json,edit_json,delete_json,ls,citation,reset,help}
                 ...

positional arguments:
  {download,install,defaults,update,new,new_json,edit_json,delete_json,ls,citation,reset,help}
                        sub-command help
    download            download raw data files for a dataset
    install             download and install dataset
    defaults            displays default options
    update              download updated versions of scripts
    new                 create a new sample retriever script
    new_json            CLI to create retriever datapackage.json script
    edit_json           CLI to edit retriever datapackage.json script
    delete_json         CLI to remove retriever datapackage.json script
    ls                  display a list all available dataset scripts
    citation            view citation
    reset               reset retriever: removes configuration settings,
                        scripts, and cached data
    help

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -q, --quiet           suppress command-line output
```

To install datasets, use retriever install:

```shell
usage: retriever install [-h] [--compile] [--debug]
                         {mysql,postgres,sqlite,msaccess,csv,json,xml} ...

positional arguments:
  {mysql,postgres,sqlite,msaccess,csv,json,xml}
                        engine-specific help
    mysql               MySQL
    postgres            PostgreSQL
    sqlite              SQLite
    msaccess            Microsoft Access
    csv                 CSV
    json                JSON
    xml                 XML

optional arguments:
  -h, --help            show this help message and exit
  --compile             force re-compile of script before downloading
  --debug               run in debug mode
```

Examples

These examples use the Iris flower dataset. More examples can be found in the Data Retriever documentation.

Using Install

```shell
retriever install -h   # gives install options
```

Using a specific database engine, retriever install {Engine}:

```shell
retriever install mysql -h   # gives install mysql options
retriever install mysql --user myuser --password ******** --host localhost --port 8888 --database_name testdbase iris
```

To install data into a SQLite database named iris.db, use:

```shell
retriever install sqlite iris -f iris.db
```
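
When these install commands are driven from a script, assembling the argument list programmatically avoids quoting mistakes. The sketch below only builds the list, using the engines and flags that appear in this README (--user, --host, --port, --database_name, -f). The helper name build_install_args is hypothetical and not part of the Data Retriever, and it places the dataset name last, as in the MySQL example. Nothing is executed here; on a machine with retriever installed, the resulting list could be handed to subprocess.run.

```python
from typing import Dict, List, Optional

# Engines listed in the `retriever install` help output.
ENGINES = {"mysql", "postgres", "sqlite", "msaccess", "csv", "json", "xml"}


def build_install_args(engine: str, dataset: str,
                       options: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the argument list for `retriever install ENGINE [flags] DATASET`.

    Hypothetical helper for illustration. Flags are passed through unchanged,
    so only use flags reported by `retriever install ENGINE -h`.
    """
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    args = ["retriever", "install", engine]
    for flag, value in (options or {}).items():
        args += [flag, str(value)]
    args.append(dataset)
    return args


mysql_args = build_install_args(
    "mysql", "iris",
    {"--user": "myuser", "--host": "localhost", "--port": "8888",
     "--database_name": "testdbase"},
)
sqlite_args = build_install_args("sqlite", "iris", {"-f": "iris.db"})
print(" ".join(mysql_args))
print(" ".join(sqlite_args))
```

Keeping the arguments as a list rather than a single string means no shell is needed to split them, so values containing spaces or special characters are passed through intact.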

Using download

```shell
retriever download -h   # gives you help options
retriever download iris
retriever download iris --path C:\Users\Documents
```

Using citation

```shell
retriever citation        # citation of the retriever engine
retriever citation iris   # citation for the iris data
```

Spatial Dataset Installation

Set up Spatial support

To set up spatial support for PostgreSQL using PostGIS, please refer to the spatial set-up docs.

```shell
retriever install postgres harvard-forest  # Vector data
retriever install postgres bioclim  # Raster data

# Install only the data of USGS elevation in the given extent
retriever install postgres usgs-elevation -b -94.98704597353938 39.027001800158615 -94.3599408119917 40.69577051867074
```
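
The -b option in the last command takes four numbers describing the extent. The sketch below assumes they are ordered xmin, ymin, xmax, ymax (longitude and latitude in degrees). That ordering matches the values shown but is not stated in this README, so treat it as an assumption. The helper bbox_args is hypothetical; it validates the extent and builds the argument list without running anything.

```python
from typing import List


def bbox_args(dataset: str, xmin: float, ymin: float,
              xmax: float, ymax: float) -> List[str]:
    """Build the argument list for a PostgreSQL install clipped to an extent.

    Hypothetical helper. Assumes -b values are ordered xmin, ymin, xmax, ymax.
    """
    if not (xmin < xmax and ymin < ymax):
        raise ValueError("expected xmin < xmax and ymin < ymax")
    if not (-180 <= xmin and xmax <= 180 and -90 <= ymin and ymax <= 90):
        raise ValueError("coordinates outside the longitude/latitude range")
    # repr() produces text that converts back to exactly the same float.
    extent = [repr(value) for value in (xmin, ymin, xmax, ymax)]
    return ["retriever", "install", "postgres", dataset, "-b", *extent]


args = bbox_args("usgs-elevation",
                 -94.98704597353938, 39.027001800158615,
                 -94.3599408119917, 40.69577051867074)
print(" ".join(args))
```

Checking the ordering up front catches swapped corners before a long raster download starts.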

Website

For more information see the Data Retriever website.

Acknowledgments

Development of this software was funded by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4563 to Ethan White and the National Science Foundation as part of a CAREER award to Ethan White.

Owner

  • Name: Weecology
  • Login: weecology
  • Kind: organization

JOSS Publication

Retriever: Data Retrieval Tool
Published
November 15, 2017
Volume 2, Issue 19, Page 451
Authors
  • Henry Senyondo (Department of Wildlife Ecology and Conservation, University of Florida)
  • Benjamin D. Morris (no affiliation listed)
  • Akash Goel (Delhi Technological University, Delhi)
  • Andrew Zhang (The University of Florida)
  • Akshay Narasimha (Birla Institute of Technology and Science, Pilani)
  • Shivam Negi (Manipal Institute of Technology, Manipal)
  • David J. Harris (The University of Florida)
  • Deborah Gertrude Digges (PES Institute of Technology, Bengaluru)
  • Kapil Kumar (National Institute of Technology, Delhi)
  • Amritanshu Jain (Birla Institute of Technology and Science, Pilani)
  • Kunal Pal (RWTH Aachen University, Aachen, Germany)
  • Kevinkumar Amipara (Sardar Vallabhbhai National Institute of Technology, Surat)
  • Ethan P. White (Department of Wildlife Ecology and Conservation, University of Florida; Informatics Institute, University of Florida)
Editor
  • Thomas J. Leeper
Tags
data retrieval, data processing, python, data, data science, datasets

Citation (CITATION)

Morris, B.D. and E.P. White. 2013. The EcoData Retriever: improving access to
existing ecological data. PLOS ONE 8:e65848.
https://doi.org/10.1371/journal.pone.0065848

@article{morris2013ecodata,
  title={The EcoData Retriever: Improving Access to Existing Ecological Data},
  author={Morris, Benjamin D and White, Ethan P},
  journal={PLOS One},
  volume={8},
  number={6},
  pages={e65848},
  year={2013},
  publisher={Public Library of Science},
  doi={10.1371/journal.pone.0065848}
}

Papers & Mentions

Total mentions: 3

De novo genome assembly of two tomato ancestors, Solanum pimpinellifolium and Solanum lycopersicum var. cerasiforme, by long-read sequencing
Last synced: 2 months ago
Environmental Enrichment Induces Epigenomic and Genome Organization Changes Relevant for Cognition
Last synced: 2 months ago
Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
Last synced: 2 months ago

GitHub Events

Total
  • Issues event: 1
  • Watch event: 11
  • Issue comment event: 8
  • Push event: 1
  • Pull request event: 5
  • Gollum event: 13
  • Fork event: 8
Last Year
  • Issues event: 1
  • Watch event: 11
  • Issue comment event: 8
  • Push event: 1
  • Pull request event: 5
  • Gollum event: 13
  • Fork event: 8

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 2,030
  • Total Committers: 73
  • Avg Commits per committer: 27.808
  • Development Distribution Score (DDS): 0.634
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Ben Morris b****n@b****m 743
Ethan White e****n@w****g 403
henrykironde h****e@g****m 393
goelakash g****3@g****m 66
zhangcandrew z****w@g****m 43
Apoorva Pandey a****5@g****m 40
Shreyash sharma s****l@y****n 35
Harshit Bansal h****c@g****m 34
ShivamNegi s****9@g****m 28
Elita Baldridge e****e@w****g 26
Aakash Chaudhary a****0@g****m 19
Ansh Dassani a****4@g****m 17
Ashish a****1@g****m 16
Daniel McGlinn d****n@g****m 10
Sumit Saha s****6@g****m 9
akshayah3 a****5@g****m 8
David J. Harris h****1@g****m 7
Katherine Thibault k****t@n****g 7
Pankaj Kumar me@p****e 6
Nageshbansal n****9@g****m 6
ddigges d****s@i****m 5
Kunal Pal m****l@g****m 5
Shawn Taylor s****r@w****g 5
kapil kumar k****3@g****m 5
paul wolf p****f@u****u 5
ashu j****u@g****m 4
sarsees r****e@g****m 4
pranita-s p****a@g****m 4
Kristina Riemer k****r@w****g 4
unknown B****n@.****) 4
and 43 more...
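
The all-time committer figures above are mutually consistent, which a few lines of arithmetic can confirm. The sketch assumes the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is not given on this page, although it reproduces the reported value.

```python
total_commits = 2030         # Total Commits (all time)
total_committers = 73        # Total Committers (all time)
top_committer_commits = 743  # Ben Morris, first row of Top Committers

avg_commits = total_commits / total_committers
# Assumed definition: DDS = 1 - (top committer's commits / total commits)
dds = 1 - top_committer_commits / total_commits

print(round(avg_commits, 3))  # 27.808
print(round(dds, 3))          # 0.634
```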

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 30
  • Total pull requests: 84
  • Average time to close issues: 8 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 8
  • Total pull request authors: 17
  • Average comments per issue: 2.63
  • Average comments per pull request: 2.63
  • Merged pull requests: 48
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 0
  • Pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 2.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • henrykironde (17)
  • Aakash3101 (3)
  • kkothari2001 (3)
  • ethanwhite (2)
  • Nageshbansal (2)
  • dikwickley (1)
  • bw4sz (1)
  • ha0ye (1)
Pull Request Authors
  • henrykironde (42)
  • dassaniansh (13)
  • Aakash3101 (9)
  • Nageshbansal (6)
  • dikwickley (4)
  • kkothari2001 (2)
  • dependabot[bot] (2)
  • Khush2040 (2)
  • Luckysteve007 (2)
  • apeksha235 (1)
  • jainamritanshu (1)
  • bloemenk (1)
  • pri1311 (1)
  • PatriceJada (1)
  • pyther-hub (1)
Top Labels
Issue Labels
  • getting-started (11)
  • Dataset Request (6)
  • Spatial-data (1)
  • Feature Request (1)
Pull Request Labels
  • Under Review and Tests (2)
  • dependencies (2)
  • ready (1)
  • Changes Requested (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 818 last-month
  • Total docker downloads: 2,029
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 4
    (may contain duplicates)
  • Total versions: 20
  • Total maintainers: 2
pypi.org: retriever

Data Retriever

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 818 Last month
  • Docker Downloads: 2,029
Rankings
Docker downloads count: 1.1%
Stargazers count: 3.7%
Downloads: 3.9%
Forks count: 4.2%
Average: 4.9%
Dependent packages count: 7.3%
Dependent repos count: 9.1%
Maintainers (2)
Last synced: 4 months ago
conda-forge.org: retriever

This module analyzes jpeg/jpeg2000/png/gif image header and return image size.

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 1
Rankings
Forks count: 15.6%
Stargazers count: 22.8%
Dependent repos count: 24.1%
Average: 28.5%
Dependent packages count: 51.5%
Last synced: 4 months ago

Dependencies

requirements.txt pypi
  • Pillow *
  • PyMySQL >=0.4
  • argcomplete *
  • coverage *
  • future *
  • h5py *
  • inquirer *
  • kaggle *
  • numpydoc *
  • openpyxl *
  • pandas *
  • psycopg2-binary *
  • requests *
  • setuptools *
  • sphinx_py3doc_enhanced_theme *
  • sphinx_rtd_theme *
  • sphinxcontrib-napoleon *
  • tables *
  • tqdm ==4.30.0
  • xlrd >=0.7
setup.py pypi
  • Pillow *
  • argcomplete *
  • future *
  • h5py *
  • kaggle *
  • pandas *
  • requests *
  • tqdm *
  • xlrd *
.github/workflows/docker-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • crazy-max/ghaction-docker-meta v1 composite
  • docker/build-push-action ad44023a93711e3deb337508980b4b5e9bcdc5dc composite
  • docker/login-action 28218f9b04b4f3f62068d7b6ce6ca5b26e35336c composite
.github/workflows/publish-to-test-pypi.yml actions
  • actions/checkout master composite
  • actions/setup-python v1 composite
  • pypa/gh-action-pypi-publish master composite
.github/workflows/python-package.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v1 composite
  • huaxk/postgis-action v1 composite
  • mysql 5.7 docker
docker/Dockerfile docker
  • osgeo/gdal latest build
docker-compose.yml docker
  • kartoza/postgis latest
  • mysql 5.7
  • ret_image latest