datahugger

One downloader for many scientific data and code repositories! DOI 👐 Data

https://github.com/j535d165/datahugger

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 15 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.0%) to scientific vocabulary

Keywords

cli data datacite dataone dataverse dryad figshare github mendeley-data python rdm repository research research-data-management science scientific scientific-data utrecht-university zenodo

Keywords from Contributors

directory-lister
Last synced: 4 months ago

Repository


Statistics
  • Stars: 76
  • Watchers: 4
  • Forks: 11
  • Open Issues: 16
  • Releases: 0
Topics
cli data datacite dataone dataverse dryad figshare github mendeley-data python rdm repository research research-data-management science scientific scientific-data utrecht-university zenodo
Created over 3 years ago · Last pushed 4 months ago
Metadata Files
Readme License Citation

README.md

Datahugger - Where DOI hugs Data


Datahugger is a tool to download scientific datasets, software, and code from a large number of repositories based on their DOI (wiki) or URL. With Datahugger, you can automate the downloading of data and improve the reproducibility of your research. Datahugger provides a straightforward Python interface as well as an intuitive Command Line Interface (CLI).

Supported repositories

Datahugger offers support for more than 377 generic and specific (scientific) repositories (and more to come!).

Datahugger supports Zenodo, Dataverse, DataOne, GitHub, FigShare, HuggingFace, Mendeley Data, Dryad, OSF, and many more.

We are still expanding Datahugger with support for more repositories. You can help by requesting support for a repository in the issue tracker. Pull Requests are very welcome as well.

Installation

PyPI

Datahugger requires Python 3.6 or later.

```bash
pip install datahugger
```
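To confirm that your environment meets the stated requirement (Python 3.6 or later) and that the package is importable after installation, a small check along these lines can help. This is a minimal sketch using only the standard library; the helper name check_environment is hypothetical and not part of Datahugger.

```python
import importlib.util
import sys

# Minimum version taken from the README ("Python 3.6 or later").
MIN_PYTHON = (3, 6)


def check_environment() -> dict:
    """Hypothetical helper: report whether Python and Datahugger look usable."""
    return {
        # sys.version_info compares element-wise with a plain tuple.
        "python_ok": sys.version_info >= MIN_PYTHON,
        # find_spec returns None when the package cannot be imported.
        "datahugger_installed": importlib.util.find_spec("datahugger") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If datahugger_installed reports no, rerun the pip command in the same environment as the interpreter you are checking.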

Getting started

Datahugger with Python

Load a dataset (or any digital asset) from a repository with the datahugger.get() function. The first argument is the DOI or URL, and the second is the folder name to store the dataset (it will be created if it does not exist).

The following code loads dataset 10.5061/dryad.mj8m0 into the folder data.

```python
import datahugger

# download the dataset to the folder "data"
datahugger.get("10.5061/dryad.mj8m0", "data")
```
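Because datahugger.get() accepts either a DOI or a URL as its first argument, it can be useful to normalize identifiers before storing them in a project's configuration. The sketch below is hypothetical and is not Datahugger's own parsing logic: it assumes that bare DOIs start with the "10." prefix and relies on the fact that any DOI can be turned into a URL through the public https://doi.org/ resolver.

```python
from urllib.parse import urlparse

DOI_RESOLVER = "https://doi.org/"


def to_resolvable_url(identifier: str) -> str:
    """Hypothetical helper: turn a bare DOI, a 'doi:' string, or a URL into a URL."""
    identifier = identifier.strip()
    # Accept the common 'doi:' prefix, e.g. 'doi:10.5061/dryad.mj8m0'.
    if identifier.lower().startswith("doi:"):
        identifier = identifier[4:].strip()
    # Bare DOIs always begin with the '10.' directory indicator.
    if identifier.startswith("10."):
        return DOI_RESOLVER + identifier
    # Anything else must already be an http(s) URL.
    if urlparse(identifier).scheme in ("http", "https"):
        return identifier
    raise ValueError(f"not a DOI or URL: {identifier!r}")
```

For example, to_resolvable_url("10.5061/dryad.mj8m0") returns "https://doi.org/10.5061/dryad.mj8m0", while a URL that is already complete is returned unchanged.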

For an example of how Datahugger can integrate with your work, see the example workflow notebook, which you can also open in Google Colab.
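When a project depends on several datasets, the same datahugger.get(doi_or_url, folder) call can be wrapped in a loop so that every download is scripted and repeatable. In this sketch the helper names folder_name_for and fetch_all, and the folder-naming scheme (replacing "/" and ":" with "_"), are hypothetical choices; only the two positional arguments of datahugger.get() come from the README. The download function can be passed in, so the loop can be exercised without network access.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def folder_name_for(identifier: str) -> str:
    """Hypothetical naming scheme: make a DOI or URL safe to use as a folder name."""
    for ch in "/:":
        identifier = identifier.replace(ch, "_")
    return identifier


def fetch_all(
    identifiers: Iterable[str],
    root: str = "data",
    get: Optional[Callable[[str, str], object]] = None,
) -> List[Path]:
    """Download each DOI or URL into its own subfolder of `root`."""
    if get is None:
        # Imported lazily so the sketch can be tested without Datahugger installed.
        import datahugger

        get = datahugger.get
    targets = []
    for identifier in identifiers:
        target = Path(root) / folder_name_for(identifier)
        # Same call pattern as the README: DOI or URL first, output folder second.
        get(identifier, str(target))
        targets.append(target)
    return targets
```

Calling fetch_all(["10.5061/dryad.mj8m0"]) would download that dataset into data/10.5061_dryad.mj8m0. Keeping the list of identifiers in version control alongside this script makes it easy for others to reproduce the same data folder.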

Datahugger with command line

The command-line tool datahugger provides an easy interface for downloading data. The first argument is the DOI or URL, and the second is the name of the folder in which to store the dataset (it will be created if it does not exist).

```bash
datahugger 10.5061/dryad.mj8m0 data
```

```bash
% datahugger 10.5061/dryad.mj8m0 data
Collecting...
NestTemperatureData.csv : 100%|████████████████████████████████████████| 607k/607k
README_for_NestTemperatureData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
ExternalTemps.csv : 100%|██████████████████████████████████████| 1.06k/1.06k
README_for_ExternalTemps.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
InternalEggTempData.csv : 100%|██████████████████████████████████████████| 664/664
README_for_InternalEggTempData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
SoilSimulation_Output.csv : 100%|████████████████████████████████████████| 229M/229M
README_for_SoilSimulation_[...].txt: 100%|██████████████████████████████████████| 2.82k/2.82k
Dataset successfully downloaded.
```

Tip: On some systems, you have to quote the DOI or URL. For example: datahugger "10.5061/dryad.mj8m0" data.
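Quoting matters because some DOIs contain characters that a shell treats specially, such as the parentheses, angle brackets, or semicolons found in some older DOIs. If you build the command from a script, Python's standard shlex.quote adds quotes only where they are needed, and passing an argument list to subprocess.run bypasses the shell altogether. The helper names below are hypothetical, and the DOI containing parentheses in the usage note is illustrative, not a real dataset.

```python
import shlex
import subprocess
from typing import List


def datahugger_command(identifier: str, folder: str) -> List[str]:
    """Hypothetical helper: argv for the 'datahugger <DOI or URL> <folder>' CLI."""
    return ["datahugger", identifier, folder]


def as_shell_line(argv: List[str]) -> str:
    # shlex.quote leaves safe arguments untouched and single-quotes the rest.
    return " ".join(shlex.quote(arg) for arg in argv)


def run(identifier: str, folder: str) -> None:
    # An argument list is passed straight to the program, so no quoting is needed.
    subprocess.run(datahugger_command(identifier, folder), check=True)
```

For the DOI used above, as_shell_line(datahugger_command("10.5061/dryad.mj8m0", "data")) produces the unquoted line datahugger 10.5061/dryad.mj8m0 data, whereas an identifier such as 10.1000/(example)1 comes back wrapped in single quotes.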

Tips and tricks

Contact

Please feel free to reach out with questions, comments, and suggestions. The issue tracker is a good starting point. You can also email me at jonathandebruinos@gmail.com.

Owner

  • Name: Jonathan de Bruin
  • Login: J535D165
  • Kind: user
  • Location: Netherlands
  • Company: Utrecht University

Research engineer working on software, datasets, and tools to advance open science 👐 @UtrechtUniversity @asreview

Citation (CITATION.cff)

```yaml
cff-version: 1.2.0
title: >-
  Datahugger - One downloader for many scientific
  repositories
message: 'If you use this software, please cite it as instructed.'
type: software
authors:
  - family-names: De Bruin
    given-names: Jonathan
    orcid: 'https://orcid.org/0000-0002-4297-0502'
repository-code: 'https://github.com/J535D165/datahugger'
url: 'https://github.com/J535D165/datahugger'
repository-artifact: 'https://pypi.org/project/datahugger/'
abstract: >-
  Datahugger is a tool to download scientific datasets,
  software, and code from a large number of repositories
  based on their DOI (wiki) or URL. With Datahugger, you can
  automate the downloading of data and improve the
  reproducibility of your research. Datahugger provides a
  straightforward Python interface and an intuitive Command
  Line Interface (CLI).
license: MIT
```

GitHub Events

Total
  • Issues event: 4
  • Watch event: 17
  • Delete event: 2
  • Issue comment event: 5
  • Push event: 59
  • Pull request review event: 4
  • Pull request review comment event: 7
  • Pull request event: 6
  • Fork event: 1
  • Create event: 1
Last Year
  • Issues event: 4
  • Watch event: 17
  • Delete event: 2
  • Issue comment event: 5
  • Push event: 59
  • Pull request review event: 4
  • Pull request review comment event: 7
  • Pull request event: 6
  • Fork event: 1
  • Create event: 1

Committers

Last synced: almost 2 years ago

All Time
  • Total Commits: 145
  • Total Committers: 6
  • Avg Commits per committer: 24.167
  • Development Distribution Score (DDS): 0.041
Past Year
  • Commits: 109
  • Committers: 6
  • Avg Commits per committer: 18.167
  • Development Distribution Score (DDS): 0.055
Top Committers

| Name | Email | Commits |
|---|---|---|
| Jonathan de Bruin | j****s@g****m | 139 |
| PeterLombaers | 7****s | 2 |
| Kian-Meng Ang | k****g@g****m | 1 |
| Ahmad Hesam | a****m@c****h | 1 |
| Jelle Teijema | j****a@g****m | 1 |
| Dave Tromp | 2****p | 1 |

Committer Domains (Top 20 + Academic)
cern.ch: 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 25
  • Total pull requests: 80
  • Average time to close issues: 2 months
  • Average time to close pull requests: 15 days
  • Total issue authors: 13
  • Total pull request authors: 11
  • Average comments per issue: 1.24
  • Average comments per pull request: 0.44
  • Merged pull requests: 72
  • Bot issues: 0
  • Bot pull requests: 5
Past Year
  • Issues: 7
  • Pull requests: 14
  • Average time to close issues: 9 days
  • Average time to close pull requests: about 2 months
  • Issue authors: 6
  • Pull request authors: 5
  • Average comments per issue: 0.57
  • Average comments per pull request: 1.29
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 3
Top Authors
Issue Authors
  • J535D165 (9)
  • IgnacioHeredia (3)
  • alvarolopez (3)
  • davetromp (1)
  • kirbyju (1)
  • Danny-dK (1)
  • nichtich (1)
  • micafer (1)
  • EmiliaJarochowska (1)
  • cheginit (1)
  • kingjr (1)
  • XNN19 (1)
  • girgink (1)
Pull Request Authors
  • J535D165 (58)
  • micafer (11)
  • pre-commit-ci[bot] (8)
  • davetromp (5)
  • stsnel (2)
  • PeterLombaers (2)
  • jteijema (1)
  • Senui (1)
  • kianmeng (1)
  • alvarolopez (1)
  • SexyVetra (1)
Top Labels
Issue Labels
enhancement (11) help wanted (10)
Pull Request Labels
enhancement (5) bug (5) new repository (2) documentation (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 4,848 last-month
  • Total docker downloads: 47
  • Total dependent packages: 1
  • Total dependent repositories: 2
  • Total versions: 28
  • Total maintainers: 1
pypi.org: datahugger

One downloader for many scientific data and code repositories!

  • Versions: 28
  • Dependent Packages: 1
  • Dependent Repositories: 2
  • Downloads: 4,848 Last month
  • Docker Downloads: 47
Rankings
Downloads: 2.8%
Docker downloads count: 3.1%
Dependent packages count: 4.8%
Average: 5.6%
Dependent repos count: 11.5%
Maintainers (1)
Last synced: 4 months ago

Dependencies

.github/workflows/gh-pages.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/python-lint.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v1 composite
.github/workflows/python-package.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish release/v1 composite
pyproject.toml pypi
  • jsonpath_ng *
  • natsort *
  • requests *
  • scitree *
  • tqdm *
.github/workflows/benchmark.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • tubone24/update_release v1.3.1 composite
benchmark/requirements.txt pypi
  • pandas *
  • requests *
  • tabulate *