datahugger
One downloader for many scientific data and code repositories! DOI 👐 Data
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 15 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 6 committers (16.7%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: J535D165
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://J535D165.github.io/datahugger/
- Size: 3.63 MB
Statistics
- Stars: 76
- Watchers: 4
- Forks: 11
- Open Issues: 16
- Releases: 0
Metadata Files
README.md
Datahugger - Where DOI 👐 Data
Datahugger is a tool to download scientific datasets, software, and code from a large number of repositories based on their DOI (Digital Object Identifier) or URL. With Datahugger, you can automate the downloading of data and improve the reproducibility of your research. Datahugger provides a straightforward Python interface as well as an intuitive command-line interface (CLI).
Supported repositories
Datahugger offers support for more than 377 generic and specific (scientific) repositories (and more to come!).
We are still expanding Datahugger with support for more repositories. You can help by requesting support for a repository in the issue tracker. Pull Requests are very welcome as well.
Installation
Datahugger requires Python 3.6 or later.
```bash
pip install datahugger
```
Getting started
Datahugger with Python
Load a dataset (or any digital asset) from a repository with the
datahugger.get() function. The first argument is the DOI or URL,
and the second is the folder name to store the dataset (it will be
created if it does not exist).
The following code loads dataset 10.5061/dryad.mj8m0 into
the folder data.
```python
import datahugger

# download the dataset to the folder "data"
datahugger.get("10.5061/dryad.mj8m0", "data")
```
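If you need several datasets, one option is to wrap `datahugger.get()` in a loop. The sketch below is not part of Datahugger's API: `folder_for` and `download_all` are hypothetical helpers, the folder-naming convention (slashes and colons replaced by underscores) is an arbitrary choice, and it assumes `datahugger.get(doi, folder)` accepts a folder that already exists and is empty. The `getter` parameter lets you try the loop without network access.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def folder_for(doi: str) -> str:
    """Hypothetical naming convention: make a DOI safe to use as a folder name."""
    return doi.replace("/", "_").replace(":", "_")


def download_all(
    dois: Iterable[str],
    root: str = "data",
    getter: Optional[Callable[[str, str], object]] = None,
) -> List[str]:
    """Download each DOI into its own subfolder of `root` and return the folders.

    `getter` defaults to datahugger.get; swap it out to test the loop offline.
    """
    if getter is None:
        import datahugger  # imported lazily so the helper works without it installed

        getter = datahugger.get

    folders = []
    for doi in dois:
        target = Path(root) / folder_for(doi)
        # Create nested folders up front (assumption: get() accepts an existing, empty folder).
        target.mkdir(parents=True, exist_ok=True)
        getter(doi, str(target))
        folders.append(str(target))
    return folders


# Example (requires datahugger and network access):
# download_all(["10.5061/dryad.mj8m0", "10.5061/dryad.x3ffbg7m8"])
```

Injecting the download function keeps the loop logic separate from network I/O, which makes it easy to check the folder layout before fetching large files.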
For an example of how this can integrate with your work, see the example workflow notebook.
Datahugger with command line
The `datahugger` command provides an easy interface for downloading data. The first argument is the DOI or URL, and the second is the name of the folder to store the dataset in (it will be created if it does not exist).
```bash
datahugger 10.5061/dryad.mj8m0 data
```

```bash
% datahugger 10.5061/dryad.mj8m0 data
Collecting...
NestTemperatureData.csv            : 100%|████████████████████████████████████████| 607k/607k
README_for_NestTemperatureData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
ExternalTemps.csv                  : 100%|██████████████████████████████████████| 1.06k/1.06k
README_for_ExternalTemps.txt       : 100%|██████████████████████████████████████| 2.82k/2.82k
InternalEggTempData.csv            : 100%|██████████████████████████████████████████| 664/664
README_for_InternalEggTempData.txt : 100%|██████████████████████████████████████| 2.82k/2.82k
SoilSimulation_Output.csv          : 100%|████████████████████████████████████████| 229M/229M
README_for_SoilSimulation_[...].txt: 100%|██████████████████████████████████████| 2.82k/2.82k
Dataset successfully downloaded.
```
Tip: On some systems, you have to quote the DOI or URL. For example: datahugger "10.5061/dryad.mj8m0" data.
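Quoting is a shell concern. If you call the CLI from a Python script, one way to sidestep it is to pass the arguments as a list to `subprocess.run`, which starts the program without going through a shell. The sketch below assumes the `datahugger` executable is on your `PATH`; `datahugger_command` and `run_datahugger` are hypothetical helpers, and the `example.org` URL is only a placeholder.

```python
import shlex
import subprocess
from typing import List


def datahugger_command(doi_or_url: str, folder: str) -> List[str]:
    """Build the CLI argument list: the DOI or URL first, then the output folder."""
    return ["datahugger", doi_or_url, folder]


def run_datahugger(doi_or_url: str, folder: str) -> None:
    # A list (no shell=True) passes each argument to the program verbatim,
    # so characters such as ? or & in a URL need no quoting.
    subprocess.run(datahugger_command(doi_or_url, folder), check=True)


# When typing the command in a POSIX shell, shlex.join shows a safely quoted form.
print(shlex.join(datahugger_command("https://example.org/dataset?id=42", "data")))
# prints: datahugger 'https://example.org/dataset?id=42' data
```

`shlex.join` (Python 3.8+) only quotes arguments that contain shell-special characters, so a plain DOI such as `10.5061/dryad.mj8m0` is printed unchanged.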
Tips and tricks
- No need to struggle with DOIs versus DOI URLs. They both work (and more). For example, the values `10.5061/dryad.x3ffbg7m8`, `doi:10.5061/dryad.x3ffbg7m8`, `https://doi.org/10.5061/dryad.x3ffbg7m8`, and `https://datadryad.org/stash/dataset/doi:10.5061/dryad.x3ffbg7m8` all point to the same dataset.
- Do not republish the dataset when uploading your data to a scientific data repository. These storage resources can be put to better use :)
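Each of these inputs contains the same bare DOI. The sketch below shows one simplified way to pull it out with a regular expression. It is an illustration only, not how Datahugger resolves its inputs (Datahugger also accepts plain repository URLs); the helper name `extract_doi` and the pattern are assumptions.

```python
import re
from typing import Optional

# Simplified DOI pattern: "10.", a 4-9 digit registrant code, a slash, then a suffix.
_DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(value: str) -> Optional[str]:
    """Return the first DOI-like substring in `value`, or None if there is none."""
    match = _DOI_PATTERN.search(value)
    return match.group(0) if match else None


for value in ["doi:10.5061/dryad.x3ffbg7m8", "https://doi.org/10.5061/dryad.x3ffbg7m8"]:
    print(extract_doi(value))  # prints 10.5061/dryad.x3ffbg7m8
```

Because the suffix match runs to the next whitespace, a URL with a query string or fragment would keep those parts; a real resolver needs to handle such cases explicitly.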
Contact
Please feel free to reach out with questions, comments, and suggestions. The issue tracker is a good starting point. You can also email me at jonathandebruinos@gmail.com.
Owner
- Name: Jonathan de Bruin
- Login: J535D165
- Kind: user
- Location: Netherlands
- Company: Utrecht University
- Repositories: 45
- Profile: https://github.com/J535D165
Research engineer working on software, datasets, and tools to advance open science 👐 @UtrechtUniversity @asreview
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
title: >-
  Datahugger - One downloader for many scientific
  repositories
message: 'If you use this software, please cite it as instructed.'
type: software
authors:
  - family-names: De Bruin
    given-names: Jonathan
    orcid: 'https://orcid.org/0000-0002-4297-0502'
repository-code: 'https://github.com/J535D165/datahugger'
url: 'https://github.com/J535D165/datahugger'
repository-artifact: 'https://pypi.org/project/datahugger/'
abstract: >-
  Datahugger is a tool to download scientific datasets,
  software, and code from a large number of repositories
  based on their DOI (wiki) or URL. With Datahugger, you can
  automate the downloading of data and improve the
  reproducibility of your research. Datahugger provides a
  straightforward Python interface and an intuitive Command
  Line Interface (CLI).
license: MIT
```
GitHub Events
| Event | Total | Last year |
|---|---|---|
| Issues event | 4 | 4 |
| Watch event | 17 | 17 |
| Delete event | 2 | 2 |
| Issue comment event | 5 | 5 |
| Push event | 59 | 59 |
| Pull request review event | 4 | 4 |
| Pull request review comment event | 7 | 7 |
| Pull request event | 6 | 6 |
| Fork event | 1 | 1 |
| Create event | 1 | 1 |
Committers
Last synced: almost 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jonathan de Bruin | j****s@g****m | 139 |
| PeterLombaers | 7****s | 2 |
| Kian-Meng Ang | k****g@g****m | 1 |
| Ahmad Hesam | a****m@c****h | 1 |
| Jelle Teijema | j****a@g****m | 1 |
| Dave Tromp | 2****p | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 25
- Total pull requests: 80
- Average time to close issues: 2 months
- Average time to close pull requests: 15 days
- Total issue authors: 13
- Total pull request authors: 11
- Average comments per issue: 1.24
- Average comments per pull request: 0.44
- Merged pull requests: 72
- Bot issues: 0
- Bot pull requests: 5
Past Year
- Issues: 7
- Pull requests: 14
- Average time to close issues: 9 days
- Average time to close pull requests: about 2 months
- Issue authors: 6
- Pull request authors: 5
- Average comments per issue: 0.57
- Average comments per pull request: 1.29
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 3
Top Authors
Issue Authors
- J535D165 (9)
- IgnacioHeredia (3)
- alvarolopez (3)
- davetromp (1)
- kirbyju (1)
- Danny-dK (1)
- nichtich (1)
- micafer (1)
- EmiliaJarochowska (1)
- cheginit (1)
- kingjr (1)
- XNN19 (1)
- girgink (1)
Pull Request Authors
- J535D165 (58)
- micafer (11)
- pre-commit-ci[bot] (8)
- davetromp (5)
- stsnel (2)
- PeterLombaers (2)
- jteijema (1)
- Senui (1)
- kianmeng (1)
- alvarolopez (1)
- SexyVetra (1)
Packages
- Total packages: 1
- Total downloads: 4,848 last month (PyPI)
- Total docker downloads: 47
- Total dependent packages: 1
- Total dependent repositories: 2
- Total versions: 28
- Total maintainers: 1
pypi.org: datahugger
One downloader for many scientific data and code repositories!
- Documentation: https://datahugger.readthedocs.io/
- License: MIT
- Latest release: 0.10.4, published about 2 years ago
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v3 composite
- actions/setup-python v1 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- pypa/gh-action-pypi-publish release/v1 composite
- jsonpath_ng *
- natsort *
- requests *
- scitree *
- tqdm *
- actions/checkout v3 composite
- actions/setup-python v3 composite
- tubone24/update_release v1.3.1 composite
- pandas *
- requests *
- tabulate *
