Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (17.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: deepin-community
- License: bsd-3-clause
- Language: Python
- Default Branch: master
- Size: 209 KB
Statistics
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md

Documentation (latest) • Documentation (main branch) • Contributing • Contact
Part of the Fatiando a Terra project
About
Does your Python package include sample datasets? Are you shipping them with the code? Are they getting too big?
Pooch is here to help! It will manage a data registry by downloading your data files from a server only when needed and storing them locally in a data cache (a folder on your computer).
Here are Pooch's main features:
- Pure Python and minimal dependencies.
- Download a file only if necessary (it's not in the data cache or needs to be updated).
- Verify download integrity through SHA256 hashes (also used to check if a file needs to be updated).
- Designed to be extended: plug in custom download (FTP, scp, etc) and post-processing (unzip, decompress, rename) functions.
- Includes utilities to unzip/decompress the data upon download to save loading time.
- Can handle basic HTTP authentication (for servers that require a login) and printing download progress bars.
- Easily set up an environment variable to overwrite the data cache location.
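The environment variable override mentioned in the last feature can be pictured with a few lines of standard-library Python. This is only a sketch of the idea, not Pooch's implementation: `resolve_cache_dir` is a hypothetical helper, and the variable name is an illustrative assumption.

```python
import os
from pathlib import Path


def resolve_cache_dir(default_dir, env_var=None):
    """
    Return the data cache folder, letting an environment variable override it.

    Hypothetical helper for illustration only; it is not part of Pooch.
    """
    override = os.environ.get(env_var) if env_var else None
    # An unset or empty variable falls back to the default location.
    return Path(override) if override else Path(default_dir)


# "MYPACKAGE_DATA_DIR" is an assumed variable name for this example.
cache_dir = resolve_cache_dir("~/.cache/mypackage", env_var="MYPACKAGE_DATA_DIR")
```

Users who need the data on a different disk (a cluster scratch folder, for instance) can then set the variable without touching the code.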
Are you a scientist or researcher? Pooch can help you too!
- Automatically download your data files so you don't have to keep them in your GitHub repository.
- Make sure everyone running the code has the same version of the data files (enforced through the SHA256 hashes).
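To make the "download only if necessary" rule concrete, here is a minimal standard-library sketch of a hash check. It is an illustration under assumptions, not Pooch's own code: `file_digest` and `needs_download` are hypothetical names, and the optional `alg:` prefix mirrors the `md5:...` style of hash string used in this README.

```python
import hashlib
from pathlib import Path


def file_digest(path, alg="sha256", chunk_size=65536):
    """Hash a file in chunks so large files don't need to fit in memory."""
    hasher = hashlib.new(alg)
    with open(path, "rb") as fin:
        for chunk in iter(lambda: fin.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def needs_download(path, known_hash):
    """
    Return True if the file is missing or its hash differs from known_hash.

    known_hash is either a bare SHA256 hex digest or "alg:hexdigest"
    (for example "md5:a7332a..."). Hypothetical helper, not part of Pooch.
    """
    path = Path(path)
    if not path.exists():
        return True
    # Split off an optional algorithm prefix; default to SHA256 without one.
    alg, _, expected = known_hash.rpartition(":")
    alg = alg or "sha256"
    return file_digest(path, alg) != expected.lower()
```

A mismatch covers both cases described above: a corrupted download and a file that has been updated on the server since it was cached.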
Example
For a scientist downloading a data file for analysis:
```python
import pooch
import pandas as pd


# Download a file and save it locally, returning the path to it.
# Running this again will not cause a download. Pooch will check the hash
# (checksum) of the downloaded file against the given value to make sure
# it's the right file (not corrupted or outdated).
fname_bathymetry = pooch.retrieve(
    url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
    known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
)

# Pooch can also download based on a DOI from certain providers.
fname_gravity = pooch.retrieve(
    url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
    known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
)

# Load the data with Pandas
data_bathymetry = pd.read_csv(fname_bathymetry)
data_gravity = pd.read_csv(fname_gravity)
```
For package developers including sample data in their projects:
```python
"""
Module mypackage/datasets.py
"""
import pkg_resources
import pandas
import pooch

# Get the version string from your project. You have one of these, right?
from . import version


# Create a new friend to manage your sample data storage
GOODBOY = pooch.create(
    # Folder where the data will be stored. For a sensible default, use the
    # default cache folder for your OS.
    path=pooch.os_cache("mypackage"),
    # Base URL of the remote data store. Will call .format on this string
    # to insert the version (see below).
    base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
    # Pooches are versioned so that you can use multiple versions of a
    # package simultaneously. Use PEP440 compliant version number. The
    # version will be appended to the path.
    version=version,
    # If a version has a "+XX.XXXXX" suffix, we'll assume that this is a dev
    # version and replace the version with this string.
    version_dev="main",
    # An environment variable that overwrites the path.
    env="MYPACKAGE_DATA_DIR",
    # The cache file registry. A dictionary with all files managed by this
    # pooch. Keys are the file names (relative to *base_url*) and values
    # are their respective SHA256 hashes. Files will be downloaded
    # automatically when needed (see fetch_gravity_data).
    registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"}
)
# You can also load the registry from a file. Each line contains a file
# name and its SHA256 hash separated by a space. This makes it easier to
# manage large numbers of data files. The registry file should be packaged
# and distributed with your software.
GOODBOY.load_registry(
    pkg_resources.resource_stream("mypackage", "registry.txt")
)


# Define functions that your users can call to get back the data in memory
def fetch_gravity_data():
    """
    Load some sample gravity data to use in your docs.
    """
    # Fetch the path to a file in the local storage. If it's not there,
    # we'll download it.
    fname = GOODBOY.fetch("gravity-data.csv")
    # Load it with numpy/pandas/etc
    data = pandas.read_csv(fname)
    return data
```
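The registry file format and the versioned base URL described in the comments above can be sketched with plain Python. This is a simplified illustration, not Pooch's parser: `parse_registry` and `file_url` are hypothetical helpers, skipping blank lines and `#` comments is a convenience assumed here, and the hash in the usage lines is a made-up placeholder.

```python
def parse_registry(lines):
    """
    Turn "file-name hash" lines into a {file_name: hash} dictionary.

    Hypothetical helper for illustration; it only handles two fields per line.
    """
    registry = {}
    for line in lines:
        line = line.strip()
        # Assumption: ignore blank lines and lines starting with "#".
        if not line or line.startswith("#"):
            continue
        # Split on the last run of whitespace so the hash is always the final field.
        name, file_hash = line.rsplit(maxsplit=1)
        registry[name] = file_hash.lower()
    return registry


def file_url(base_url, version, name):
    """Insert the version into the base URL with .format and append the file name."""
    return base_url.format(version=version) + name


# Placeholder registry content and version string, for illustration only.
registry = parse_registry(["# sample data", "gravity-data.csv 0123abcd"])
url = file_url(
    "https://github.com/myproject/mypackage/raw/{version}/data/",
    "v1.0.0",
    "gravity-data.csv",
)
```

Keeping the registry in a text file like this means adding a dataset is a one-line change that can be reviewed in a pull request.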
Projects using Pooch
- SciPy
- scikit-image
- MetPy
- icepack
- histolab
- seaborn-image
- Ensaio
- Open AR-Sandbox
- climlab
- napari
- mne-python
- GemGIS
If you're using Pooch, send us a pull request adding your project to the list.
Getting involved
🗨️ Contact us: Find out more about how to reach us at fatiando.org/contact.
👩🏾‍💻 Contributing to project development: Please read our Contributing Guide to see how you can help and give feedback.
🧑🏾‍🤝‍🧑🏼 Code of conduct: This project is released with a Code of Conduct. By participating in this project you agree to abide by its terms.
Imposter syndrome disclaimer: We want your help. No, really. There may be a little voice inside your head that is telling you that you're not ready, that you aren't skilled enough to contribute. We assure you that the little voice in your head is wrong. Most importantly, there are many valuable ways to contribute besides writing code.
This disclaimer was adapted from the MetPy project.
License
This is free software: you can redistribute it and/or modify it under the terms
of the BSD 3-clause License. A copy of this license is provided in
LICENSE.txt.
Owner
- Name: deepin Community
- Login: deepin-community
- Kind: organization
- Email: support@deepin.org
- Location: China
- Website: https://www.deepin.org/
- Repositories: 8,091
- Profile: https://github.com/deepin-community
Welcome to the deepin community.
Citation (CITATION.cff)
cff-version: 1.2.0
title: 'Pooch: A friend to fetch your data files'
message: >-
  If you use this software, please cite it using the
  information in this file.
type: software
url: 'https://www.fatiando.org/pooch/'
repository-code: 'https://github.com/fatiando/pooch'
repository-artifact: 'https://pypi.org/project/pooch/'
license: BSD-3-Clause
preferred-citation:
  type: article
  title: 'Pooch: A friend to fetch your data files'
  journal: Journal of Open Source Software
  year: 2020
  doi: 10.21105/joss.01943
  volume: 5
  issue: 45
  start: 1943
  license: CC-BY-4.0
  authors:
    - given-names: Leonardo
      family-names: Uieda
      affiliation: University of Liverpool
      orcid: 'https://orcid.org/0000-0001-6123-9515'
    - given-names: Santiago Rubén
      family-names: Soler
      affiliation: Universidad Nacional de San Juan
      orcid: 'https://orcid.org/0000-0001-9202-5317'
    - given-names: Rémi
      family-names: Rampin
      affiliation: New York University
      orcid: 'https://orcid.org/0000-0002-0524-2282'
    - given-names: Hugo
      name-particle: van
      family-names: Kemenade
      orcid: 'https://orcid.org/0000-0001-5715-8632'
    - given-names: Matthew
      family-names: Turk
      affiliation: School of Information Sciences
      orcid: 'https://orcid.org/0000-0002-5294-0198'
    - given-names: Daniel
      family-names: Shapero
      affiliation: University of Washington
      orcid: 'https://orcid.org/0000-0002-3651-0649'
    - given-names: Anderson
      family-names: Banihirwe
      affiliation: National Center for Atmospheric Research
      orcid: 'https://orcid.org/0000-0001-6583-571X'
    - given-names: John
      family-names: Leeman
      affiliation: Leeman Geophysical
      orcid: 'https://orcid.org/0000-0002-3624-1821'
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 13 days
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 2.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- Zeno-sole (1)
Dependencies
- build *
- sphinx ==4.4.
- sphinx-book-theme ==0.2.
- sphinx-panels ==0.6.
- black *
- flake8 *
- pathspec *
- pylint ==2.4.
- coverage * test
- pytest * test
- pytest-cov * test
- pytest-httpserver * test
- pytest-localftpserver * test
- pylint ==2.4.