Pooch

Pooch: A friend to fetch your data files - Published in JOSS (2020)

https://github.com/fatiando/pooch

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README and JOSS metadata
  • Academic publication links
  • Committers with academic emails
    2 of 47 committers (4.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

data download-manager fatiando-a-terra ftp http python python3 scipy scipy-stack

Keywords from Contributors

mesh geoscience turing-machine s2s s2d prediction pangeo climate-analysis climate neuroscience
Last synced: 4 months ago

Repository

A friend to fetch your data files

Statistics
  • Stars: 684
  • Watchers: 15
  • Forks: 80
  • Open Issues: 52
  • Releases: 28
Topics
data download-manager fatiando-a-terra ftp http python python3 scipy scipy-stack
Created over 7 years ago · Last pushed 4 months ago
Metadata Files
Readme, Contributing, License, Code of conduct, Citation, Authors

README.md

Pooch: A friend to fetch your data files

Documentation (latest) · Documentation (main branch) · Contributing · Contact · Ask a question

Part of the Fatiando a Terra project


About

Just want to download a file without messing with requests and urllib? Trying to add sample datasets to your Python package? Pooch is here to help!

Pooch is a Python library that can manage data by downloading files from a server (only when needed) and storing them locally in a data cache (a folder on your computer).

  • Pure Python and minimal dependencies.
  • Download files over HTTP, FTP, and from data repositories like Zenodo and figshare.
  • Built-in post-processors to unzip/decompress the data after download.
  • Designed to be extended: create custom downloaders and post-processors.
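Post-processors are plain callables. According to Pooch's documentation, a processor is called as `processor(fname, action, pup)`, where `action` is `"download"`, `"update"` or `"fetch"`, and it returns the path that `fetch`/`retrieve` should hand back. The sketch below is a hypothetical `XZDecompress` class (Pooch already ships `pooch.Decompress` for this job). It uses the standard-library `lzma` module to unpack an `.xz` file and reuses an existing decompressed copy on a plain fetch.

```python
import lzma
import os


class XZDecompress:
    """Hypothetical post-processor: decompress an .xz file next to the original.

    Follows the processor call signature documented by Pooch:
    processor(fname, action, pup) -> path returned to the caller.
    """

    def __call__(self, fname, action, pup):
        if not fname.endswith(".xz"):
            raise ValueError(f"Expected an .xz file, got {fname!r}")
        # Output path: the same name without the ".xz" suffix.
        decompressed = fname[: -len(".xz")]
        # Decompress again if the archive was just (re)downloaded or if the
        # output is missing; on a plain "fetch", reuse the existing file.
        if action in ("download", "update") or not os.path.exists(decompressed):
            with lzma.open(fname, "rb") as source, open(decompressed, "wb") as target:
                # Stream in 1 MiB chunks so large files never sit fully in memory.
                while chunk := source.read(1024 * 1024):
                    target.write(chunk)
        return decompressed
```

With Pooch installed, an instance would be passed as `processor=XZDecompress()` to `pooch.retrieve` or to a Pooch object's `fetch` method.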

Are you a scientist or researcher? Pooch can help you too!

  • Host your data on a repository and download using the DOI.
  • Automatically download data using code instead of telling colleagues to do it themselves.
  • Make sure everyone running the code has the same version of the data files.
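The example below downloads by DOI with a URL of the form `doi:<DOI>/<file name>`. A DOI can itself contain slashes, so the sketch assumes the file name is whatever follows the last slash. Pooch's own parser may use a different rule. It also builds the `https://doi.org/` resolver link. Roughly speaking, Pooch then asks the hosting repository (Zenodo, figshare and similar) where that file can be downloaded. Both function names are hypothetical.

```python
def split_doi_url(url):
    """Split a 'doi:<DOI>/<file name>' URL into (doi, file name).

    Assumption: the file name is everything after the last slash.
    """
    prefix = "doi:"
    if not url.startswith(prefix):
        raise ValueError(f"Not a DOI URL: {url!r}")
    doi, _, fname = url[len(prefix):].rpartition("/")
    if not doi or not fname:
        raise ValueError(f"Expected 'doi:<DOI>/<file name>', got {url!r}")
    return doi, fname


def doi_resolver_url(doi):
    """Link that the doi.org resolver redirects to the repository's landing page."""
    return f"https://doi.org/{doi}"


doi, fname = split_doi_url("doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz")
print(doi)                    # 10.5281/zenodo.5882430
print(fname)                  # southern-africa-gravity.csv.xz
print(doi_resolver_url(doi))  # https://doi.org/10.5281/zenodo.5882430
```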

Projects using Pooch

SciPy, scikit-image, xarray, Ensaio, GemPy, MetPy, napari, Satpy, yt, PyVista, icepack, histolab, seaborn-image, Open AR-Sandbox, climlab, mne-python, GemGIS, SHTOOLS, MOABB, GeoViews, ScopeSim, Brainrender, pyxem, cellfinder, PVGeo, geosnap, BioCypher, cf-xarray, Scirpy, rembg, DASCore, scikit-mobility, Py-ART, HyperSpy, RosettaSciIO, eXSpy, SPLASH, xclim, CLISOPS

If you're using Pooch, send us a pull request adding your project to the list.

Example

For a scientist downloading a data file for analysis:

```python
import pooch
import pandas as pd

# Download a file and save it locally, returning the path to it.
# Running this again will not cause a download. Pooch will check the hash
# (checksum) of the downloaded file against the given value to make sure
# it's the right file (not corrupted or outdated).
fname_bathymetry = pooch.retrieve(
    url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
    known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
)

# Pooch can also download based on a DOI from certain providers.
fname_gravity = pooch.retrieve(
    url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
    known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
)

# Load the data with Pandas
data_bathymetry = pd.read_csv(fname_bathymetry)
data_gravity = pd.read_csv(fname_gravity)
```
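The example above does two things. It skips the download when a valid local copy already exists, and it checks the file against `known_hash`, written as `"<algorithm>:<hex digest>"` (Pooch treats a digest without a prefix as SHA256). The sketch below reproduces that logic with the standard library only. It is not Pooch's implementation: `retrieve_sketch` and its cache layout (a flat folder keyed by the URL's last path segment) are hypothetical. Because it uses `urllib`, it also accepts `file://` URLs, which makes local testing easy.

```python
import hashlib
import os
import shutil
import tempfile
import urllib.request


def parse_known_hash(known_hash):
    """Split 'alg:hexdigest' into (alg, digest); a bare digest is taken as SHA256."""
    alg, sep, digest = known_hash.partition(":")
    if not sep:
        return "sha256", known_hash.lower()
    return alg.lower(), digest.lower()


def file_hash(path, alg="sha256", chunk_size=65536):
    """Hash a file in chunks so large files never sit fully in memory."""
    hasher = hashlib.new(alg)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def hash_matches(path, known_hash):
    alg, digest = parse_known_hash(known_hash)
    return file_hash(path, alg) == digest


def retrieve_sketch(url, known_hash, cache_dir):
    """Hypothetical stand-in for pooch.retrieve: download only when needed."""
    os.makedirs(cache_dir, exist_ok=True)
    fname = os.path.join(cache_dir, url.rsplit("/", 1)[-1])
    # A cached copy that passes the hash check is returned without downloading.
    if os.path.exists(fname) and (known_hash is None or hash_matches(fname, known_hash)):
        return fname
    # Otherwise (missing or outdated), download to a temporary file first so a
    # failed or corrupted download never replaces the cached copy.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, out)
        if known_hash is not None and not hash_matches(tmp, known_hash):
            raise ValueError(f"Hash of downloaded file does not match {known_hash!r}")
        os.replace(tmp, fname)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return fname
```

To get the value for `known_hash` in the first place, hash a trusted copy of the file. Pooch provides `pooch.file_hash` for this.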

For package developers including sample data in their projects:

```python
"""
Module mypackage/datasets.py
"""
from importlib import resources
import pandas
import pooch

# Get the version string from your project. You have one of these, right?
from . import version

# Create a new friend to manage your sample data storage
GOODBOY = pooch.create(
    # Folder where the data will be stored. For a sensible default, use the
    # default cache folder for your OS.
    path=pooch.os_cache("mypackage"),
    # Base URL of the remote data store. Will call .format on this string
    # to insert the version (see below).
    base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
    # Pooches are versioned so that you can use multiple versions of a
    # package simultaneously. Use a PEP 440 compliant version number. The
    # version will be appended to the path.
    version=version,
    # If a version has a "+XX.XXXXX" suffix, we'll assume that this is a dev
    # version and replace the version with this string.
    version_dev="main",
    # An environment variable that overwrites the path.
    env="MYPACKAGE_DATA_DIR",
    # The cache file registry. A dictionary with all files managed by this
    # pooch. Keys are the file names (relative to *base_url*) and values
    # are their respective SHA256 hashes. Files will be downloaded
    # automatically when needed (see fetch_gravity_data).
    registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"},
)

# You can also load the registry from a file. Each line contains a file
# name and its SHA256 hash separated by a space. This makes it easier to
# manage large numbers of data files. The registry file should be packaged
# and distributed with your software.
GOODBOY.load_registry(
    resources.open_text("mypackage", "registry.txt")
)


# Define functions that your users can call to get back the data in memory
def fetch_gravity_data():
    """
    Load some sample gravity data to use in your docs.
    """
    # Fetch the path to a file in the local storage. If it's not there,
    # we'll download it.
    fname = GOODBOY.fetch("gravity-data.csv")
    # Load it with numpy/pandas/etc
    data = pandas.read_csv(fname)
    return data
```
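The registry file described above is plain text: one file name and its SHA256 hash per line, separated by a space. The URL of each file comes from the versioned `base_url`. Pooch has its own helpers here, such as `pooch.make_registry`, which writes a registry file for a folder. The standard-library sketch below only illustrates the file format and the URL construction, and all of its function names are hypothetical. It makes three assumptions: blank lines and `#` comments are skipped, file names contain no spaces, and any version containing `+` counts as a development build (Pooch applies a stricter PEP 440 check).

```python
import hashlib
import os


def sha256_of(path):
    """SHA256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def registry_lines(directory):
    """Yield '<relative path> <sha256>' lines for every file under directory."""
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            full = os.path.join(root, name)
            # Forward slashes so the names can be appended to a URL.
            rel = os.path.relpath(full, directory).replace(os.sep, "/")
            yield f"{rel} {sha256_of(full)}"


def parse_registry(lines):
    """Turn registry lines into {file name: hash}."""
    registry = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"Expected '<file name> <hash>', got {line!r}")
        registry[parts[0]] = parts[1]
    return registry


def file_url(base_url, version, version_dev, fname):
    """Download URL for fname; dev versions ('1.2+11.gea4r3fa') use version_dev."""
    if "+" in version:
        version = version_dev
    return base_url.format(version=version) + fname
```

For example, with the `base_url` above, a development version such as `1.2+11.gea4r3fa` would point `gravity-data.csv` at the `main` branch instead of a release tag.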

Getting involved

🗨️ Contact us: Find out more about how to reach us at fatiando.org/contact.

👩🏾‍💻 Contributing to project development: Please read our Contributing Guide to see how you can help and give feedback.

🧑🏾‍🤝‍🧑🏼 Code of conduct: This project is released with a Code of Conduct. By participating in this project you agree to abide by its terms.

Imposter syndrome disclaimer: We want your help. No, really. There may be a little voice inside your head that is telling you that you're not ready, that you aren't skilled enough to contribute. We assure you that the little voice in your head is wrong. Most importantly, there are many valuable ways to contribute besides writing code.

This disclaimer was adapted from the MetPy project.

License

This is free software: you can redistribute it and/or modify it under the terms of the BSD 3-clause License. A copy of this license is provided in LICENSE.txt.

Owner

  • Name: Fatiando a Terra
  • Login: fatiando
  • Kind: organization

Open-source Python tools for geophysics

JOSS Publication

Pooch: A friend to fetch your data files
Published
January 17, 2020
Volume 5, Issue 45, Page 1943
Authors
Leonardo Uieda
Department of Earth, Ocean and Ecological Sciences, School of Environmental Sciences, University of Liverpool, UK
Santiago Rubén Soler
Instituto Geofísico Sismológico Volponi, Universidad Nacional de San Juan, Argentina, CONICET, Argentina
Rémi Rampin
New York University, USA
Hugo van Kemenade
Independent (Non-affiliated)
Matthew Turk
University of Illinois at Urbana-Champaign, USA
Daniel Shapero
Polar Science Center, University of Washington Applied Physics Lab, USA
Anderson Banihirwe
The US National Center for Atmospheric Research, USA
John Leeman
Leeman Geophysical, USA
Editor
Daniel S. Katz
Tags
python

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'Pooch: A friend to fetch your data files'
message: >-
  If you use this software, please cite it using the
  information in this file.
type: software
url: 'https://www.fatiando.org/pooch/'
repository-code: 'https://github.com/fatiando/pooch'
repository-artifact: 'https://pypi.org/project/pooch/'
license: BSD-3-Clause
preferred-citation:
  type: article
  title: 'Pooch: A friend to fetch your data files'
  journal: Journal of Open Source Software
  year: 2020
  doi: 10.21105/joss.01943
  volume: 5
  issue: 45
  start: 1943
  license: CC-BY-4.0
  authors:
    - given-names: Leonardo
      family-names: Uieda
      affiliation: University of Liverpool
      orcid: 'https://orcid.org/0000-0001-6123-9515'
    - given-names: Santiago Rubén
      family-names: Soler
      affiliation: Universidad Nacional de San Juan
      orcid: 'https://orcid.org/0000-0001-9202-5317'
    - given-names: Rémi
      family-names: Rampin
      affiliation: New York University
      orcid: 'https://orcid.org/0000-0002-0524-2282'
    - given-names: Hugo
      name-particle: van
      family-names: Kemenade
      orcid: 'https://orcid.org/0000-0001-5715-8632'
    - given-names: Matthew
      family-names: Turk
      affiliation: School of Information Sciences
      orcid: 'https://orcid.org/0000-0002-5294-0198'
    - given-names: Daniel
      family-names: Shapero
      affiliation: University of Washington
      orcid: 'https://orcid.org/0000-0002-3651-0649'
    - given-names: Anderson
      family-names: Banihirwe
      affiliation: National Center for Atmospheric Research
      orcid: 'https://orcid.org/0000-0001-6583-571X'
    - given-names: John
      family-names: Leeman
      affiliation: Leeman Geophysical
      orcid: 'https://orcid.org/0000-0002-3624-1821'

GitHub Events

Total
  • Issues event: 23
  • Watch event: 57
  • Delete event: 11
  • Issue comment event: 65
  • Push event: 41
  • Pull request review comment event: 9
  • Pull request review event: 8
  • Pull request event: 34
  • Fork event: 8
  • Create event: 19
Last Year
  • Issues event: 23
  • Watch event: 57
  • Delete event: 11
  • Issue comment event: 65
  • Push event: 41
  • Pull request review comment event: 9
  • Pull request review event: 8
  • Pull request event: 34
  • Fork event: 8
  • Create event: 19

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 306
  • Total Committers: 47
  • Avg Commits per committer: 6.511
  • Development Distribution Score (DDS): 0.425
Past Year
  • Commits: 19
  • Committers: 7
  • Avg Commits per committer: 2.714
  • Development Distribution Score (DDS): 0.579
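The averages above are simple ratios of commits to committers: 306 / 47 ≈ 6.511 and 19 / 7 ≈ 2.714. The all-time DDS of 0.425 matches 1 minus the top committer's share of all commits, using Leonardo Uieda's 176 commits from the Top Committers table: 1 - 176 / 306 ≈ 0.425. This page does not state the metrics service's exact formula, so the sketch below assumes that definition.

```python
def avg_commits(total_commits, committers):
    """Average number of commits per committer."""
    return total_commits / committers


def dds(top_committer_commits, total_commits):
    """Assumed DDS definition: 1 minus the top committer's share of commits."""
    return 1 - top_committer_commits / total_commits


print(round(avg_commits(306, 47), 3))  # 6.511
print(round(avg_commits(19, 7), 3))    # 2.714
print(round(dds(176, 306), 3))         # 0.425
```

A DDS near 0 means one person made almost every commit. Higher values mean the work is spread across more contributors.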
Top Committers
Name Email Commits
Leonardo Uieda l****a@g****m 176
Santiago Soler s****r@f****m 32
dependabot[bot] 4****] 17
Fatiando a Terra Bot 5****t 10
Hugo van Kemenade h****k 9
Remi Rampin r****n@g****m 6
Dominic Kempf d****f 4
Mark Harfouche m****e@g****m 4
Antonio Valentino a****o@t****t 3
Daniel Shapero s****l@g****m 3
Rowan Cockett r****1@g****m 2
Juan Nunez-Iglesias j****i@f****m 2
Daniel McCloy d****n@m****o 2
Björn Ludwig b****g@p****e 2
Anderson Banihirwe a****e@u****u 2
Adam Boesky a****y@g****m 1
Agustina p****a@g****m 1
Alessia Marcolini 9****i@g****m 1
Alex Fikl a****l@g****m 1
AlexanderJuestel 4****l 1
Anirudh Dagar a****6@g****m 1
Brian Rose b****e@a****u 1
myd7349 m****9@g****m 1
Zac Flamig z****c@w****m 1
Trevor James Smith 1****e 1
Stephan Hoyer s****r@g****m 1
SarthakJariwala 3****a 1
Sandro s****u@l****m 1
Ryan May r****1@g****m 1
Ryan Abernathey r****y@g****m 1
and 17 more...

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 88
  • Total pull requests: 169
  • Average time to close issues: 3 months
  • Average time to close pull requests: 28 days
  • Total issue authors: 50
  • Total pull request authors: 39
  • Average comments per issue: 2.28
  • Average comments per pull request: 1.56
  • Merged pull requests: 141
  • Bot issues: 1
  • Bot pull requests: 26
Past Year
  • Issues: 19
  • Pull requests: 32
  • Average time to close issues: 12 days
  • Average time to close pull requests: 10 days
  • Issue authors: 17
  • Pull request authors: 11
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.88
  • Merged pull requests: 21
  • Bot issues: 0
  • Bot pull requests: 9
Top Authors
Issue Authors
  • leouieda (18)
  • dokempf (9)
  • santisoler (5)
  • kloczek (4)
  • khaeru (3)
  • remrama (2)
  • mdtanker (2)
  • AKuederle (1)
  • tloredo (1)
  • Mr-Milk (1)
  • dependabot[bot] (1)
  • tomasstolker (1)
  • kelle (1)
  • serasset (1)
  • dstansby (1)
Pull Request Authors
  • leouieda (48)
  • santisoler (46)
  • dependabot[bot] (34)
  • dokempf (6)
  • jni (4)
  • eliotwrobson (3)
  • penguinpee (3)
  • hmaarrfk (3)
  • shoyer (2)
  • drammock (2)
  • jat255 (2)
  • Adam-Boesky (2)
  • Zeitsperre (2)
  • avalentino (2)
  • hugovk (2)
Top Labels
Issue Labels
enhancement (33) bug (23) maintenance (14) documentation (10) question (3) good first issue (1) dependencies (1)
Pull Request Labels
dependencies (34) github_actions (1)

Packages

  • Total packages: 17
  • Total downloads:
    • pypi 6,102,099 last-month
  • Total docker downloads: 343,454,272
  • Total dependent packages: 361
    (may contain duplicates)
  • Total dependent repositories: 2,976
    (may contain duplicates)
  • Total versions: 117
  • Total maintainers: 5
pypi.org: pooch

A friend to fetch your data files

  • Versions: 34
  • Dependent Packages: 298
  • Dependent Repositories: 2,570
  • Downloads: 6,102,099 Last month
  • Docker Downloads: 343,454,272
Rankings
Dependent packages count: 0.1%
Dependent repos count: 0.2%
Downloads: 0.3%
Docker downloads count: 0.3%
Average: 1.5%
Stargazers count: 2.8%
Forks count: 5.4%
Maintainers (2)
Last synced: 4 months ago
alpine-v3.18: py3-pooch

Friend to fetch data files

  • Versions: 1
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 6.5%
Stargazers count: 11.6%
Forks count: 14.5%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.18: py3-pooch-pyc

Precompiled Python bytecode for py3-pooch

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 6.5%
Stargazers count: 11.6%
Forks count: 14.5%
Maintainers (1)
Last synced: 4 months ago
proxy.golang.org: github.com/fatiando/pooch
  • Versions: 27
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 2.6%
Forks count: 3.1%
Average: 6.5%
Dependent packages count: 9.6%
Dependent repos count: 10.8%
Last synced: 4 months ago
spack.io: py-pooch

Pooch manages your Python library's sample data files: it automatically downloads and stores them in a local directory, with support for versioning and corruption checks.

  • Versions: 3
  • Dependent Packages: 6
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 8.4%
Average: 8.7%
Stargazers count: 11.5%
Forks count: 14.7%
Maintainers (1)
Last synced: 4 months ago
alpine-edge: py3-pooch

Friend to fetch data files

  • Versions: 9
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 6.0%
Average: 9.1%
Stargazers count: 14.3%
Forks count: 15.9%
Maintainers (1)
Last synced: 5 months ago
alpine-edge: py3-pooch-pyc

Precompiled Python bytecode for py3-pooch

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Average: 10.8%
Dependent packages count: 13.4%
Stargazers count: 13.6%
Forks count: 16.2%
Maintainers (1)
Last synced: 5 months ago
conda-forge.org: pooch

Pooch manages your remote data files. It automatically downloads and stores them in a local directory (using HTTP or FTP), with support for versioning, corruption checks, and custom download and post-processing operations.

  • Versions: 23
  • Dependent Packages: 46
  • Dependent Repositories: 203
Rankings
Dependent packages count: 1.5%
Dependent repos count: 2.4%
Average: 11.8%
Stargazers count: 19.7%
Forks count: 23.7%
Last synced: 4 months ago
anaconda.org: pooch

Pooch manages your remote data files. It automatically downloads and stores them in a local directory (using HTTP or FTP), with support for versioning, corruption checks, and custom download and post-processing operations.

  • Versions: 4
  • Dependent Packages: 9
  • Dependent Repositories: 203
Rankings
Dependent packages count: 4.9%
Dependent repos count: 13.6%
Average: 22.1%
Stargazers count: 33.2%
Forks count: 36.8%
Last synced: 4 months ago
alpine-v3.20: py3-pooch

Friend to fetch data files

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Last synced: 4 months ago
alpine-v3.19: py3-pooch-pyc

Precompiled Python bytecode for py3-pooch

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Last synced: 4 months ago
alpine-v3.22: py3-pooch-pyc

Precompiled Python bytecode for py3-pooch

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.21: py3-pooch-pyc

Precompiled Python bytecode for py3-pooch

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Last synced: 4 months ago
alpine-v3.21: py3-pooch

Friend to fetch data files

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Last synced: 4 months ago
alpine-v3.22: py3-pooch

Friend to fetch data files

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 4 months ago
alpine-v3.20: py3-pooch-pyc

Precompiled Python bytecode for py3-pooch

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Last synced: 4 months ago
alpine-v3.19: py3-pooch

Friend to fetch data files

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Last synced: 4 months ago

Dependencies

env/requirements-build.txt pypi
  • build *
env/requirements-docs.txt pypi
  • sphinx ==4.4.
  • sphinx-book-theme ==0.2.
  • sphinx-panels ==0.6.
env/requirements-style.txt pypi
  • black *
  • flake8 *
  • pathspec *
  • pylint ==2.4.
env/requirements-test.txt pypi
  • coverage * test
  • pytest * test
  • pytest-cov * test
  • pytest-localftpserver * test
.github/workflows/docs.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/checkout 5a4ac9002d0be2fb38bd78e4b4dbde5606d7042f composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • styfle/cancel-workflow-action 148d9a848c6acaf90a3ec30bc5062f646f8a4163 composite
.github/workflows/pypi.yml actions
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • pypa/gh-action-pypi-publish bce3b74dbf8cc32833ffba9d15f83425c1a736e0 composite
.github/workflows/style.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/test.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
  • styfle/cancel-workflow-action 148d9a848c6acaf90a3ec30bc5062f646f8a4163 composite
pyproject.toml pypi
environment.yml pypi
  • burocrata *