kerchunk

Cloud-friendly access to archival data

https://github.com/fsspec/kerchunk

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    12 of 67 committers (17.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary

Keywords

hacktoberfest python

Keywords from Contributors

zarr pangeo pydata closember climate geospatial-data climate-data meteorology docs xarray-accessor
Last synced: 6 months ago

Repository

Statistics
  • Stars: 349
  • Watchers: 19
  • Forks: 94
  • Open Issues: 131
  • Releases: 0
Topics
hacktoberfest python
Created over 5 years ago · Last pushed 7 months ago
Metadata Files
Readme License

README.md

kerchunk

Cloud-friendly access to archival data

Kerchunk is a library that provides a unified way to represent a variety of chunked, compressed data formats (e.g. NetCDF, HDF5, GRIB), allowing efficient access to the data from traditional file systems or cloud object storage. It also provides a flexible way to create virtual datasets from multiple files. It does this by extracting the byte ranges, compression settings and other information about the data, and storing this metadata in a new, separate object. You can therefore create a virtual aggregate dataset over potentially many source files, for efficient, parallel and cloud-friendly in-situ access, without having to copy or translate the originals. It is a gateway to massive in-the-cloud data processing while data providers still insist on using legacy formats for archival storage.
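The idea of a separate reference object can be illustrated without kerchunk itself. The sketch below is a minimal, standard-library-only illustration, not kerchunk's API or its on-disk specification: the toy file format, the `compressor` field and the helper names (`write_legacy_file`, `build_references`, `read_chunk`) are all hypothetical. It writes two small "archival" files, records where each compressed chunk lives, and then reads one chunk back through the combined index alone.

```python
import json
import os
import tempfile
import zlib


def write_legacy_file(path, chunks):
    """Write a toy 'archival' file: a fixed header followed by compressed chunks.

    Returns a list of (offset, length) pairs, one per chunk.
    """
    ranges = []
    with open(path, "wb") as f:
        f.write(b"TOYFMT01")  # 8-byte header standing in for format metadata
        for raw in chunks:
            blob = zlib.compress(raw)
            ranges.append((f.tell(), len(blob)))
            f.write(blob)
    return ranges


def build_references(paths_and_ranges):
    """Combine per-file byte ranges into one mapping over all files.

    Keys are logical chunk ids "data/<i>"; values are [path, offset, length].
    """
    refs = {}
    i = 0
    for path, ranges in paths_and_ranges:
        for offset, length in ranges:
            refs[f"data/{i}"] = [path, offset, length]
            i += 1
    # "compressor" is a hypothetical field used only by this sketch.
    return {"compressor": "zlib", "refs": refs}


def read_chunk(index, key):
    """Read one chunk by seeking straight to its byte range in the source file."""
    path, offset, length = index["refs"][key]
    with open(path, "rb") as f:
        f.seek(offset)
        return zlib.decompress(f.read(length))


tmpdir = tempfile.mkdtemp()
file_a = os.path.join(tmpdir, "a.bin")
file_b = os.path.join(tmpdir, "b.bin")
ranges_a = write_legacy_file(file_a, [b"alpha" * 100, b"beta" * 100])
ranges_b = write_legacy_file(file_b, [b"gamma" * 100])

index = build_references([(file_a, ranges_a), (file_b, ranges_b)])
# The index is plain JSON, so it can be stored apart from the source files.
index = json.loads(json.dumps(index))

chunk = read_chunk(index, "data/2")  # this chunk lives in the second file
print(chunk[:10])
```

The source files are never copied or rewritten; only the small index is new, which is the property the paragraph above describes.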

Why Kerchunk:

Kerchunk provides:

  • completely serverless architecture
  • metadata consolidation, so you can understand a many-file dataset (metadata plus physical storage) in a single read
  • read from all of the storage backends supported by fsspec, including object storage (s3, gcs, abfs, alibaba), http, cloud user storage (dropbox, gdrive) and network protocols (ftp, ssh, hdfs, smb...)
  • loading of various file types (currently netcdf4/HDF, grib2, tiff, fits, zarr), potentially heterogeneous within a single dataset, without a need to go via the specific driver (e.g., no need for h5py)
  • asynchronous concurrent fetch of many data chunks in one go, amortizing the cost of latency
  • parallel access with a library like zarr without any locks
  • logical datasets spanning many (potentially millions of) data files, with direct access to and subselection of them via coordinate indexing across an arbitrary number of dimensions
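The point about amortizing latency can be sketched with plain `asyncio`. This is an illustration under stated assumptions, not kerchunk or fsspec code: `fetch_chunk` is a hypothetical stand-in for one remote byte-range request, and the 50 ms delay simulates a network round trip. Because all requests are issued together, the total wait should be close to one round trip rather than twenty.

```python
import asyncio
import time


async def fetch_chunk(key, latency=0.05):
    """Hypothetical stand-in for one remote byte-range request."""
    await asyncio.sleep(latency)  # simulated network round trip
    return key, f"bytes-for-{key}".encode()


async def fetch_many(keys):
    """Issue all requests at once and collect the results by key."""
    results = await asyncio.gather(*(fetch_chunk(k) for k in keys))
    return dict(results)


keys = [f"data/{i}" for i in range(20)]

start = time.perf_counter()
chunks = asyncio.run(fetch_many(keys))
elapsed = time.perf_counter() - start

# Fetching the same 20 chunks one after another would take about 20 * 0.05 s.
print(f"fetched {len(chunks)} chunks in {elapsed:.2f}s")
```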

For further information, please see the documentation.

Owner

  • Name: python filesystem spec
  • Login: fsspec
  • Kind: organization

data storage IO layer for python

GitHub Events

Total
  • Issues event: 26
  • Watch event: 36
  • Delete event: 1
  • Issue comment event: 242
  • Push event: 29
  • Pull request event: 39
  • Pull request review event: 11
  • Pull request review comment event: 13
  • Fork event: 14
  • Create event: 3
Last Year
  • Issues event: 26
  • Watch event: 36
  • Delete event: 1
  • Issue comment event: 242
  • Push event: 29
  • Pull request event: 39
  • Pull request review event: 11
  • Pull request review comment event: 13
  • Fork event: 14
  • Create event: 3

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 742
  • Total Committers: 67
  • Avg Commits per committer: 11.075
  • Development Distribution Score (DDS): 0.643
Past Year
  • Commits: 43
  • Committers: 18
  • Avg Commits per committer: 2.389
  • Development Distribution Score (DDS): 0.814
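The all-time figures above can be reproduced with simple arithmetic. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of commits; that definition is not stated on this page, but it agrees with the listed value when the largest single entry in the Top Committers list (265 commits) is used. The function names are illustrative only.

```python
def average_commits(total_commits, total_committers):
    """Mean number of commits per committer, rounded to three decimals."""
    return round(total_commits / total_committers, 3)


def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 minus the top committer's share of all commits."""
    return round(1 - top_committer_commits / total_commits, 3)


avg_all_time = average_commits(742, 67)
avg_past_year = average_commits(43, 18)
dds_all_time = development_distribution_score(265, 742)

print(avg_all_time, avg_past_year, dds_all_time)
```

A score near 0 would mean one committer wrote almost everything, while a score near 1 indicates work spread across many contributors.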
Top Committers
Name Email Commits
Martin Durant m****t@a****a 265
Martin Durant m****t@u****a 65
Lucas Sterzinger l****r@u****u 53
Anu-Ra-g n****8@g****m 42
Martin Durant m****t@u****m 35
Rich Signell r****l@u****v 31
Kevin Paul k****l@n****m 20
Max Jones 1****s@u****m 18
dcherian d****k@c****t 17
peterm790 p****0@g****m 17
David Stuebe d****d@c****y 17
Alex Goodman a****m@u****m 17
Raphael Hagen n****n@g****m 15
Lucas Sterzinger l****s@u****u 13
Matt Iannucci m****i@r****m 10
Lawson Woods l****2@a****u 7
Ray Bell r****l@d****m 7
RichardScottOZ 7****Z@u****m 6
Ben Mares s****1@t****m 6
Aleksandar Jelenak a****k@g****m 4
David Stuebe 8****d@u****m 4
Pier Lorenzo Marasco p****o@u****m 4
Robert Banick r****k@g****m 3
Ray Bell r****0@g****m 3
Pete Gadomski p****i@g****m 3
Kelton Halbert k****t@w****u 3
Josef Kellndorfer j****r@u****m 3
Ian Thomas i****3@g****m 3
Aimee Barciauskas a****e@d****g 2
Ben Dichter b****r@g****m 2
and 37 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 170
  • Total pull requests: 230
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 20 days
  • Total issue authors: 92
  • Total pull request authors: 46
  • Average comments per issue: 6.85
  • Average comments per pull request: 3.3
  • Merged pull requests: 168
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 26
  • Pull requests: 67
  • Average time to close issues: 23 days
  • Average time to close pull requests: 12 days
  • Issue authors: 23
  • Pull request authors: 20
  • Average comments per issue: 1.19
  • Average comments per pull request: 2.69
  • Merged pull requests: 44
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • TomNicholas (9)
  • rsignell (9)
  • martindurant (8)
  • rsignell-usgs (7)
  • raybellwaves (6)
  • keltonhalbert (4)
  • ivirshup (3)
  • tinaok (3)
  • ashiklom (3)
  • maxrjones (3)
  • norlandrhagen (3)
  • sreesanjeevkg (3)
  • emfdavid (3)
  • rabernat (3)
  • pl-marasco (3)
Pull Request Authors
  • martindurant (71)
  • Anu-Ra-g (33)
  • emfdavid (15)
  • norlandrhagen (12)
  • maxrjones (9)
  • raybellwaves (7)
  • maresb (6)
  • kmpaul (6)
  • jhamman (5)
  • mpiannucci (5)
  • ghidalgo3 (4)
  • agoodm (3)
  • mannreis (2)
  • rabernat (2)
  • 777arc (2)
Top Labels
Issue Labels
GSoC-2024 (1)
Pull Request Labels
GSoC-2024 (15)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 58,888 last-month
  • Total docker downloads: 369
  • Total dependent packages: 11
    (may contain duplicates)
  • Total dependent repositories: 26
    (may contain duplicates)
  • Total versions: 23
  • Total maintainers: 1
pypi.org: kerchunk

Functions to make reference descriptions for ReferenceFileSystem

  • Versions: 18
  • Dependent Packages: 10
  • Dependent Repositories: 13
  • Downloads: 58,888 Last month
  • Docker Downloads: 369
Rankings
Dependent packages count: 1.4%
Downloads: 2.1%
Average: 2.6%
Docker downloads count: 2.9%
Dependent repos count: 4.0%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: kerchunk
  • Versions: 5
  • Dependent Packages: 1
  • Dependent Repositories: 13
Rankings
Dependent repos count: 9.8%
Average: 22.5%
Forks count: 24.9%
Stargazers count: 26.2%
Dependent packages count: 29.0%
Last synced: 6 months ago

Dependencies

binder/environment.yml conda
  • dask
  • dask-gateway
  • dask-labextension
  • fsspec
  • intake
  • intake-xarray
  • ipywidgets
  • pip
  • s3fs
  • xarray
  • zarr
docs/requirements.txt pypi
  • autodoc *
  • fsspec *
  • h5py *
  • numcodecs *
  • numpy *
  • numpydoc ==1.2.1
  • sphinx-rtd-theme *
  • ujson *
  • xarray *
  • zarr *
requirements-dev.txt pypi
  • cftime *
  • dask *
  • h5netcdf *
  • h5py *
  • jinja2 *
  • mypy *
  • pytest *
  • s3fs *
  • types-ujson *
  • xarray *
requirements.txt pypi
  • fsspec *
  • numcodecs *
  • numpy *
  • ujson *
  • zarr *
.github/workflows/default.yml actions
  • actions/checkout v2 composite
  • ad-m/github-push-action master composite
  • ammaraskar/sphinx-action master composite
.github/workflows/pre-commit.yml actions
  • actions/checkout v3.1.0 composite
  • actions/setup-python v4 composite
  • pre-commit/action v3.0.0 composite
.github/workflows/pull_request.yml actions
  • actions/checkout v2 composite
  • ammaraskar/sphinx-action master composite
.github/workflows/tests.yml actions
  • actions/checkout v2 composite
  • mamba-org/provision-with-micromamba main composite