Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 12 of 67 committers (17.9%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.0%) to scientific vocabulary
Repository
Cloud-friendly access to archival data
Basic Info
- Host: GitHub
- Owner: fsspec
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://fsspec.github.io/kerchunk/
- Size: 115 MB
Statistics
- Stars: 349
- Watchers: 19
- Forks: 94
- Open Issues: 131
- Releases: 0
Metadata Files
README.md
kerchunk
Cloud-friendly access to archival data
Kerchunk is a library that provides a unified way to represent a variety of chunked, compressed data formats (e.g. NetCDF, HDF5, GRIB), allowing efficient access to the data from traditional file systems or cloud object storage. It also provides a flexible way to create virtual datasets from multiple files. It does this by extracting the byte ranges, compression information and other information about the data and storing this metadata in a new, separate object. This means that you can create a virtual aggregate dataset over potentially many source files, for efficient, parallel and cloud-friendly in-situ access without having to copy or translate the originals. It is a gateway to in-the-cloud massive data processing while the data providers still insist on using legacy formats for archival storage.
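The core idea is a separate metadata object that maps each logical key either to inline data or to a byte range inside an original file. A minimal, stdlib-only sketch of that idea follows. The layout is loosely modeled on the version-1 reference JSON that kerchunk produces for fsspec's ReferenceFileSystem, where keys map to inline strings, `[url]`, or `[url, offset, length]`. `read_reference` is a hypothetical helper and not part of kerchunk's API. It reads only local files, whereas the real filesystem resolves URLs against any fsspec backend.

```python
import base64
import json
import os
import tempfile


def read_reference(refs, key):
    """Return the bytes for `key` (hypothetical helper, local files only)."""
    target = refs["refs"][key]
    if isinstance(target, str):
        # Inline data: small metadata objects are stored directly in the refs.
        if target.startswith("base64:"):
            return base64.b64decode(target[len("base64:"):])
        return target.encode()
    if len(target) == 1:
        # Whole-file reference: [url]
        with open(target[0], "rb") as f:
            return f.read()
    # Byte-range reference: [url, offset, length]
    url, offset, length = target
    with open(url, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Simulate an "archival" file: a 16-byte header followed by two 8-byte chunks.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "archive.bin")
with open(path, "wb") as f:
    f.write(b"HEADER--HEADER--")
    f.write(b"chunk-00")
    f.write(b"chunk-01")

# The reference set lives apart from the original file; nothing is copied.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "data/0": [path, 16, 8],
        "data/1": [path, 24, 8],
    },
}

print(read_reference(refs, "data/1"))  # b'chunk-01'
print(json.loads(read_reference(refs, ".zgroup")))  # {'zarr_format': 2}
```

Because only offsets and lengths are stored, the same reference set can point at a file on local disk, HTTP, or object storage without changing the data itself.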
Why Kerchunk:
We provide the following things:
- completely serverless architecture
- metadata consolidation, so you can understand a many-file dataset (metadata plus physical storage) in a single read
- read from all of the storage backends supported by fsspec, including object storage (s3, gcs, abfs, alibaba), http, cloud user storage (dropbox, gdrive) and network protocols (ftp, ssh, hdfs, smb...)
- loading of various file types (currently netcdf4/HDF, grib2, tiff, fits, zarr), potentially heterogeneous within a single dataset, without going through the format-specific driver (e.g., no need for h5py)
- asynchronous concurrent fetch of many data chunks in one go, amortizing the cost of latency
- parallel access with a library like zarr without any locks
- logical datasets spanning many (potentially millions of) data files, with direct access and sub-selection via coordinate indexing across an arbitrary number of dimensions
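The multi-file logical dataset in the last item above can be sketched by re-keying per-file references so that each file becomes one chunk of a larger virtual array. Only metadata changes and no bytes are copied. This is a simplified illustration under hypothetical assumptions: one chunk per file, identical shapes, files already in order, and a `.zarray` trimmed to a few fields. `combine_along_first_dim`, `file_refs`, and the `s3://bucket/...` URLs are made up for the example. kerchunk's own `kerchunk.combine.MultiZarrToZarr` handles the general case.

```python
import json


def combine_along_first_dim(per_file_refs, var):
    """Merge per-file references into one virtual array along dimension 0.

    Hypothetical assumptions: every file stores `var` as a single chunk
    keyed "<var>/0.0", all files share one shape, and the files are given
    in order along the first dimension.
    """
    meta = json.loads(per_file_refs[0][f"{var}/.zarray"])
    n_rows = meta["shape"][0]
    combined = {}
    for i, refs in enumerate(per_file_refs):
        # File i becomes chunk i along dimension 0; only the pointer is re-keyed.
        combined[f"{var}/{i}.0"] = refs[f"{var}/0.0"]
    # The virtual array is len(per_file_refs) times longer along dimension 0.
    meta["shape"][0] = n_rows * len(per_file_refs)
    combined[f"{var}/.zarray"] = json.dumps(meta)
    return combined


def file_refs(url):
    # Hypothetical per-file references: a 100 x 50 float32 array stored as
    # one uncompressed chunk at byte offset 4096 of the source file.
    zarray = {"shape": [100, 50], "chunks": [100, 50], "dtype": "<f4"}
    return {"temp/.zarray": json.dumps(zarray), "temp/0.0": [url, 4096, 20000]}


files = [file_refs(f"s3://bucket/day{d:02d}.nc") for d in range(3)]
virtual = combine_along_first_dim(files, "temp")
print(json.loads(virtual["temp/.zarray"])["shape"])  # [300, 50]

# Coordinate indexing: map a global row to (chunk number, row within chunk).
chunk_len = json.loads(virtual["temp/.zarray"])["chunks"][0]
chunk, local_row = divmod(250, chunk_len)
print(virtual[f"temp/{chunk}.0"][0], local_row)  # s3://bucket/day02.nc 50
```

The lookup for any index touches only the one file that holds it, which is what makes sub-selection over very large file collections cheap.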

For further information, please see the documentation.
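The "asynchronous concurrent fetch" item in the list above rests on a simple latency argument. If each request costs one round trip, issuing N requests concurrently costs roughly one round trip instead of N. The stdlib sketch below simulates this with `asyncio.gather`. The in-memory `store`, `fetch_range`, `fetch_many`, and the 0.1 s `LATENCY` are hypothetical stand-ins for a remote object store, not kerchunk or fsspec code.

```python
import asyncio
import time

LATENCY = 0.1  # hypothetical per-request round-trip latency, in seconds


async def fetch_range(store, key, offset, length):
    """Simulated remote GET of a byte range: wait out the latency, then slice."""
    await asyncio.sleep(LATENCY)
    return store[key][offset:offset + length]


async def fetch_many(store, requests):
    """Issue every range request at once; results come back in request order."""
    return await asyncio.gather(
        *(fetch_range(store, key, off, n) for key, off, n in requests)
    )


store = {"archive.bin": bytes(range(256))}
requests = [("archive.bin", i * 16, 16) for i in range(10)]

start = time.perf_counter()
chunks = asyncio.run(fetch_many(store, requests))
elapsed = time.perf_counter() - start
# The total is close to one LATENCY, not ten, because the waits overlap.
print(f"{len(chunks)} chunks in {elapsed:.2f}s")
```

With real backends the same pattern applies to HTTP range requests. It is broadly how fsspec's asynchronous filesystems (such as s3fs) serve many chunk requests at once.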
Owner
- Name: python filesystem spec
- Login: fsspec
- Kind: organization
- Repositories: 21
- Profile: https://github.com/fsspec
- Description: data storage IO layer for python
GitHub Events
Total
- Issues event: 26
- Watch event: 36
- Delete event: 1
- Issue comment event: 242
- Push event: 29
- Pull request event: 39
- Pull request review event: 11
- Pull request review comment event: 13
- Fork event: 14
- Create event: 3
Last Year
- Issues event: 26
- Watch event: 36
- Delete event: 1
- Issue comment event: 242
- Push event: 29
- Pull request event: 39
- Pull request review event: 11
- Pull request review comment event: 13
- Fork event: 14
- Create event: 3
Committers
Last synced: 6 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Martin Durant | m****t@a****a | 265 |
| Martin Durant | m****t@u****a | 65 |
| Lucas Sterzinger | l****r@u****u | 53 |
| Anu-Ra-g | n****8@g****m | 42 |
| Martin Durant | m****t@u****m | 35 |
| Rich Signell | r****l@u****v | 31 |
| Kevin Paul | k****l@n****m | 20 |
| Max Jones | 1****s@u****m | 18 |
| dcherian | d****k@c****t | 17 |
| peterm790 | p****0@g****m | 17 |
| David Stuebe | d****d@c****y | 17 |
| Alex Goodman | a****m@u****m | 17 |
| Raphael Hagen | n****n@g****m | 15 |
| Lucas Sterzinger | l****s@u****u | 13 |
| Matt Iannucci | m****i@r****m | 10 |
| Lawson Woods | l****2@a****u | 7 |
| Ray Bell | r****l@d****m | 7 |
| RichardScottOZ | 7****Z@u****m | 6 |
| Ben Mares | s****1@t****m | 6 |
| Aleksandar Jelenak | a****k@g****m | 4 |
| David Stuebe | 8****d@u****m | 4 |
| Pier Lorenzo Marasco | p****o@u****m | 4 |
| Robert Banick | r****k@g****m | 3 |
| Ray Bell | r****0@g****m | 3 |
| Pete Gadomski | p****i@g****m | 3 |
| Kelton Halbert | k****t@w****u | 3 |
| Josef Kellndorfer | j****r@u****m | 3 |
| Ian Thomas | i****3@g****m | 3 |
| Aimee Barciauskas | a****e@d****g | 2 |
| Ben Dichter | b****r@g****m | 2 |
| and 37 more... | | |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 170
- Total pull requests: 230
- Average time to close issues: about 1 month
- Average time to close pull requests: 20 days
- Total issue authors: 92
- Total pull request authors: 46
- Average comments per issue: 6.85
- Average comments per pull request: 3.3
- Merged pull requests: 168
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 26
- Pull requests: 67
- Average time to close issues: 23 days
- Average time to close pull requests: 12 days
- Issue authors: 23
- Pull request authors: 20
- Average comments per issue: 1.19
- Average comments per pull request: 2.69
- Merged pull requests: 44
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- TomNicholas (9)
- rsignell (9)
- martindurant (8)
- rsignell-usgs (7)
- raybellwaves (6)
- keltonhalbert (4)
- ivirshup (3)
- tinaok (3)
- ashiklom (3)
- maxrjones (3)
- norlandrhagen (3)
- sreesanjeevkg (3)
- emfdavid (3)
- rabernat (3)
- pl-marasco (3)
Pull Request Authors
- martindurant (71)
- Anu-Ra-g (33)
- emfdavid (15)
- norlandrhagen (12)
- maxrjones (9)
- raybellwaves (7)
- maresb (6)
- kmpaul (6)
- jhamman (5)
- mpiannucci (5)
- ghidalgo3 (4)
- agoodm (3)
- mannreis (2)
- rabernat (2)
- 777arc (2)
Packages
- Total packages: 2
- Total downloads (pypi): 58,888 last month
- Total docker downloads: 369
- Total dependent packages: 11 (may contain duplicates)
- Total dependent repositories: 26 (may contain duplicates)
- Total versions: 23
- Total maintainers: 1
pypi.org: kerchunk
Functions to make reference descriptions for ReferenceFileSystem
- Documentation: https://fsspec.github.io/kerchunk
- License: MIT
- Latest release: 0.2.9 (published 7 months ago)
- Maintainers: 1
conda-forge.org: kerchunk
- Homepage: https://github.com/fsspec/kerchunk
- License: MIT
- Latest release: 0.0.9 (published over 3 years ago)
Dependencies
- dask
- dask-gateway
- dask-labextension
- fsspec
- intake
- intake-xarray
- ipywidgets
- pip
- s3fs
- xarray
- zarr
- autodoc *
- fsspec *
- h5py *
- numcodecs *
- numpy *
- numpydoc ==1.2.1
- sphinx-rtd-theme *
- ujson *
- xarray *
- zarr *
- cftime *
- dask *
- h5netcdf *
- h5py *
- jinja2 *
- mypy *
- pytest *
- s3fs *
- types-ujson *
- xarray *
- fsspec *
- numcodecs *
- numpy *
- ujson *
- zarr *
- actions/checkout v2 composite
- ad-m/github-push-action master composite
- ammaraskar/sphinx-action master composite
- actions/checkout v3.1.0 composite
- actions/setup-python v4 composite
- pre-commit/action v3.0.0 composite
- actions/checkout v2 composite
- ammaraskar/sphinx-action master composite
- actions/checkout v2 composite
- mamba-org/provision-with-micromamba main composite