https://github.com/dahnj/awesome-zarr
🎀 Awesome Zarr resources
Statistics
- Stars: 92
- Watchers: 5
- Forks: 1
- Open Issues: 1
- Releases: 0
Zarr

Zarr is a cloud-native, chunked, compressed, and hierarchical array data format.
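As a rough illustration of what "chunked" means here, the sketch below maps an array element to the storage key of the chunk that holds it. It uses only the Python standard library and is not the Zarr-Python API; the helper names `chunk_grid` and `chunk_key` are made up for this example. The key layouts shown are the defaults as described in the Zarr V2 and V3 specifications (V2 can also be configured with a `/` separator), and keys are relative to the array's path in the store. Compression and metadata are left out.

```python
from itertools import product
from math import ceil


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunks))


def chunk_key(index, chunks, zarr_format=2):
    """Storage key of the chunk that holds the element at `index`."""
    coords = [str(i // c) for i, c in zip(index, chunks)]
    if zarr_format == 2:
        return ".".join(coords)      # Zarr V2 default separator is "."
    return "/".join(["c", *coords])  # Zarr V3 default prefixes keys with "c"


# A 1000 x 1000 array split into 256 x 256 chunks (illustrative numbers).
shape, chunks = (1000, 1000), (256, 256)
grid = chunk_grid(shape, chunks)

# Every chunk is stored under its own key, so a reader can fetch only the
# chunks that overlap the region it needs.
all_keys = [".".join(map(str, c)) for c in product(*(range(n) for n in grid))]

print(grid)                                        # (4, 4)
print(chunk_key((300, 10), chunks))                # 1.0
print(chunk_key((300, 10), chunks, zarr_format=3)) # c/1/0
```

Because each chunk is an independent object, this layout works well on object storage, where many chunks can be read or written in parallel.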
Contents
- Resources
  - Existing resources
  - Introductory videos
  - Zarr V3
  - Libraries
  - Platforms
  - Articles
  - Talks & Videos
  - Life sciences
- Topics
  - Zarr & other array data formats
  - GeoZarr
  - Zarr & STAC
Resources
Existing resources
The Zarr website is already an excellent resource for learning about Zarr and its ecosystem. This list is intended to complement the website with a curated and opinionated list of resources.
This list focuses on Geo/Earth Sciences, but is not limited to that domain.
Existing lists
Lists
- The Zarr website already contains great lists: Zarr Implementations, Zarr Datasets, Zarr metadata conventions
- Zarr tutorials (zarr-developers/tutorials)
- Projects using Zarr (zarr-developers/community#19)
- Beautiful Zarr (zarr-developers/beautiful-zarr)
- See playlists & lists in Talks & Videos
Introductory videos
Introductory talks YouTube playlist
Two excellent and up-to-date introductory talks:
- Sanket Verma: The Beauty of Zarr
- Ryan Abernathey: State of Zarr
Zarr V3
Zarr V3 is the upcoming version of Zarr. It is a major update that will bring many new features and improvements.
If you're getting into Zarr now, it might be a good idea to start with Zarr V3.
- Zarr-Python 3 and why you should be excited!
For an excellent in-depth overview, see the ESIP series of talks:
- 2023-03-27 ESIP Cloud Computing Cluster: Zarr - The Next Generation
- 2023-04-24 ESIP Cloud Computing Cluster: Next Generation of Zarr Part 2/3 GeoZarr and Zarr Sharding
- 2023-05-22 ESIP CCC: Next Gen Zarr Part 3/3: accumulation proposal, Kerchunk and Pangeo-Forge
Libraries
This list contains libraries that directly relate to Zarr in some way.
For implementations of Zarr, see Zarr Implementations.
- kerchunk, see the Kerchunk section
- xpublish: exposing data as Zarr and consuming it through a REST API
  - See also routers at xpublish-community, e.g. xpublish-opendap
  - Improving Access to NOAA NOS Model Data with Kerchunk and Xpublish
- ndpyramid: utility for generating ND array pyramids using Xarray and Zarr
Storage & I/O
- TensorStore and xarray-tensorstore: library for efficiently reading and writing large multi-dimensional arrays, has a Zarr API
- KvikIO: C++ and Python bindings to cuFile, enabling GPUDirect Storage
- rechunker: disk-to-disk transformation for chunked arrays
- xpartition: writing large xarray datasets to Zarr. Works around shortcomings of Dask (distributed#6360)
ETL
- Xarray: Zarr is commonly written and accessed through xarray's API.
  - Xarray has its own Zarr Encoding Specification
- xarray-beam: integration of xarray and Apache Beam built using Zarr.
- Pangeo-forge: open-source data platform for transforming datasets into analysis-ready cloud-optimized formats.
  - See Pangeo Forge in 4 minutes and Pangeo Forge: Crowdsourcing Open Data in the Cloud - Ryan Abernathey | SciPy 2022
Developer-oriented
- numcodecs: compression and transformation codecs used by Zarr
- pydantic-zarr: Pydantic models for Zarr objects
- traverzarr: traversing Zarr JSON as if it's a filesystem
- zarr_checksum: calculating checksum information from Zarr
- zarrdump: describe Zarr stores from the command line
Visualization: for visualization tools and libraries, see the Visualization section
Kerchunk
Kerchunk allows you to efficiently read chunked data formats such as GRIB, NetCDF, and COGs by exposing them as a Zarr store.
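The idea can be sketched without Kerchunk itself: a reference set maps Zarr keys either to inline metadata or to a byte range inside an existing file, so the original file is never copied or converted. The example below is a simplified, standard-library-only illustration that mirrors the general shape of Kerchunk's JSON references (`[url, offset, length]`). The file `legacy.bin`, the variable name `temp`, and the helper `read_key` are hypothetical, and the metadata is reduced to a bare minimum; in practice the references are generated by Kerchunk and read through fsspec.

```python
import os
import tempfile

# Stand-in for an existing archival file (e.g. NetCDF or GRIB): a small header
# followed by two raw "chunks" of bytes. Real files are far more complex.
header, chunk_a, chunk_b = b"HDR0", b"AAAAAAAA", b"BBBBBBBB"
path = os.path.join(tempfile.mkdtemp(), "legacy.bin")
with open(path, "wb") as f:
    f.write(header + chunk_a + chunk_b)

# Reference set: Zarr key -> inline metadata string or [url, offset, length].
refs = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format": 2}',
        "temp/0": [path, len(header), len(chunk_a)],
        "temp/1": [path, len(header) + len(chunk_a), len(chunk_b)],
    },
}


def read_key(refs, key):
    """Resolve a Zarr key to bytes by reading a byte range of the original file."""
    ref = refs["refs"][key]
    if isinstance(ref, str):   # metadata stored inline in the reference set
        return ref.encode()
    url, offset, length = ref  # byte range inside the original file
    with open(url, "rb") as f:
        f.seek(offset)
        return f.read(length)


print(read_key(refs, "temp/1"))  # b'BBBBBBBB'
```

A Zarr reader pointed at such a mapping sees an ordinary store, while each chunk request turns into a ranged read against the original file.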
Talks and tutorials
- All you need is Zarr
- 2022 ESIP Kerchunk Tutorial
- Accessing NetCDF and GRIB file collections as cloud-native virtual datasets using Kerchunk
Future of Kerchunk
In the future, Kerchunk will be split into upstream functionality in Zarr itself and a new VirtualiZarr package.
- Kerchunk JSON references will become a part of the Chunk manifest
- For a full overview, see Upstreaming Kerchunk - What's Next for Kerchunk
Platforms
- Arraylake: a data lake platform based on Zarr. The company, Earthmover, was started by core Zarr developers.
Articles
- NASA IMPACT: Zarr Visualization Report
- Earthmover: cloud-native data loaders for machine learning using zarr and xarray
- Zarr Sprint Recap relevant overviews
Talks & Videos
Existing lists
- Zarr Developers playlists, namely
  - Zarr: Introductory Talks
  - Zarr: Projects, Uses, Research and Workflows
  - Zarr Talks
- Introductory videos in this list
Talks
- Earthmover Webinar: Building a Planetary Scale Earth Observation Data Cube in Zarr, with code repository and slides
- Earthmover Webinar: Analysis-ready Weather Forecast Data Cubes with Zarr, with code repository and slides
- Presentation | Zarr: Community specification of large, cloud-optimised, N-dimensional, typed array storage
- Presentations for Sanket Verma's talks: SciPy 2023 and PyCon DE 2023
Life sciences
Zarr has seen great adoption in the life sciences domain.
- bdz: Zarr-based format for storing quantitative biosystems dynamics data
- ome-zarr-py: Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
- ez_zarr: Easy, high-level access to OME-Zarr filesets
- hdmf-zarr: Zarr I/O backend for HDMF
Talks and resources
- Zarr | Life Science Lightning Talk | Trevor Manz | Dask Summit 2021
- Accelerating Single-cell Bioinformatics with N-dimensional Arrays in the Cloud | ISMMS
- What are next-generation file formats (NGFF)?
Visualization
Zarr has seen most work on visualization in the bioimaging community:
- List: Image viewers with OME-Zarr support
- WEBKNOSSOS: web-based visualization & annotation tool, supports OME-Zarr
- Napari: interactive viewer
- Vizarr: interactive viewer built using viv (OME-Zarr and OME-TIFF)
- Neuroglancer: WebGL-based viewer for volumetric data
- BigDataViewer
Topics
Zarr & other array data formats
For a general overview, see
- Introduction to Cloud-Native Geospatial Formats
- Cloud-Optimized Geospatial Formats Guide
Essentially all other common array data formats can be exposed as Zarr. See Kerchunk.
NetCDF & HDF5
Zarr, NetCDF, and HDF5 are three separate data formats that nonetheless relate to each other in multiple ways.
- Zarr inherits its hierarchical structure from HDF5.
- Zarr is commonly accessed through xarray, whose data model is based on the NetCDF data model.
- NetCDF4 can use HDF5 as a backend.
- NCZarr is an extension of the Zarr format to map it to a subset of the NetCDF data model.
Resources
- A Comparison of HDF5, Zarr, and netCDF4 in Performing Common I/O Operations
HDF5
- Pangeo: HDF5 at the speed of Zarr
- Joe Jevnik: Zarr vs. HDF5 | PyData New York 2019
COG: Cloud-Optimized GeoTIFF
N5
Zarr and N5 are two similar array data formats that share common goals and development.
The Zarr V3 spec aims to provide a common implementation target (sources: 1, 2).
Links
- n5
- zarr.n5
- z5: C++ and Python interface for datasets in zarr and n5 format
- Zarr N5 spec diff (zarr-specs#3)
GeoZarr
GeoZarr is a proposal for a Zarr-based geospatial data format that is being submitted as an OGC standard.
GeoZarr will define a metadata convention for Zarr stores that contain geospatial data.
It will also define the relationship of Zarr with CF and NetCDF.
Links
- Specs
- Current status of GeoZarr
Zarr & STAC
STAC provides a common structure for describing and cataloging spatiotemporal assets.
With its hierarchical structure and key-value metadata support, Zarr's capabilities overlap significantly with STAC.
The communities have not yet converged on a canonical representation of Zarr datasets through STAC.
Today, a good example of exposing Zarr in STAC is Planetary Computer:
- Reading Zarr Data
- STAC collection: Daymet Annual North America
- STAC collection: CIL Global Downscaled Projections for Climate Impacts Research
- xstac: STAC from xarray
- Related STAC extensions: xarray-assets, datacube
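To make the pattern concrete, here is a minimal sketch of how a Zarr store might appear as a STAC asset when the xarray-assets extension is used. The `href`, the title, and the helper `xarray_open_arguments` are made up for illustration, and the media type shown is the one commonly seen for Zarr assets rather than a settled standard. The `xarray:open_kwargs` and `xarray:storage_options` fields come from the xarray-assets extension.

```python
# Hypothetical STAC asset describing a Zarr store (href and values are made up).
zarr_asset = {
    "href": "https://example.com/data/daymet-annual.zarr",
    "type": "application/vnd+zarr",
    "roles": ["data"],
    "title": "Annual Daymet data (illustrative)",
    # Fields from the xarray-assets extension: keyword arguments a client
    # can forward to xarray when opening the asset.
    "xarray:open_kwargs": {"consolidated": True},
    "xarray:storage_options": {},
}


def xarray_open_arguments(asset):
    """Collect the href and keyword arguments a client could pass to xarray."""
    kwargs = dict(asset.get("xarray:open_kwargs", {}))
    storage_options = asset.get("xarray:storage_options")
    if storage_options:  # only forward non-empty storage options
        kwargs["storage_options"] = storage_options
    return asset["href"], kwargs


href, kwargs = xarray_open_arguments(zarr_asset)
print(href, kwargs)
```

With xarray installed, a client could then call `xarray.open_zarr(href, **kwargs)`. STAC carries the discovery metadata, while the array structure itself stays in the Zarr store.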
More discussion & related links
- Pangeo: Metadata duplication on STAC zarr collections
- geozarr-spec#32: Integration of Zarr with STAC Catalogs
- stac-spec#781: Zarr Extension?
- Tom Augspurger: STAC and Kerchunk
- Presentation | Daniel Jahn – STAC vs Zarr
- Arraylake: a data lake platform that is arguably the first example of a pure Zarr data catalog
In the future, the Zarr V3 Spec and GeoZarr convention will likely enable greater interoperability between STAC and Zarr.
Owner
- Name: Daniel Jahn (dahn)
- Login: DahnJ
- Kind: user
- Location: null island (epsg:3068)
- Company: @SylveraIO
- Twitter: dahnjahn
- Repositories: 5
- Profile: https://github.com/DahnJ
If it has coordinates, then I'm up for it