https://github.com/dahnj/awesome-zarr

🎀 Awesome Zarr resources

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ○
    DOI references
  • ✓
    Academic publication links
    Links to: arxiv.org, ieee.org, zenodo.org
  • ○
    Committers with academic emails
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (8.4%) to scientific vocabulary

Keywords

array awesome awesome-list data data-format zarr
Last synced: 5 months ago

Repository

🎀 Awesome Zarr resources

Basic Info
  • Host: GitHub
  • Owner: DahnJ
  • License: cc0-1.0
  • Default Branch: main
  • Homepage:
  • Size: 201 KB
Statistics
  • Stars: 92
  • Watchers: 5
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Topics
array awesome awesome-list data data-format zarr
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License

README.md

Zarr

Zarr is a cloud-native, chunked, compressed, and hierarchical array data format.
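To make "chunked" concrete, here is a minimal sketch of how an element index maps to the key of the chunk that stores it. It uses only the Python standard library rather than a Zarr implementation; the function names `chunk_key` and `chunk_grid` are hypothetical, and the key layouts assume the default encodings (`c/1/0` in Zarr V3, `1.0` in Zarr V2).

```python
from itertools import product


def chunk_key(index, chunk_shape, v3=True):
    """Return the store key of the chunk holding element `index`.

    Hypothetical helper for illustration; assumes at least one dimension
    and the default key encodings of each format version.
    """
    # Integer division gives the chunk coordinate along each dimension.
    coords = [str(i // c) for i, c in zip(index, chunk_shape)]
    if v3:
        # Zarr V3 default chunk key encoding: "c" prefix, "/" separator.
        return "/".join(["c", *coords])
    # Zarr V2 default: chunk coordinates joined by ".".
    return ".".join(coords)


def chunk_grid(shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-s // c) for s, c in zip(shape, chunk_shape))


shape, chunks = (1000, 1000), (256, 256)
grid = chunk_grid(shape, chunks)

# Every chunk is an independent object in the store, addressed by its key.
all_keys = [chunk_key([i * c for i, c in zip(idx, chunks)], chunks)
            for idx in product(*map(range, grid))]

print(grid)                                # chunks per dimension
print(chunk_key((300, 10), chunks))        # V3-style key
print(chunk_key((300, 10), chunks, False)) # V2-style key
```

Because each chunk lives under its own key, a reader only needs to fetch the objects that overlap the requested slice, which is what makes the format a good fit for object storage.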

Contents

Resources
  • Existing resources
  • Introductory videos
  • Zarr V3
  • Libraries
  • Platforms
  • Articles
  • Talks & Videos
  • Life sciences

Topics
  • Zarr & other array data formats
  • GeoZarr
  • Zarr & STAC

Resources

Existing resources

The Zarr website is already an excellent resource for learning about Zarr and its ecosystem. This list is intended to complement the website with a curated and opinionated list of resources.

This list focuses on Geo/Earth Sciences, but is not limited to that domain.

Existing lists

Lists
  • The Zarr website already contains great lists: Zarr Implementations, Zarr Datasets, Zarr metadata conventions
  • Zarr tutorials (zarr-developers/tutorials)
  • Projects using Zarr (zarr-developers/community#19)
  • Beautiful Zarr (zarr-developers/beautiful-zarr)
  • See playlists & lists in Talks & Videos

Introductory videos

Introductory talks YouTube playlist

Two excellent and up-to-date introductory talks:
  • Sanket Verma: The Beauty of Zarr
  • Ryan Abernathey: State of Zarr

Zarr V3

Zarr V3 is the upcoming version of Zarr. It is a major update that will bring many new features and improvements.

If you're getting into Zarr now, it might be a good idea to start with Zarr V3.
  • Zarr-Python 3 and why you should be excited!

For an excellent in-depth overview, see the ESIP series of talks:
  • 2023-03-27 ESIP Cloud Computing Cluster: Zarr - The Next Generation
  • 2023-04-24 ESIP Cloud Computing Cluster: Next Generation of Zarr Part 2/3 GeoZarr and Zarr Sharding
  • 2023-05-22 ESIP CCC: Next Gen Zarr Part 3/3: accumulation proposal, Kerchunk and Pangeo-Forge

Libraries

This list contains libraries that directly relate to Zarr in some way.

For implementations of Zarr, see Zarr Implementations.
  • kerchunk: see the Kerchunk section
  • xpublish: exposing data as Zarr, and consuming it, through a REST API
      • See also routers at xpublish-community, e.g. xpublish-opendap
      • Improving Access to NOAA NOS Model Data with Kerchunk and Xpublish
  • ndpyramid: utility for generating ND array pyramids using Xarray and Zarr

Storage & I/O
  • Tensorstore and xarray-tensorstore: library for efficiently reading and writing large multi-dimensional arrays, has Zarr API
  • KvikIO: C++ and Python bindings to cuFile, enabling GPUDirect Storage
  • rechunker: disk-to-disk transformation for chunked arrays
  • xpartition: writing large xarray datasets to Zarr. Works around shortcomings of Dask (distributed#6360)

ETL
  • Xarray: Zarr is commonly written and accessed through xarray's API.
      • Xarray has its own Zarr Encoding Specification
  • xarray-beam: Integration of xarray and Apache Beam built using Zarr.
  • Pangeo-forge: Open-source data platform for transforming datasets into analysis-ready cloud-optimized formats.
      • See Pangeo Forge in 4 minutes and Pangeo Forge: Crowdsourcing Open Data in the Cloud - Ryan Abernathey | SciPy 2022

Developer-oriented
  • numcodecs: Compression and transformation codecs used by Zarr
  • pydantic-zarr: Pydantic models for Zarr objects
  • traverzarr: Traversing Zarr JSON as if it's a filesystem
  • zarr_checksum: Calculating checksum information from Zarr
  • zarrdump: Describe Zarr stores from the command line

Visualization: For tools & libraries for visualization, see visualization section

Kerchunk

Kerchunk allows you to efficiently read chunked data formats such as GRIB, NetCDF, and COGs by exposing them as a Zarr store.
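The idea can be sketched with the standard library alone: a reference set maps Zarr keys either to inline content or to a `[url, offset, length]` byte range inside an existing file, so the original file never has to be rewritten. This is a simplified model in the spirit of Kerchunk's JSON references, not Kerchunk's own API; the file layout, the key names, and the `read_key` helper are assumptions made for illustration.

```python
import json
import os
import tempfile

# Stand-in for an existing archival file: a header followed by two raw
# "chunks" stored back to back (contents are made up for this sketch).
header, chunk_a, chunk_b = b"HDR!", b"aaaaaaaa", b"bbbbbbbb"
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(header + chunk_a + chunk_b)
    path = f.name

# Each key points at inline content (a string) or at a byte range
# [url, offset, length] inside the original file.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "data/0": [path, len(header), len(chunk_a)],
        "data/1": [path, len(header) + len(chunk_a), len(chunk_b)],
    },
}


def read_key(refs, key):
    """Resolve one key to bytes (hypothetical helper, local files only)."""
    entry = refs["refs"][key]
    if isinstance(entry, str):
        # Inline entry: the value itself is the content.
        return entry.encode()
    url, offset, length = entry
    with open(url, "rb") as fh:
        fh.seek(offset)          # jump straight to the chunk
        return fh.read(length)   # read only the bytes that belong to it


first = read_key(refs, "data/0")
second = read_key(refs, "data/1")
os.remove(path)  # clean up the temporary file
```

A reader that understands such references can treat the keys as a Zarr store and fetch only the byte ranges it needs, which on remote storage translates into ranged requests against the original files.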

Talks and tutorials
  • All you need is Zarr
  • 2022 ESIP Kerchunk Tutorial
  • Accessing NetCDF and GRIB file collections as cloud-native virtual datasets using Kerchunk

Future of Kerchunk

In the future, Kerchunk will be split into upstream functionality in Zarr itself and a new VirtualiZarr package.
  • Kerchunk JSON references will become a part of the Chunk manifest
  • For a full overview, see Upstreaming Kerchunk - What's Next for Kerchunk

Platforms

  • Arraylake: a data lake platform based on Zarr. The company, Earthmover, was started by core Zarr developers.

Articles

Talks & Videos

Existing lists
  • Zarr Developers playlists, namely
      • Zarr: Introductory Talks
      • Zarr: Projects, Uses, Research and Workflows
  • Zarr Talks
  • Introductory videos in this list

Talks
  • Earthmover Webinar: Building a Planetary Scale Earth Observation Data Cube in Zarr with code repository and slides
  • Earthmover Webinar: Analysis-ready Weather Forecast Data Cubes with Zarr with code repository and slides
  • Presentation | Zarr: Community specification of large, cloud-optimised, N-dimensional, typed array storage
  • Presentations for Sanket Verma's talks: SciPy 2023 and PyCon DE 2023

Life sciences

Zarr has seen great adoption in the life sciences domain.

  • bdz: Zarr-based format for storing quantitative biosystems dynamics data
  • ome-zarr-py: Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
  • ez_zarr: Easy, high-level access to OME-Zarr filesets
  • hdmf-zarr: Zarr I/O backend for HDMF

Talks and resources
  • Zarr | Life Science Lightning Talk | Trevor Manz | Dask Summit 2021
  • Accelerating Single-cell Bioinformatics with N-dimensional Arrays in the Cloud | ISMMS
  • What are next-generation file formats (NGFF)?

Visualization

Zarr has seen most work on visualization in the bioimaging community:
  • List: Image viewers with OME-Zarr support
  • WEBKNOSSOS: web-based visualization & annotation tool, supports OME-Zarr
  • Napari: interactive viewer
  • Vizarr: interactive viewer built using viv (OME-Zarr and OME-TIFF)
  • Neuroglancer: WebGL-based viewer for volumetric data
  • BigDataViewer

Topics

Zarr & other array data formats

For a general overview, see:
  • Introduction to Cloud-Native Geospatial Formats
  • Cloud-Optimized Geospatial Formats Guide

Essentially all other common array data formats can be exposed as Zarr. See Kerchunk.

NetCDF & HDF5

Zarr, NetCDF, and HDF5 are three separate data formats that nonetheless relate to each other in multiple ways.
  • Zarr inherits its hierarchical structure from HDF5.
  • Zarr is commonly accessed through xarray, whose data model is based on the NetCDF data format.
  • NetCDF4 can use HDF5 as a backend.
  • NCZarr is an extension of the Zarr format that maps it to a subset of the NetCDF data model.

Resources - A Comparison of HDF5, Zarr, and netCDF4 in Performing Common I/O Operations HDF5 - Pangeo: HDF5 at the speed of Zarr - Joe Jevnik: Zarr vs. HDF5 | PyData New York 2019

COG: Cloud-Optimized GeoTIFF

N5

Zarr and N5 are two similar array data formats that share common goals and development.

The Zarr V3 spec aims to provide a common implementation target (sources: 1, 2)

Links
  • n5
  • zarr.n5
  • z5: C++ and Python interface for datasets in zarr and n5 format
  • Zarr N5 spec diff (zarr-specs#3)

GeoZarr

GeoZarr is a proposal for a Zarr-based geospatial data format, being submitted as an OGC standard.

GeoZarr will define a metadata convention for Zarr stores that contain geospatial data.

It will also define the relationship of Zarr with CF and NetCDF.

Links
  • Specs
  • Current status of GeoZarr

Zarr & STAC

STAC provides a common structure for describing and cataloging spatiotemporal assets.

With its hierarchical structure and key-value metadata support, Zarr's capabilities overlap significantly with STAC.

The communities have not yet converged on a canonical representation of Zarr datasets through STAC.

Today, a good example of exposing Zarr in STAC is Planetary Computer:
  • Reading Zarr Data
  • STAC collection: Daymet Annual North America
  • STAC collection: CIL Global Downscaled Projections for Climate Impacts Research
  • xstac: STAC from xarray
  • Related STAC extensions: xarray-assets, datacube

More discussion & related links
  • Pangeo: Metadata duplication on STAC zarr collections
  • geozarr-spec#32: Integration of Zarr with STAC Catalogs
  • stac-spec#781: Zarr Extension?
  • Tom Augspurger: STAC and Kerchunk
  • Presentation | Daniel Jahn – STAC vs Zarr
  • Arraylake: a data lake platform that is arguably the first example of a pure Zarr data catalog

In the future, the Zarr V3 Spec and GeoZarr convention will likely enable greater interoperability between STAC and Zarr.

Owner

  • Name: Daniel Jahn (dahn)
  • Login: DahnJ
  • Kind: user
  • Location: null island (epsg:3068)
  • Company: @SylveraIO

If it has coordinates, then I'm up for it

GitHub Events

Total
  • Watch event: 12
Last Year
  • Watch event: 12

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 12
  • Total Committers: 1
  • Avg Commits per committer: 12.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 6
  • Committers: 1
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
DahnJ d****n@g****m 12

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • joshmoore (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels