shardedstore

Provides a sharded Zarr store

https://github.com/thewtex/shardedstore

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.1%) to scientific vocabulary

Keywords

python sharding zarr
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: thewtex
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 34.2 KB
Statistics
  • Stars: 4
  • Watchers: 4
  • Forks: 2
  • Open Issues: 3
  • Releases: 5
Created almost 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md

shardedstore

Provides a sharded Zarr store.

Features

  • Avoids an excessive number of objects, or extremely large objects, in large Zarr stores, which works around filesystem inode limits and object store limitations.
  • Performance-sensitive implementation.
  • Use existing Zarr v2 stores.
  • Mix and match shard store types.
  • Serialize and deserialize the ShardedStore in JSON.
  • Shard groups or array chunks.
  • Easily run transformations on store shards.
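
To make the first feature concrete, the sketch below counts how many chunk objects a chunked array produces and how they would be divided if the array were sharded along its first dimension. The array shape and chunk shape are made-up values, and `count_chunks` is a hypothetical helper for illustration, not part of shardedstore.

```python
import math


def count_chunks(shape, chunks):
    """Number of chunk objects for an array of `shape` chunked by `chunks`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# Hypothetical array: 1000 time steps of 4096 x 4096 pixels.
shape = (1000, 4096, 4096)
chunks = (1, 256, 256)

# Objects written to a single monolithic store.
total = count_chunks(shape, chunks)

# Assumption: sharding over the first dimension gives one shard per index
# along that dimension, so each shard holds only its own slice of chunks.
shards = math.ceil(shape[0] / chunks[0])
per_shard = total // shards

print(total, shards, per_shard)
```

Under these assumptions each shard holds a few hundred objects rather than one store holding hundreds of thousands, which is the situation the feature list describes.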

Installation

```sh
pip install shardedstore
```

Example

```python
from shardedstore import ShardedStore, array_shard_directory_store, to_zip_store_with_prefix

from zarr.storage import DirectoryStore

# xarray example, but works with zarr in general
import xarray as xr
from datatree import DataTree, open_datatree
import json
import numpy as np
import os
```

Create component shard stores

```python
base_store = DirectoryStore("base.zarr")
shard1 = DirectoryStore("shard1.zarr")
shard2 = DirectoryStore("shard2.zarr")
array_shards1 = array_shard_directory_store("array_shards1")
array_shards2 = array_shard_directory_store("array_shards2")
```

Generate data for the example

```python
# xarray-datatree Quick Overview
data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
# Sharded array dimensions must have a chunk shape of 1.
data = data.chunk([1, 2])
ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi))
ds2 = ds.interp(coords={"x": [10, 12, 14, 16, 18, 20]})
ds2 = ds2.chunk({'x': 1, 'y': 2})
ds3 = xr.Dataset(
    dict(people=["alice", "bob"], heights=("people", [1.57, 1.82])),
    coords={"species": "human"},
)
dt = DataTree.from_dict({"simulation/coarse": ds, "simulation/fine": ds2, "/": ds3})
```
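
The chunk-shape requirement noted in the example above can be checked before writing. The following is a minimal sketch, assuming the sharded dimensions are the leading ones; `check_sharded_chunks` is a hypothetical helper, not a shardedstore function.

```python
def check_sharded_chunks(chunks, sharded_dims):
    """Raise ValueError unless the first `sharded_dims` chunk sizes are all 1.

    Assumption: sharding applies to the leading dimensions of the array.
    """
    leading = tuple(chunks[:sharded_dims])
    if any(size != 1 for size in leading):
        raise ValueError(
            f"sharded dimensions need a chunk shape of 1, got {leading}"
        )


# Matches the example data: chunks of (1, 2), sharded over 1 dimension.
check_sharded_chunks((1, 2), sharded_dims=1)

# A chunk size of 2 along the sharded dimension is rejected.
try:
    check_sharded_chunks((2, 2), sharded_dims=1)
    rejected = False
except ValueError:
    rejected = True
```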

A monolithic store

```python
single_store = DirectoryStore("single.zarr")
dt.to_zarr(single_store)
```

A sharded store demonstrating sharding on groups and arrays.

Arrays are sharded over 1 dimension.

```python
sharded_store = ShardedStore(
    base_store,
    {'people': shard1, 'species': shard2},
    {
        'simulation/coarse/foo': (1, array_shards1),
        'simulation/fine/foo': (1, array_shards2),
    },
)
dt.to_zarr(sharded_store)
```
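
As a rough illustration of what sharding an array over 1 dimension means, the sketch below splits a Zarr v2 chunk key such as `3.0` into a shard identifier and the key left inside that shard. This is an assumption about the general idea only; `split_chunk_key` is a hypothetical helper, and the key layout shardedstore uses internally may differ.

```python
def split_chunk_key(chunk_key, sharded_dims, separator="."):
    """Split a chunk key into (shard identifier, key within the shard).

    Zarr v2 names chunks by their indices joined with a separator,
    "." by default. Assumption: the leading `sharded_dims` indices
    select the shard and the remaining indices address the chunk in it.
    """
    parts = chunk_key.split(separator)
    shard_id = separator.join(parts[:sharded_dims])
    remainder = separator.join(parts[sharded_dims:])
    return shard_id, remainder


# Chunk (3, 0) of a 2-D array sharded over its first dimension.
shard_id, remainder = split_chunk_key("3.0", sharded_dims=1)
print(shard_id, remainder)
```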

Serialize / deserialize

```python
config = sharded_store.get_config()
config_str = json.dumps(config)
config = json.loads(config_str)
sharded_store = ShardedStore.from_config(config)
```

Validate

```python
from_single = open_datatree(single_store, engine='zarr').compute()
from_sharded = open_datatree(sharded_store, engine='zarr').compute()
assert from_single.identical(from_sharded)
```

Run transformations over component shards with map_shards

```python
to_zip_stores = to_zip_store_with_prefix("zip_stores")
zip_sharded_stores = sharded_store.map_shards(to_zip_stores)
```

Development

Contributions are welcome and appreciated.

```
git clone https://github.com/thewtex/shardedstore
cd shardedstore
pip install -e ".[test]"
pytest
```

Owner

  • Name: Matt McCormick
  • Login: thewtex
  • Kind: user
  • Location: Research Triangle Park, NC
  • Company: Fideus Labs

Empowering innovators to extract insights from scientific images.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: 'shardedstore: A sharded store for Zarr'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Matthew
    family-names: McCormick
    email: matt@mmmccormick.com
    affiliation: 'Kitware, Inc'
    orcid: 'https://orcid.org/0000-0001-9475-3756'
keywords:
  - zarr
  - shard
  - python
license: Apache-2.0

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 40
  • Total Committers: 2
  • Avg Commits per committer: 20.0
  • Development Distribution Score (DDS): 0.025
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Matt McCormick m****k@k****m 39
Tobias Kölling t****g@m****e 1
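
The Development Distribution Score reported above appears consistent with the common definition of one minus the top committer's share of all commits. The sketch below reproduces the listed values under that assumption; the function name is hypothetical.

```python
def development_distribution_score(commits_per_committer):
    """1 - (commits by the top committer / total commits); 0.0 if no commits.

    Assumption: this is the definition behind the score shown on this page.
    """
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_committer) / total


# All time: 39 commits and 1 commit from the two committers listed.
all_time = development_distribution_score([39, 1])

# Past year: no commits.
past_year = development_distribution_score([])

print(round(all_time, 3), past_year)
```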

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 8
  • Total pull requests: 10
  • Average time to close issues: 7 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 3
  • Total pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.3
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • thewtex (6)
  • jstriebel (1)
  • d70-t (1)
Pull Request Authors
  • thewtex (8)
  • asteroidb612 (1)
  • d70-t (1)

Dependencies

.github/workflows/test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • mikepenz/action-junit-report v2 composite
pyproject.toml pypi
  • zarr >=2.11.3