Science Score: 64.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to zenodo.org
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.1%) to scientific vocabulary
Repository
Provides a sharded Zarr store
Basic Info
Statistics
- Stars: 4
- Watchers: 4
- Forks: 2
- Open Issues: 3
- Releases: 5
Metadata Files
README.md
shardedstore
Provides a sharded Zarr store.
Features
- For large Zarr stores, avoid an excessive number of objects or extremely large objects, sidestepping filesystem inode exhaustion and object store limitations.
- Performance-sensitive implementation.
- Use existing Zarr v2 stores.
- Mix and match shard store types.
- Serialize and deserialize the ShardedStore in JSON.
- Shard groups or array chunks.
- Easily run transformations on store shards.
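The object-count concern in the first feature is simple arithmetic: a fully written Zarr v2 array stores one object per chunk (plus a few metadata keys), i.e. the product over dimensions of ceil(shape / chunk). The minimal sketch below computes that count and, assuming (hypothetically) one shard per index along the leading sharded dimensions, how many chunk objects each shard would hold. The helper names and the example shape are illustrative only and are not part of shardedstore.

```python
import math


def chunk_count(shape, chunks):
    """Number of chunk objects a fully written Zarr v2 array produces."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


def per_shard_count(shape, chunks, sharded_dims):
    """Return (number of shards, chunk objects per shard).

    ASSUMPTION for illustration: one shard per index along the first
    `sharded_dims` dimensions, which requires a chunk size of 1 there.
    """
    assert all(c == 1 for c in chunks[:sharded_dims]), "sharded dims need chunk size 1"
    n_shards = math.prod(shape[:sharded_dims])
    return n_shards, chunk_count(shape, chunks) // n_shards


# Hypothetical volume: 1000 slices of 4096 x 4096, chunked 512 x 512 per slice
shape, chunks = (1000, 4096, 4096), (1, 512, 512)
print(chunk_count(shape, chunks))
print(per_shard_count(shape, chunks, 1))
```

Here a single store would hold 64,000 chunk objects, while sharding over the first dimension would split them into 1,000 stores of 64 objects each.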
Installation
```sh
pip install shardedstore
```
Example
```python
from shardedstore import ShardedStore, array_shard_directory_store, to_zip_store_with_prefix
from zarr.storage import DirectoryStore

# xarray example, but works with zarr in general
import xarray as xr
from datatree import DataTree, open_datatree
import json
import numpy as np
import os
```
Create component shard stores
```python
base_store = DirectoryStore("base.zarr")
shard1 = DirectoryStore("shard1.zarr")
shard2 = DirectoryStore("shard2.zarr")
array_shards1 = array_shard_directory_store("array_shards1")
array_shards2 = array_shard_directory_store("array_shards2")
```
Generate data for the example
```python
# xarray-datatree Quick Overview
data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
# Sharded array dimensions must have a chunk shape of 1.
data = data.chunk([1, 2])
ds = xr.Dataset(dict(foo=data, bar=("x", [1, 2]), baz=np.pi))
ds2 = ds.interp(coords={"x": [10, 12, 14, 16, 18, 20]})
ds2 = ds2.chunk({'x': 1, 'y': 2})
ds3 = xr.Dataset(
    dict(people=["alice", "bob"], heights=("people", [1.57, 1.82])),
    coords={"species": "human"},
)
dt = DataTree.from_dict({"simulation/coarse": ds, "simulation/fine": ds2, "/": ds3})
```
A monolithic store
```python
single_store = DirectoryStore("single.zarr")
dt.to_zarr(single_store)
```
A sharded store demonstrating sharding on groups and arrays.
Arrays are sharded over 1 dimension.
```python
sharded_store = ShardedStore(base_store,
    {'people': shard1, 'species': shard2},
    {'simulation/coarse/foo': (1, array_shards1), 'simulation/fine/foo': (1, array_shards2)})
dt.to_zarr(sharded_store)
```
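To make the group/array distinction concrete, here is a minimal, hypothetical sketch of how Zarr v2 keys could be routed in a layout like the one above. The function name `route`, the `"base"` label, the `path[index]` shard naming, keeping full keys in path stores, and keeping array metadata in the base store are all assumptions for illustration; shardedstore's own rules may differ. Under this sketch's assumption that array shards are indexed by chunk index along the sharded dimension, a chunk shape of 1 makes chunk index and data index coincide, so each position along that dimension lands in exactly one shard.

```python
def route(key, group_shards, array_shards):
    """Return (store_name, key_in_store) for a Zarr v2 key.

    HYPOTHETICAL routing rules for illustration only, not shardedstore's code:
    - chunk keys of a sharded array go to a per-index shard chosen by the
      first `ndims` chunk indices (Zarr v2 chunk keys such as "1.0" use "."
      as the dimension separator);
    - keys at or under a sharded path prefix go to that path's store;
    - everything else, including array metadata, stays in the base store.
    """
    for path, ndims in array_shards.items():
        prefix = path + "/"
        if key.startswith(prefix):
            chunk_key = key[len(prefix):]
            # ".zarray" / ".zattrs" are metadata, not chunks
            if not chunk_key.startswith("."):
                indices = chunk_key.split(".")
                shard = ".".join(indices[:ndims])
                return f"{path}[{shard}]", ".".join(indices[ndims:])
    for path, store_name in group_shards.items():
        if key == path or key.startswith(path + "/"):
            return store_name, key
    return "base", key


group_shards = {"people": "shard1", "species": "shard2"}
array_shards = {"simulation/coarse/foo": 1, "simulation/fine/foo": 1}

for key in [".zgroup", "people/0", "simulation/coarse/foo/.zarray",
            "simulation/coarse/foo/1.0"]:
    print(key, "->", route(key, group_shards, array_shards))
```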
Serialize / deserialize
```python
config = sharded_store.get_config()
config_str = json.dumps(config)
config = json.loads(config_str)
sharded_store = ShardedStore.from_config(config)
```
Validate
```python
from_single = open_datatree(single_store, engine='zarr').compute()
from_sharded = open_datatree(sharded_store, engine='zarr').compute()
assert from_single.identical(from_sharded)
```
Run transformations over component shards with map_shards
```python
to_zip_stores = to_zip_store_with_prefix("zip_stores")
zip_sharded_stores = sharded_store.map_shards(to_zip_stores)
```
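`map_shards` applies a transformation to each component store. The sketch below shows that general pattern with plain dicts standing in for Zarr stores. It is hypothetical: `map_shards` here is a free function, and the `(name, store)` callback signature and the `copy_with_prefix` factory (loosely modeled on how `to_zip_store_with_prefix("zip_stores")` is used above) are assumptions; the real method may pass different arguments and returns a new sharded store rather than a tuple.

```python
from typing import Callable, Dict, MutableMapping, Tuple

Store = MutableMapping[str, bytes]
Transform = Callable[[str, Store], Store]


def map_shards(base: Store, shards: Dict[str, Store],
               fn: Transform) -> Tuple[Store, Dict[str, Store]]:
    """Apply fn to the base store and every shard store (hypothetical).

    Returns a new (base, shards) layout; the input stores are not modified.
    """
    new_base = fn("base", base)
    new_shards = {name: fn(name, store) for name, store in shards.items()}
    return new_base, new_shards


def copy_with_prefix(prefix: str, log: Dict[str, str]) -> Transform:
    """Hypothetical transform factory: copies each store into a fresh dict and
    records the path a real implementation (e.g. a zip writer) might use."""
    def transform(name: str, store: Store) -> Store:
        log[name] = f"{prefix}/{name}"
        return dict(store)
    return transform


base = {".zgroup": b'{"zarr_format": 2}'}
shards = {"shard1": {"people/0": b"\x00"}, "shard2": {"species/0": b"\x01"}}
written: Dict[str, str] = {}

new_base, new_shards = map_shards(base, shards, copy_with_prefix("zip_stores", written))
print(written)
```

Returning new stores instead of mutating the inputs keeps the original sharded layout usable alongside the transformed one, which mirrors the example above assigning the result to a separate `zip_sharded_stores` name.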
Development
Contributions are welcome and appreciated.
```sh
git clone https://github.com/thewtex/shardedstore
cd shardedstore
pip install -e ".[test]"
pytest
```
Owner
- Name: Matt McCormick
- Login: thewtex
- Kind: user
- Location: Research Triangle Park, NC
- Company: Fideus Labs
- Website: https://fideus.io
- Repositories: 571
- Profile: https://github.com/thewtex
Empowering innovators to extract insights from scientific images.
Citation (CITATION.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: 'shardedstore: A sharded store for Zarr'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Matthew
    family-names: McCormick
    email: matt@mmmccormick.com
    affiliation: 'Kitware, Inc'
    orcid: 'https://orcid.org/0000-0001-9475-3756'
keywords:
  - zarr
  - shard
  - python
license: Apache-2.0
```
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Matt McCormick | m****k@k****m | 39 |
| Tobias Kölling | t****g@m****e | 1 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 8
- Total pull requests: 10
- Average time to close issues: 7 days
- Average time to close pull requests: 1 day
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.3
- Merged pull requests: 9
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- thewtex (6)
- jstriebel (1)
- d70-t (1)
Pull Request Authors
- thewtex (8)
- asteroidb612 (1)
- d70-t (1)
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- mikepenz/action-junit-report v2 composite
- zarr >=2.11.3