zeitcache
Stupid-fast functional-flavored caching for xarray pipelines
Introduction
zeitcache is a wrapper function for xarray methods that can automatically create and restore precomputed results for those methods, saving computing resources. It is especially useful for the following workflows:
- Reproducible scientific computing code, so you can write idiomatic code without having to worry about performance
- Rapid development, where there's no time for something more complicated like snakemake or prefect
- Improving performance of preexisting code, as zeitcache fits in transparently
- Situations where expensive reductions are commonplace in the code
If you have a DataArray and an immutable (pure) function to apply to it, and you want to cut down on compute in the simplest way possible, zeitcache might be the library for you. It is similar to joblib, but tuned for scientific computing workflows, and it relies on properties specific to xarray to make that possible.
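To make the "create and restore precomputed results" idea concrete, here is a minimal sketch of a name-keyed disk cache written as a decorator. It is not zeitcache's implementation: the `cached` decorator, the `CACHE_DIR` location, the pickle format, and the file naming are all assumptions chosen for illustration, and a plain list stands in for a DataArray.

```python
import pickle
import tempfile
from functools import wraps
from pathlib import Path

# Hypothetical cache location; zeitcache's real storage layout may differ.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="cache_sketch_"))


def cached(name):
    """Cache a pure function's result on disk under a user-chosen name."""
    def decorator(func):
        @wraps(func)
        def wrapper(data):
            path = CACHE_DIR / f"{name}-{func.__name__}.pkl"
            if path.exists():
                # Cache hit: restore the stored result instead of recomputing.
                with path.open("rb") as fh:
                    return pickle.load(fh)
            # Cache miss: compute once, then store the result for next time.
            result = func(data)
            with path.open("wb") as fh:
                pickle.dump(result, fh)
            return result
        return wrapper
    return decorator


calls = []


@cached("mydataset")
def expensive_mean(values):
    calls.append(1)  # record each real computation
    return sum(values) / len(values)


first = expensive_mean([1.0, 2.0, 3.0])   # computed and written to disk
second = expensive_mean([1.0, 2.0, 3.0])  # restored from disk
```

The sketch keys the cache only on the chosen name and the function name, never on the input values, which is why the wrapped function has to be pure and the name has to be unique.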
Usage
Simply take a call like this:

```python
from zeitcache import zeitcache

dataset = dataset.mean(dim=('lat', 'lon', 'time'))
```

And rewrite it as this:

```python
@zeitcache("mydataset")
def reduction_simple(ds):
    return ds.mean(dim=('lat', 'lon', 'time'))

dataset = reduction_simple(dataset)
```

Just like that, you now have automatic caching. You can also do something more imperative, if that's your style:

```python
def reduction_simple(ds):
    return ds.mean(dim=('lat', 'lon', 'time'))

dataset = zeitforce("mydataset", dataset, reduction_simple)
```

Alternatively, if you'd prefer not to do the caching immediately, or want to map functions onto thunks later on (maybe functional programming is more your style), you can use `zeitdelay` to do that:

```python
dataset_thunk = zeitdelay("mydataset", dataset)

# some time later
def some_expensive_function(ds):
    ...

result = dataset_thunk(some_expensive_function)
```

Do note that this makes your code harder to read.
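For readers unfamiliar with thunks, the delayed pattern can be sketched in plain Python. This is only an illustration of the idea, not how `zeitdelay` is implemented: the `DelayedSketch` class is hypothetical, it keeps results in memory rather than on disk, and a list stands in for a dataset.

```python
class DelayedSketch:
    """Hypothetical stand-in for a delayed value: holds data, applies functions on demand."""

    def __init__(self, name, data):
        self.name = name
        self.data = data
        self._results = {}  # in-memory results, keyed by (name, function name)

    def __call__(self, func):
        key = (self.name, func.__name__)
        if key not in self._results:
            # Nothing runs until the thunk is called with a function.
            self._results[key] = func(self.data)
        return self._results[key]


evaluations = []


def total(values):
    evaluations.append("total")  # record each real computation
    return sum(values)


thunk = DelayedSketch("mydataset", [1, 2, 3])  # no work done yet
before = len(evaluations)
result = thunk(total)        # first call computes
result_again = thunk(total)  # second call reuses the stored result
```

Creating the thunk is cheap; the cost is paid only when a function is finally applied, and only once per function.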
Important: you must give each dataset a unique name, otherwise you risk collisions! zeitcache's hashing algorithm does not check the data itself; it builds the hash from the data's structure. This works if and only if every name is unique, because two datasets with the same structure can only be told apart by their names.
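The collision risk is easy to see with a toy structural hash. The sketch below is an assumption about what "hashing the structure" could mean (name, dimension names, shape, and dtype fed into SHA-256); the `structural_key` function is hypothetical and zeitcache's actual hash inputs may differ.

```python
import hashlib
import json


def structural_key(name, dims, shape, dtype):
    """Hypothetical cache key built from a name plus array structure only."""
    payload = json.dumps(
        {"name": name, "dims": list(dims), "shape": list(shape), "dtype": dtype},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two different datasets that happen to share a layout (values are never inspected).
temperature = {"dims": ("lat", "lon", "time"), "shape": (180, 360, 12), "dtype": "float32"}
pressure = {"dims": ("lat", "lon", "time"), "shape": (180, 360, 12), "dtype": "float32"}

# Same name + same structure -> identical key, so one would reuse the other's cache.
collision = structural_key("mydataset", **temperature) == structural_key("mydataset", **pressure)

# Unique names keep the keys apart even though the structure matches.
distinct = structural_key("temperature", **temperature) != structural_key("pressure", **pressure)
```

A structure-only key like this costs the same regardless of array size, but it leaves the name as the only thing separating look-alike datasets.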
Please see the docstrings for more information on how to use each function.
Future Work
These are roughly ordered from most to least important.
- Add type hints throughout the code (will look ugly, but useful)
- Allow users to pass an alternative hashing function
- Ship a not-O(1) hashing function as an alternative
- Warn users if there are two datasets with the same name but different hashes in the directory (so they aren't accidentally duping data)
- Make the code even lazier internally
- Support more types of compression algorithms for different needs
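As a rough idea of what the same-name, different-hash warning could look like, here is a small sketch that scans a list of cache filenames. The `<name>-<hash>.<ext>` naming scheme and the `find_name_conflicts` helper are assumptions made for this example; the real on-disk layout is not described here.

```python
from collections import defaultdict


def find_name_conflicts(filenames):
    """Return names that appear with more than one hash.

    Assumes a hypothetical '<name>-<hash>.<ext>' naming scheme.
    """
    hashes_by_name = defaultdict(set)
    for filename in filenames:
        stem = filename.rsplit(".", 1)[0]       # drop the extension
        name, _, digest = stem.rpartition("-")  # split on the last hyphen
        if name:
            hashes_by_name[name].add(digest)
    return sorted(n for n, hashes in hashes_by_name.items() if len(hashes) > 1)


listing = ["mydataset-ab12.zst", "mydataset-cd34.zst", "other-ab12.zst"]
conflicts = find_name_conflicts(listing)
```

A real check would read the cache directory and emit a warning for each conflicting name rather than return a list.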
The Name
In German, "zeit" means time, and "cache" is the same thing as in English. That's what this software usually does: it caches stuff to save you time. A native speaker could also read it as "Zeitkasse", which means something like "time checkout" or "time cash register", and that's fitting too, since the cached data are things you can withdraw from later to save on time.
License
This code is MIT licensed. Please follow the terms of that license. Also, if you end up using this in published work, please cite it. Even though it's small, attribution helps justify continued development. See CITATION.cff for details.
Owner
- Name: Nathaniel Flores
- Login: nsflores1
- Kind: user
- Location: USA
- Repositories: 1
- Profile: https://github.com/nsflores1
Citation (CITATION.cff)
cff-version: 1.2.0
title: zeitcache
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Nathaniel
    family-names: Flores
    email: nsf1@williams.edu
    affiliation: Williams College
repository-code: 'https://github.com/nsflores1/zeitcache'
abstract: Simple functional caching for xarray pipelines
license: MIT
Dependencies
- numpy *
- scipy *
- xarray *
- zstandard *