zeitcache

Stupid-fast functional-flavored caching for xarray pipelines

https://github.com/nsflores1/zeitcache

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.6%) to scientific vocabulary

Keywords

python xarray
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: nsflores1
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 15.6 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
python xarray
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

zeitcache

Stupid-fast functional-flavored caching for xarray pipelines

Introduction

zeitcache is a wrapper function for xarray methods that automatically creates and restores precomputed results for those methods, saving computing resources. It is especially useful for the following workflows:
  • Reproducible scientific computing code, so you can write idiomatic code without having to worry about performance
  • Rapid development, where there's no time to set up something more complicated like Snakemake or Prefect
  • Improving the performance of preexisting code, since zeitcache fits in transparently
  • Situations where expensive reductions are commonplace in the code

If you have a DataArray, a pure function (one with no side effects) to apply to it, and want to cut down on compute in the simplest way possible, zeitcache might be the library for you. It is similar to joblib, but more optimized for scientific computing workflows, and it takes advantage of some unique properties of xarray to make that possible.
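The compute-or-restore idea behind this kind of caching can be sketched in a few lines. This is a hypothetical illustration, not zeitcache's code. The `cached` decorator, the `.cache` directory, and the use of `pickle` with the stdlib `gzip` module are all assumptions made for the sketch. zeitcache itself depends on `zstandard` for compression.

```python
import functools
import gzip
import pickle
from pathlib import Path
from typing import Any, Callable


def cached(name: str, cache_dir: str = ".cache") -> Callable:
    """Hypothetical decorator: restore a stored result for `name` or compute it once."""

    def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        @functools.wraps(func)
        def wrapper(data: Any) -> Any:
            path = Path(cache_dir) / f"{name}.pkl.gz"
            if path.exists():
                # Cache hit: skip the computation and load the stored result.
                with gzip.open(path, "rb") as f:
                    return pickle.load(f)
            # Cache miss: run the function once, then persist its result.
            result = func(data)
            path.parent.mkdir(parents=True, exist_ok=True)
            with gzip.open(path, "wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator
```

On a second call with the same name, the function body never runs. The stored result is returned instead.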

Utilization

Simply take a call like this:

```python
dataset = dataset.mean(dim=('lat', 'lon', 'time'))
```

And rewrite it as this:

```python
from zeitcache import zeitcache

@zeitcache("mydataset")
def reduction_simple(ds):
    # Average over the spatial and time dimensions
    return ds.mean(dim=('lat', 'lon', 'time'))

dataset = reduction_simple(dataset)
```

Just like that, you now have automatic caching. You can also do something more imperative, if that's your style:

```python
from zeitcache import zeitforce  # import path assumed to match zeitcache's

def reduction_simple(ds):
    return ds.mean(dim=('lat', 'lon', 'time'))

# Compute (or restore from the cache) immediately
dataset = zeitforce("mydataset", dataset, reduction_simple)
```

Alternatively, if you'd prefer not to do the caching immediately, or want to map functions onto thunks later on (maybe functional programming is more your style), you can use `zeitdelay` to do that:

```python
from zeitcache import zeitdelay  # import path assumed, as above

dataset_thunk = zeitdelay("mydataset", dataset)

# some time later

def some_expensive_function(ds):
    ...

result = dataset_thunk(some_expensive_function)
```

Do note that this makes your code harder to read.
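Conceptually, a thunk like the one `zeitdelay` returns is a closure that holds on to a name and a dataset until a function is handed to it. Below is a minimal sketch of that pattern, not zeitcache's implementation. `delay` and the in-memory `_STORE` dict are hypothetical stand-ins for zeitcache's on-disk cache. Keying on the function's `__qualname__` is also an assumption of this sketch.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical in-memory store; zeitcache itself caches to disk.
_STORE: Dict[Tuple[str, str], Any] = {}


def delay(name: str, data: Any) -> Callable[[Callable[[Any], Any]], Any]:
    """Capture `name` and `data` now; apply (and cache) a function later."""

    def thunk(func: Callable[[Any], Any]) -> Any:
        # Assumption of this sketch: the key combines the dataset name
        # with the function's qualified name.
        key = (name, func.__qualname__)
        if key not in _STORE:
            _STORE[key] = func(data)  # computed only on the first request
        return _STORE[key]

    return thunk
```

Including the function in the key lets one thunk serve several different functions without their results overwriting each other.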

Important: you must give each dataset a unique name; otherwise you risk cache collisions! zeitcache's hashing algorithm doesn't check the data itself, only its structure, when building a hash. This is safe only if every name is unique!
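The README doesn't say exactly which structural fields go into the hash. Assume "structure" means dimension names, shape, dtype, and coordinate names. A value-free key might then look like the sketch below, where `structure_key` is a hypothetical name. Because it never reads the values, its cost doesn't grow with the size of the array.

```python
import hashlib
from typing import Any


def structure_key(name: str, arr: Any) -> str:
    """Hypothetical cache key built from metadata only, never from values."""
    parts = [
        name,
        repr(tuple(arr.dims)),
        repr(tuple(arr.shape)),
        str(arr.dtype),
        # Iterating a coords mapping yields coordinate names, not values.
        repr(sorted(str(c) for c in arr.coords)),
    ]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```

Two arrays that differ only in their values produce the same key under the same name. That is exactly why the name has to carry the distinction.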

Please see the docstrings for more information on how to use each function.

Future Work

These are roughly ordered from most to least important.
  • Add type hints throughout the code (will look ugly, but useful)
  • Allow users to pass an alternative hashing function
  • Ship a non-O(1) hashing function as an alternative
  • Warn users if two datasets in the cache directory have the same name but different hashes (so they aren't accidentally duplicating data)
  • Make the code even lazier internally
  • Support more compression algorithms for different needs
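For the non-O(1) alternative in the list above, one option is to hash the values themselves. The cost then grows linearly with the data size, but arrays with identical structure and different values no longer collide. `content_key` is a hypothetical sketch. It assumes the array exposes `.shape`, `.dtype`, and a `.values` object with a `tobytes()` method, as a NumPy-backed xarray DataArray does.

```python
import hashlib
from typing import Any


def content_key(name: str, arr: Any) -> str:
    """Hypothetical O(n) cache key that reads every byte of the values."""
    h = hashlib.sha256()
    # Terminate each field with a NUL byte so adjacent fields can't run together.
    for field in (name, repr(tuple(arr.shape)), str(arr.dtype)):
        h.update(field.encode() + b"\0")
    # This is the O(n) step: the full data buffer is hashed.
    h.update(arr.values.tobytes())
    return h.hexdigest()
```

A pluggable design could accept any function with this `(name, arr) -> str` shape and use it in place of the default structural hash.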

The Name

In German, "zeit" means time, and "cache" is the same thing as in English. That's what this software usually does: it caches stuff to save you time. A native speaker could also read it as "Zeitkasse", which means something like "time checkout" or "time cash register", and that's fitting too, since the cached data are things you can withdraw from later to save on time.

License

This code is MIT licensed. Please follow the terms of that license. Also, if you end up using this in published work, please cite it. Even though it's small, attribution helps justify continued development. See CITATION.cff for details.

Owner

  • Name: Nathaniel Flores
  • Login: nsflores1
  • Kind: user
  • Location: USA

Citation (CITATION.cff)

cff-version: 1.2.0
title: zeitcache
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Nathaniel
    family-names: Flores
    email: nsf1@williams.edu
    affiliation: Williams College
repository-code: 'https://github.com/nsflores1/zeitcache'
abstract: Simple functional caching for xarray pipelines
license: MIT

GitHub Events

Total
  • Public event: 1
  • Push event: 2
Last Year
  • Public event: 1
  • Push event: 2

Dependencies

pyproject.toml pypi
  • numpy *
  • scipy *
  • xarray *
  • zstandard *