https://github.com/predict-idlab/tsdownsample

High-performance time series downsampling algorithms for visualization

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, acm.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.4%) to scientific vocabulary

Keywords

aggregation downsampling fast fpcs lttb m4 minmax performance python simd time-series visualization
Last synced: 5 months ago

Repository

High-performance time series downsampling algorithms for visualization

Basic Info
  • Host: GitHub
  • Owner: predict-idlab
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 641 KB
Statistics
  • Stars: 193
  • Watchers: 9
  • Forks: 18
  • Open Issues: 17
  • Releases: 5
Topics
aggregation downsampling fast fpcs lttb m4 minmax performance python simd time-series visualization
Created about 3 years ago · Last pushed 12 months ago
Metadata Files
Readme Contributing Funding License

README.md

tsdownsample


Extremely fast time series downsampling 📈 for visualization, written in Rust.

Features ✨

  • Fast: written in Rust with PyO3 bindings
    • leverages optimized argminmax - which is SIMD accelerated with runtime feature detection
    • scales linearly with the number of data points
    • multithreaded with Rayon (in Rust)
      Why not use Python multiprocessing? Citing the PyO3 docs on parallelism:
      CPython has the infamous Global Interpreter Lock, which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for CPU-bound tasks and often forces developers to accept the overhead of multiprocessing.
      In Rust - which is a compiled language - there is no GIL, so CPU-bound tasks can be parallelized (with Rayon) with little to no overhead.
  • Efficient: memory efficient
    • works on views of the data (no copies)
    • no intermediate data structures are created
  • Flexible: works on any type of data
    • supported datatypes are
    • for x: f32, f64, i16, i32, i64, u16, u32, u64, datetime64, timedelta64
    • for y: f16, f32, f64, i8, i16, i32, i64, u8, u16, u32, u64, datetime64, timedelta64, bool
      🚀 f16 argminmax is 200-300x faster than numpy. In contrast with all other data types above, f16 is not hardware supported by most modern CPUs (i.e., there are no native f16 instructions)!
      🐌 Programming languages typically support this datatype by either (i) upcasting to f32 or (ii) using a software implementation.
      💡 For argminmax only comparisons are needed (no arithmetic operations), so a symmetrical ordinal mapping from f16 to i16 is sufficient. This mapping allows the hardware-supported scalar and SIMD i16 instructions to be used, without any memory overhead 🎉 (see the sketch after this list).
      More details are described in argminmax PR #1.
  • Easy to use: simple & flexible API
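
For intuition, here is a minimal NumPy sketch of that symmetrical ordinal mapping; the helper name f16_ordinal_i16 is hypothetical and this is not the library's Rust/SIMD implementation (NaNs are not handled here):

```python
import numpy as np

def f16_ordinal_i16(a: np.ndarray) -> np.ndarray:
    """Map float16 values to int16 keys that preserve their ordering."""
    bits = a.view(np.int16)
    # Positive floats keep their bit pattern; for negative floats the lower
    # 15 bits are flipped so that "more negative" maps to a smaller int16.
    return np.where(bits < 0, bits ^ np.int16(0x7FFF), bits)

y = np.array([-2.5, -0.5, 0.0, 1.5, 1000.0], dtype=np.float16)
keys = f16_ordinal_i16(y)
# Comparisons on the i16 keys give the same argmin/argmax as on the f16 data.
assert np.argmin(keys) == np.argmin(y) and np.argmax(keys) == np.argmax(y)
```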

Install

```bash
pip install tsdownsample
```

Usage

```python
from tsdownsample import MinMaxLTTBDownsampler
import numpy as np

# Create a time series
y = np.random.randn(10_000_000)
x = np.arange(len(y))

# Downsample to 1000 points (assuming constant sampling rate)
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)

# Select downsampled data
downsampled_y = y[s_ds]

# Downsample to 1000 points using the (possibly irregularly spaced) x-data
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)

# Select downsampled data
downsampled_x = x[s_ds]
downsampled_y = y[s_ds]
```

Downsampling algorithms & API

Downsampling API 📑

Each downsampling algorithm is implemented as a class that exposes a downsample method with the following signature:

downsample([x], y, n_out, **kwargs) -> ndarray[uint64]

Arguments:

  • x is optional
  • x and y are both positional arguments
  • n_out is a mandatory keyword argument that defines the number of output values*
  • **kwargs are optional keyword arguments (see table below):
    • parallel: whether to use multi-threading (default: False)
      ❗ The max number of threads can be configured with the TSDOWNSAMPLE_MAX_THREADS ENV var (e.g. os.environ["TSDOWNSAMPLE_MAX_THREADS"] = "4"); see the sketch below.
    • ...

Returns: a ndarray[uint64] of indices that can be used to index the original data.

*When there are gaps in the time series, fewer than n_out indices may be returned.
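
For illustration, a small usage sketch of the parallel keyword and the TSDOWNSAMPLE_MAX_THREADS ENV var described above (data size and thread count are arbitrary):

```python
import os

# Optionally cap the number of threads used by the Rust core (documented ENV var).
os.environ["TSDOWNSAMPLE_MAX_THREADS"] = "4"

import numpy as np
from tsdownsample import MinMaxDownsampler

y = np.random.randn(5_000_000)
# parallel=True enables multithreading (Rayon) in the Rust core.
s_ds = MinMaxDownsampler().downsample(y, n_out=1_000, parallel=True)

assert s_ds.dtype == np.uint64  # indices into the original data
assert len(s_ds) <= 1_000       # fewer indices are possible when the series has gaps
```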

Downsampling algorithms 📈

The following downsampling algorithms (classes) are implemented:

| Downsampler | Description | **kwargs |
| ---: | --- | --- |
| MinMaxDownsampler | selects the min and max value in each bin | parallel |
| M4Downsampler | selects the min, max, first and last value in each bin | parallel |
| LTTBDownsampler | performs the Largest Triangle Three Buckets algorithm | parallel |
| MinMaxLTTBDownsampler | (new two-step algorithm 🎉) first selects n_out * minmax_ratio min and max values, then further reduces these to n_out values using the Largest Triangle Three Buckets algorithm | parallel, minmax_ratio* |

*The default value for minmax_ratio is 4, which has been empirically shown to be a good default. More details here: https://arxiv.org/abs/2305.00332
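
As an illustrative sketch (synthetic data) of the two-step procedure and its minmax_ratio keyword:

```python
import numpy as np
from tsdownsample import MinMaxLTTBDownsampler

y = np.random.randn(1_000_000)

# Step 1: preselect n_out * minmax_ratio (= 4_000) min/max candidates.
# Step 2: reduce these candidates to n_out (= 1_000) indices with LTTB.
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1_000, minmax_ratio=4)
```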

Handling NaNs

This library supports two NaN policies:

  1. Omit NaNs: NaNs are ignored during downsampling.
  2. Return NaNs: if a bin contains at least one NaN, the index of the bin's first NaN is returned.

| Omit NaNs | Return NaNs |
| ---: | :--- |
| MinMaxDownsampler | NaNMinMaxDownsampler |
| M4Downsampler | NaNM4Downsampler |
| MinMaxLTTBDownsampler | NaNMinMaxLTTBDownsampler |
| LTTBDownsampler | |

Note that NaNs are not supported for x-data.
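
A small sketch contrasting the two policies on synthetic data; the expected output reflects the behavior described in the table above:

```python
import numpy as np
from tsdownsample import MinMaxDownsampler, NaNMinMaxDownsampler

y = np.random.randn(100_000)
y[::1_000] = np.nan  # sprinkle NaNs into the y-data

# Policy 1 (omit): NaNs are ignored when selecting the min/max per bin.
s_omit = MinMaxDownsampler().downsample(y, n_out=200)

# Policy 2 (return): a bin containing a NaN yields the index of its first NaN.
s_nan = NaNMinMaxDownsampler().downsample(y, n_out=200)
print(np.isnan(y[s_nan]).any())  # expected: True
```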

Limitations & assumptions 🚨

Assumes:

  1. x-data is (non-strictly) monotonically increasing (i.e., sorted); if not, sort it first (see the sketch below)
  2. no NaNs in x-data
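
If the x-data is not already sorted, a stable argsort before downsampling satisfies assumption 1 (a minimal sketch with synthetic data):

```python
import numpy as np
from tsdownsample import MinMaxLTTBDownsampler

x = np.random.rand(100_000)   # unsorted timestamps (violates assumption 1)
y = np.random.randn(100_000)

order = np.argsort(x, kind="stable")  # sort x, and reorder y along with it
s_ds = MinMaxLTTBDownsampler().downsample(x[order], y[order], n_out=1_000)
```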

👤 Jeroen Van Der Donckt

Owner

  • Name: PreDiCT.IDLab
  • Login: predict-idlab
  • Kind: organization
  • Location: Ghent - Belgium

Repositories of the IDLab PreDiCT research group

GitHub Events

Total
  • Create event: 7
  • Release event: 1
  • Issues event: 4
  • Watch event: 38
  • Delete event: 1
  • Issue comment event: 14
  • Push event: 31
  • Pull request review event: 2
  • Pull request event: 10
  • Fork event: 5
Last Year
  • Create event: 7
  • Release event: 1
  • Issues event: 4
  • Watch event: 38
  • Delete event: 1
  • Issue comment event: 14
  • Push event: 31
  • Pull request review event: 2
  • Pull request event: 10
  • Fork event: 5

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 22
  • Total Committers: 5
  • Avg Commits per committer: 4.4
  • Development Distribution Score (DDS): 0.545
Top Committers
Name Email Commits
Jeroen Van Der Donckt 1****d@u****m 10
Jeroen Van Der Donckt b****d@g****m 9
Saveliy Yusufov s****v@g****m 1
jayceslesar j****r@b****m 1
Jayce Slesar 4****r@u****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 27
  • Total pull requests: 64
  • Average time to close issues: 4 months
  • Average time to close pull requests: 11 days
  • Total issue authors: 11
  • Total pull request authors: 9
  • Average comments per issue: 2.41
  • Average comments per pull request: 1.11
  • Merged pull requests: 42
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 15
  • Average time to close issues: 3 months
  • Average time to close pull requests: 28 days
  • Issue authors: 1
  • Pull request authors: 5
  • Average comments per issue: 1.0
  • Average comments per pull request: 1.07
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jvdd (11)
  • jonasvdd (4)
  • jayceslesar (3)
  • lcs-crr (2)
  • NielsPraet (1)
  • my1e5 (1)
  • LeiRui (1)
  • daveah (1)
  • mike-iqmo (1)
  • Hoxbro (1)
Pull Request Authors
  • jvdd (48)
  • NielsPraet (6)
  • jonasvdd (4)
  • jayceslesar (3)
  • diliop (2)
  • my1e5 (2)
  • leviska (2)
  • TomaSajt (1)
  • smu160 (1)
Top Labels
Issue Labels
enhancement (8) bug (4) documentation (1) help wanted (1) unsure (1)
Pull Request Labels
enhancement (2)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 488,926 last-month
  • Total docker downloads: 733
  • Total dependent packages: 3
    (may contain duplicates)
  • Total dependent repositories: 39
    (may contain duplicates)
  • Total versions: 23
  • Total maintainers: 2
pypi.org: tsdownsample

Time series downsampling in rust

  • Versions: 19
  • Dependent Packages: 3
  • Dependent Repositories: 39
  • Downloads: 488,926 Last month
  • Docker Downloads: 733
Rankings
Downloads: 0.6%
Dependent repos count: 2.3%
Dependent packages count: 3.1%
Docker downloads count: 3.8%
Average: 4.8%
Stargazers count: 7.1%
Forks count: 11.9%
Maintainers (2)
Last synced: 6 months ago
proxy.golang.org: github.com/predict-idlab/tsdownsample
  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.8%
Last synced: 6 months ago

Dependencies

downsample_rs/Cargo.toml cargo
  • criterion 0.3.0 development
  • argminmax 0.2
  • half 2.1
  • ndarray 0.15.6
pyproject.toml pypi
  • numpy >=1.21
  • pandas >=1.3
  • python ^3.7.1
.github/workflows/ci-downsample_rs.yml actions
  • Swatinem/rust-cache v1 composite
  • actions-rs/toolchain v1 composite
  • actions/checkout v2 composite
.github/workflows/ci-tsdownsample.yml actions
  • PyO3/maturin-action v1 composite
  • Swatinem/rust-cache v2 composite
  • actions-rs/toolchain v1 composite
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • codecov/codecov-action v3 composite
tests/requirements-linting.txt pypi
  • black * test
  • isort * test
  • mypy * test
  • ruff * test
tests/requirements.txt pypi
  • pytest * test
  • pytest-cov * test
.github/workflows/codeql.yml actions
  • actions/checkout v3 composite
  • github/codeql-action/analyze v2 composite
  • github/codeql-action/init v2 composite
.github/workflows/codspeed.yml actions
  • CodSpeedHQ/action v1 composite
  • Swatinem/rust-cache v2 composite
  • actions-rs/toolchain v1 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
Cargo.toml cargo
downsample_rs/dev_utils/Cargo.toml cargo
notebooks/requirements.txt pypi
  • numpy *
  • pandas *
  • tsdownsample *