https://github.com/predict-idlab/tsdownsample
High-performance time series downsampling algorithms for visualization
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, acm.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.4%) to scientific vocabulary
Repository
Statistics
- Stars: 193
- Watchers: 9
- Forks: 18
- Open Issues: 17
- Releases: 5
Metadata Files
README.md
tsdownsample
Extremely fast time series downsampling 📈 for visualization, written in Rust.
Features ✨
- Fast: written in Rust with PyO3 bindings
  - leverages optimized argminmax, which is SIMD accelerated with runtime feature detection
  - scales linearly with the number of data points
  - multithreaded with Rayon (in Rust)

    Why we do not use Python multiprocessing

    Citing the PyO3 docs on parallelism:

    > CPython has the infamous Global Interpreter Lock, which prevents several threads from executing Python bytecode in parallel. This makes threading in Python a bad fit for CPU-bound tasks and often forces developers to accept the overhead of multiprocessing.

    In Rust, which is a compiled language, there is no GIL, so CPU-bound tasks can be parallelized (with Rayon) with little to no overhead.
- Efficient: memory efficient
  - works on views of the data (no copies)
  - no intermediate data structures are created
- Flexible: works on any type of data
  - supported datatypes are
    - for x: f32, f64, i16, i32, i64, u16, u32, u64, datetime64, timedelta64
    - for y: f16, f32, f64, i8, i16, i32, i64, u8, u16, u32, u64, datetime64, timedelta64, bool

    !! 🚀 f16 argminmax is 200-300x faster than numpy

    In contrast with all other data types above, f16 is not hardware supported (i.e., no instructions for f16) by most modern CPUs!!
    🐌 Programming languages facilitate support for this datatype by either (i) upcasting to f32 or (ii) using a software implementation.
    💡 Because argminmax only needs comparisons, and thus no arithmetic operations, creating a symmetrical ordinal mapping from f16 to i16 is sufficient. This mapping allows the use of the hardware-supported scalar and SIMD i16 instructions without any memory overhead 🎉
    More details are described in argminmax PR #1.
- Easy to use: simple & flexible API
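
The f16-to-i16 ordinal mapping mentioned in the feature list can be illustrated in a few lines of NumPy. This is only a sketch of the general bit trick, not the code used by argminmax: the helper name `f16_to_ordinal_i16` is hypothetical, NaN-free input is assumed, and unlike the mapping described above, this version allocates a new array.

```python
import numpy as np


def f16_to_ordinal_i16(values: np.ndarray) -> np.ndarray:
    """Map float16 values to int16 keys that sort in the same order (hypothetical helper)."""
    # Reinterpret the 16 raw bits as signed integers; the view itself copies nothing.
    bits = values.view(np.int16)
    # Non-negative floats already sort correctly as integers. For negative floats
    # (sign bit set), flipping the lower 15 bits reverses their order, so a larger
    # magnitude maps to a smaller integer.
    return np.where(bits < 0, bits ^ np.int16(0x7FFF), bits)


values = np.array([1.5, -2.0, 0.25, -0.5, 3.0, -65504.0], dtype=np.float16)
keys = f16_to_ordinal_i16(values)

# Integer comparisons on the keys find the same extrema as float comparisons.
assert np.argmin(keys) == np.argmin(values)
assert np.argmax(keys) == np.argmax(values)
```

Applying the same function to the keys recovers the original bit patterns, which is one way to read "symmetrical" here.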
Install
```bash
pip install tsdownsample
```
Usage
```python
from tsdownsample import MinMaxLTTBDownsampler
import numpy as np

# Create a time series
y = np.random.randn(10_000_000)
x = np.arange(len(y))

# Downsample to 1000 points (assuming constant sampling rate)
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)

# Select downsampled data
downsampled_y = y[s_ds]

# Downsample to 1000 points using the (possibly irregularly spaced) x-data
s_ds = MinMaxLTTBDownsampler().downsample(x, y, n_out=1000)

# Select downsampled data
downsampled_x = x[s_ds]
downsampled_y = y[s_ds]
```
Downsampling algorithms & API
Downsampling API 📑
Each downsampling algorithm is implemented as a class that implements a downsample method.
The signature of the downsample method:
downsample([x], y, n_out, **kwargs) -> ndarray[uint64]
Arguments:
- x is optional
- x and y are both positional arguments
- n_out is a mandatory keyword argument that defines the number of output values*
- **kwargs are optional keyword arguments (see table below):
  - parallel: whether to use multi-threading (default: False)
    ❗ The max number of threads can be configured with the TSDOWNSAMPLE_MAX_THREADS ENV var (e.g. os.environ["TSDOWNSAMPLE_MAX_THREADS"] = "4")
  - ...

Returns: an ndarray[uint64] of indices that can be used to index the original data.
*When there are gaps in the time series, fewer than n_out indices may be returned.
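
To see why gaps can lead to fewer than n_out indices, here is a small NumPy sketch. It assumes that, when x is passed, the bins are equal-width intervals over the x-range, so a bin that falls inside a gap holds no samples and contributes no indices. This is an illustrative assumption, not a description of the library's Rust code, and the helper `minmax_with_x` is hypothetical.

```python
import numpy as np


def minmax_with_x(x: np.ndarray, y: np.ndarray, n_out: int) -> np.ndarray:
    """Pick the min and max of y in each of n_out // 2 equal-width x-bins (illustrative)."""
    n_bins = n_out // 2
    # Equal-width bin edges over the x-range; x is assumed sorted and NaN-free.
    edges = np.linspace(x[0], x[-1], n_bins + 1)
    # Binary search translates the x-edges into index boundaries.
    bounds = np.searchsorted(x, edges, side="left")
    bounds[-1] = len(x)  # make the last bin include the final sample
    picks = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        if start == stop:
            continue  # empty bin (a gap in x): nothing to select
        chunk = y[start:stop]
        picks += [start + chunk.argmin(), start + chunk.argmax()]
    # Sort the indices and drop duplicates (min and max coincide in one-sample bins).
    return np.unique(picks).astype(np.uint64)


# Two bursts of 100 samples separated by a large gap in x.
x = np.concatenate([np.arange(0, 100), np.arange(900, 1000)])
y = np.sin(x / 10.0)

idx = minmax_with_x(x, y, n_out=20)
assert len(idx) < 20  # most of the 10 bins lie inside the gap and stay empty
```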
Downsampling algorithms 📈
The following downsampling algorithms (classes) are implemented:
| Downsampler | Description | **kwargs |
| ---:| --- |--- |
| MinMaxDownsampler | selects the min and max value in each bin | parallel |
| M4Downsampler | selects the min, max, first and last value in each bin | parallel |
| LTTBDownsampler | performs the Largest Triangle Three Buckets algorithm | parallel |
| MinMaxLTTBDownsampler | (new two-step algorithm 🎉) first selects n_out * minmax_ratio min and max values, then further reduces these to n_out values using the Largest Triangle Three Buckets algorithm | parallel, minmax_ratio* |
*Default value for minmax_ratio is 4, which is empirically proven to be a good default. More details here: https://arxiv.org/abs/2305.00332
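
The descriptions in the table can be sketched in plain NumPy. The code below is a simplified reading of those descriptions, not the library's Rust implementation: the function names `minmax`, `lttb` and `minmax_lttb` are hypothetical, bins are assumed to be equal-sized by index, n_out is assumed to be at least 3, and explicitly keeping both end points before the second step is a choice of this sketch.

```python
import numpy as np


def minmax(y: np.ndarray, n_out: int) -> np.ndarray:
    """Keep the index of the min and the max in each of n_out // 2 equal-sized bins."""
    # Assumes len(y) >= n_out // 2 so that no bin is empty.
    bounds = np.linspace(0, len(y), n_out // 2 + 1).astype(np.int64)
    picks = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        chunk = y[start:stop]
        picks += [start + chunk.argmin(), start + chunk.argmax()]
    return np.unique(picks)  # sorted, without duplicates


def lttb(x: np.ndarray, y: np.ndarray, n_out: int) -> np.ndarray:
    """Largest Triangle Three Buckets: keep the first and last point; in every bucket
    in between, keep the point that forms the largest triangle with the previously
    kept point and the mean of the next bucket."""
    n = len(y)
    if n_out >= n:
        return np.arange(n)
    # Split the n - 2 inner points into n_out - 2 buckets.
    bounds = np.linspace(1, n - 1, n_out - 1).astype(np.int64)
    picks = [0]
    for i in range(n_out - 2):
        start, stop = bounds[i], bounds[i + 1]
        # Next bucket, or the last point when this is the final bucket.
        if i + 2 < len(bounds):
            nxt = slice(bounds[i + 1], bounds[i + 2])
        else:
            nxt = slice(n - 1, n)
        cx, cy = x[nxt].mean(), y[nxt].mean()
        ax, ay = x[picks[-1]], y[picks[-1]]
        bx, by = x[start:stop], y[start:stop]
        # Twice the triangle area; the constant factor does not change the argmax.
        area = np.abs((ax - cx) * (by - ay) - (ax - bx) * (cy - ay))
        picks.append(start + int(area.argmax()))
    picks.append(n - 1)
    return np.array(picks)


def minmax_lttb(x: np.ndarray, y: np.ndarray, n_out: int, minmax_ratio: int = 4) -> np.ndarray:
    """Two-step: preselect n_out * minmax_ratio extrema, then reduce them with LTTB."""
    if len(y) <= n_out * minmax_ratio:
        return lttb(x, y, n_out)
    pre = minmax(y, n_out * minmax_ratio)
    # Keep both end points so LTTB sees the full range (choice of this sketch).
    pre = np.unique(np.concatenate(([0], pre, [len(y) - 1])))
    return pre[lttb(x[pre], y[pre], n_out)]


rng = np.random.default_rng(0)
y = rng.standard_normal(100_000).cumsum()
x = np.arange(len(y))

idx = minmax_lttb(x, y, n_out=200)
assert len(idx) == 200 and idx[0] == 0 and idx[-1] == len(y) - 1
```

The preselection step shrinks the input that LTTB has to scan from 100,000 points to roughly 800, which is why the two-step variant is cheaper than running LTTB on the full series.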
Handling NaNs
This library supports two NaN-policies:
- Omit NaNs (NaNs are ignored during downsampling).
- Return index of first NaN once there is at least one present in the bin of the considered data.
| Omit NaNs | Return NaNs |
| ----------------------: | :------------------------- |
| MinMaxDownsampler | NaNMinMaxDownsampler |
| M4Downsampler | NaNM4Downsampler |
| MinMaxLTTBDownsampler | NaNMinMaxLTTBDownsampler |
| LTTBDownsampler | |
Note that NaNs are not supported for x-data.
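
The difference between the two policies can be shown on a single bin. The helpers below are hypothetical and only mimic the behaviour described above for a min/max selection; how the library treats a bin that contains nothing but NaNs is not stated here, so returning no index in that case is an assumption of this sketch.

```python
import numpy as np


def bin_minmax_omit_nan(chunk: np.ndarray) -> list[int]:
    """'Omit NaNs' policy: ignore NaNs when searching the extrema of one bin."""
    if np.isnan(chunk).all():
        return []  # assumption of this sketch: an all-NaN bin contributes nothing
    return sorted({int(np.nanargmin(chunk)), int(np.nanargmax(chunk))})


def bin_minmax_return_nan(chunk: np.ndarray) -> list[int]:
    """'Return NaNs' policy: if the bin holds a NaN, return the index of the first one."""
    nan_positions = np.flatnonzero(np.isnan(chunk))
    if nan_positions.size:
        return [int(nan_positions[0])]
    return sorted({int(chunk.argmin()), int(chunk.argmax())})


chunk = np.array([0.3, np.nan, -1.2, 4.0, np.nan])
print(bin_minmax_omit_nan(chunk))    # [2, 3]
print(bin_minmax_return_nan(chunk))  # [1]
```

Returning the NaN index keeps missing data visible in the downsampled plot, whereas omitting NaNs hides it.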
Limitations & assumptions 🚨
Assumes:
- x-data is (non-strictly) monotonic increasing (i.e., sorted)
- no NaNs in x-data
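
Both assumptions are cheap to verify before downsampling. The pre-check below is a hypothetical helper, not part of the tsdownsample API. It only tests floating-point x for NaNs; datetime64 data would need `np.isnat` instead.

```python
import numpy as np


def check_x(x: np.ndarray) -> None:
    """Raise ValueError if x violates the assumptions listed above (hypothetical pre-check)."""
    if np.issubdtype(x.dtype, np.floating) and np.isnan(x).any():
        raise ValueError("x contains NaNs")
    # Comparing neighbours directly also works for unsigned integers,
    # where a subtraction-based check could wrap around.
    if np.any(x[1:] < x[:-1]):
        raise ValueError("x is not sorted in non-decreasing order")


x = np.array([3.0, 1.0, 2.0])
y = np.array([30.0, 10.0, 20.0])

# Unsorted data can be reordered once up front; a stable sort keeps ties in place.
order = np.argsort(x, kind="stable")
x, y = x[order], y[order]
check_x(x)  # passes silently
```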
👤 Jeroen Van Der Donckt
Owner
- Name: PreDiCT.IDLab
- Login: predict-idlab
- Kind: organization
- Location: Ghent - Belgium
- Website: http://predict.idlab.ugent.be/
- Repositories: 55
- Profile: https://github.com/predict-idlab
Repositories of the IDLab PreDiCT research group
GitHub Events
Total
- Create event: 7
- Release event: 1
- Issues event: 4
- Watch event: 38
- Delete event: 1
- Issue comment event: 14
- Push event: 31
- Pull request review event: 2
- Pull request event: 10
- Fork event: 5
Last Year
- Create event: 7
- Release event: 1
- Issues event: 4
- Watch event: 38
- Delete event: 1
- Issue comment event: 14
- Push event: 31
- Pull request review event: 2
- Pull request event: 10
- Fork event: 5
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 22
- Total Committers: 5
- Avg Commits per committer: 4.4
- Development Distribution Score (DDS): 0.545
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jeroen Van Der Donckt | 1****d@u****m | 10 |
| Jeroen Van Der Donckt | b****d@g****m | 9 |
| Saveliy Yusufov | s****v@g****m | 1 |
| jayceslesar | j****r@b****m | 1 |
| Jayce Slesar | 4****r@u****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 27
- Total pull requests: 64
- Average time to close issues: 4 months
- Average time to close pull requests: 11 days
- Total issue authors: 11
- Total pull request authors: 9
- Average comments per issue: 2.41
- Average comments per pull request: 1.11
- Merged pull requests: 42
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 15
- Average time to close issues: 3 months
- Average time to close pull requests: 28 days
- Issue authors: 1
- Pull request authors: 5
- Average comments per issue: 1.0
- Average comments per pull request: 1.07
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jvdd (11)
- jonasvdd (4)
- jayceslesar (3)
- lcs-crr (2)
- NielsPraet (1)
- my1e5 (1)
- LeiRui (1)
- daveah (1)
- mike-iqmo (1)
- Hoxbro (1)
Pull Request Authors
- jvdd (48)
- NielsPraet (6)
- jonasvdd (4)
- jayceslesar (3)
- diliop (2)
- my1e5 (2)
- leviska (2)
- TomaSajt (1)
- smu160 (1)
Packages
- Total packages: 2
- Total downloads:
  - pypi 488,926 last-month
- Total docker downloads: 733
- Total dependent packages: 3 (may contain duplicates)
- Total dependent repositories: 39 (may contain duplicates)
- Total versions: 23
- Total maintainers: 2
pypi.org: tsdownsample
Time series downsampling in rust
- Homepage: https://github.com/predict-idlab/tsdownsample
- Documentation: https://tsdownsample.readthedocs.io/
- License: MIT
- Latest release: 0.1.4 (published about 3 years ago)
proxy.golang.org: github.com/predict-idlab/tsdownsample
- Documentation: https://pkg.go.dev/github.com/predict-idlab/tsdownsample#section-documentation
- License: mit
- Latest release: v0.1.3 (published about 2 years ago)
Dependencies
- criterion 0.3.0 development
- argminmax 0.2
- half 2.1
- ndarray 0.15.6
- numpy >=1.21
- pandas >=1.3
- python ^3.7.1
- Swatinem/rust-cache v1 composite
- actions-rs/toolchain v1 composite
- actions/checkout v2 composite
- PyO3/maturin-action v1 composite
- Swatinem/rust-cache v2 composite
- actions-rs/toolchain v1 composite
- actions/checkout v3 composite
- actions/download-artifact v3 composite
- actions/setup-python v4 composite
- actions/upload-artifact v3 composite
- codecov/codecov-action v3 composite
- black * test
- isort * test
- mypy * test
- ruff * test
- pytest * test
- pytest-cov * test
- actions/checkout v3 composite
- github/codeql-action/analyze v2 composite
- github/codeql-action/init v2 composite
- CodSpeedHQ/action v1 composite
- Swatinem/rust-cache v2 composite
- actions-rs/toolchain v1 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- numpy *
- pandas *
- tsdownsample *