fABBA

fABBA: A Python library for the fast symbolic approximation of time series - Published in JOSS (2024)

https://github.com/nla-group/fabba

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: arxiv.org, joss.theoj.org, zenodo.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

dimensionality-reduction machine-learning symbolic-aggregate-approximation symbolic-representation time-series time-series-analysis time-series-classification time-series-clustering time-series-forecasting

Scientific Fields

Mathematics Computer Science - 84% confidence
Last synced: 6 months ago · JSON representation

Repository

A Python library for the fast symbolic approximation of time series

Basic Info
  • Host: GitHub
  • Owner: nla-group
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 550 MB
Statistics
  • Stars: 46
  • Watchers: 3
  • Forks: 11
  • Open Issues: 1
  • Releases: 15
Topics
dimensionality-reduction machine-learning symbolic-aggregate-approximation symbolic-representation time-series time-series-analysis time-series-classification time-series-clustering time-series-forecasting
Created over 4 years ago · Last pushed 9 months ago
Metadata Files
Readme License Code of conduct Citation

README.md

fABBA: An efficient symbolic aggregate approximation for temporal data


The ABBA methods provide a fast and accurate symbolic approximation of temporal data, making them well-suited for tasks such as compression, clustering, and classification. The fABBA library is a Python-based implementation designed to efficiently apply ABBA methods. It achieves this by first approximating a time series using a polygonal chain representation and then aggregating these polygonal segments into symbolic groups.

The fABBA library supports multiple ABBA variants, including the original ABBA method and the optimized fABBA approach. Unlike ABBA, fABBA accelerates the aggregation process by sorting polygonal pieces and leveraging early termination conditions, significantly improving computational efficiency. However, this speed-up comes at the cost of slightly reduced approximation accuracy compared to ABBA. A key distinction between fABBA and the ABBA method proposed by Elsworth and Güttel [Data Mining and Knowledge Discovery, 34:1175-1200, 2020] is that fABBA eliminates the need for repeated within-cluster-sum-of-squares computations, thereby reducing its overall computational complexity. Additionally, fABBA is fully tolerance-driven, meaning that users do not need to specify the number of symbols in advance, allowing for adaptive and flexible time series symbolization.
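The sorting-based aggregation idea can be sketched in plain Python. The following is only a simplified illustration, not fABBA's actual implementation, and the helper `greedy_aggregate` is hypothetical: pieces (taken here as 2-D `(length, increment)` tuples) are sorted by 2-norm so that, once the norm gap to a group's starting piece exceeds `alpha`, all remaining pieces can be skipped by the reverse triangle inequality.

```python
import math

def greedy_aggregate(pieces, alpha):
    """Greedily group 2-D pieces whose distance to a group's starting piece
    is at most alpha; sorting by norm enables early termination."""
    # sort piece indices by their 2-norm
    order = sorted(range(len(pieces)), key=lambda i: math.hypot(*pieces[i]))
    labels = [-1] * len(pieces)
    group = 0
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue            # already assigned to an earlier group
        labels[i] = group
        ni = math.hypot(*pieces[i])
        for j in order[pos + 1:]:
            # early termination: if the norm gap exceeds alpha, so does the
            # Euclidean distance (reverse triangle inequality), for all later pieces
            if math.hypot(*pieces[j]) - ni > alpha:
                break
            if labels[j] == -1 and math.dist(pieces[i], pieces[j]) <= alpha:
                labels[j] = group
        group += 1
    return labels

pieces = [(1.0, 0.1), (1.05, 0.12), (3.0, -0.5), (3.02, -0.48)]
print(greedy_aggregate(pieces, alpha=0.2))  # prints [0, 0, 1, 1]
```

A smaller `alpha` splits the pieces into more groups, which is exactly why it controls the number of symbols in the alphabet.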

:rocket: Install

fABBA supports the Linux, Windows, and macOS operating systems.


fABBA has the following essential dependencies for its functionality:

* cython (>= 0.29.7)
* numpy (>= 1.19.5)
* scipy (>=1.2.1)
* requests
* scikit-learn (>=0.17.1)
* threadpoolctl (>= 2.0.0)
* matplotlib

To ensure successful Cython compilation, please update NumPy to version >= 1.22.0.

To install the current release via PIP use:

```
pip install fabba
```

Download this repository:

```
git clone https://github.com/nla-group/fABBA.git
```

It also supports conda-forge installation. To install this package via conda-forge, run:

```
conda install -c conda-forge fabba
```

:checkered_flag: Examples

:star: Compress and reconstruct a time series

The following example approximately transforms a time series into a symbolic string representation (transform) and then converts the string back into a numerical format (inverse_transform). fABBA essentially requires two parameters, tol and alpha. The tolerance tol determines how closely the polygonal chain approximation follows the original time series. The parameter alpha controls how similar time series pieces need to be in order to be represented by the same symbol. A smaller tol means that more polygonal pieces are used and the polygonal chain approximation is more accurate, but it also increases the length of the string representation. A smaller alpha typically results in a larger number of symbols.

The choice of parameters depends on the application, but in practice one often just wants the polygonal chain to mimic the key features in the time series rather than to approximate any noise. In this example the time series is a sine wave and the chosen parameters result in the symbolic representation aBbCbCbCbCbCbCbCA. Note how the periodicity in the time series is nicely reflected in the repetitions in its string representation.
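To make the role of tol concrete, here is a from-scratch sketch of tolerance-driven piecewise-linear compression. It is a simplified stand-in for fABBA's compress step, and the helper `simple_compress` is hypothetical: each piece is extended for as long as the straight line connecting its endpoints stays within tol of the data, so a smaller tol yields more (and shorter) pieces.

```python
import math

def simple_compress(ts, tol):
    """Greedy piecewise-linear approximation: each piece is (length, increment)."""
    pieces = []
    start = 0
    while start < len(ts) - 1:
        end = start + 1
        # try to extend the segment while the connecting line fits within tol
        while end + 1 < len(ts):
            cand = end + 1
            inc = ts[cand] - ts[start]
            n = cand - start
            # maximum deviation of the line from the data at interior points
            err = max(abs(ts[start] + inc * (k / n) - ts[start + k]) for k in range(1, n))
            if err > tol:
                break
            end = cand
        pieces.append((end - start, ts[end] - ts[start]))
        start = end
    return pieces

ts = [math.sin(0.05 * i) for i in range(1000)]
print(len(simple_compress(ts, 0.1)), len(simple_compress(ts, 0.5)))  # smaller tol -> more pieces
```

The piece lengths always sum to `len(ts) - 1`, so the reconstruction covers the whole series; only the second step (digitization) introduces the symbol alphabet.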

```python
import numpy as np
import matplotlib.pyplot as plt
from fABBA import fABBA

ts = [np.sin(0.05*i) for i in range(1000)]  # original time series
fabba = fABBA(tol=0.1, alpha=0.1, sorting='2-norm', scl=1, verbose=0)

string = fabba.fit_transform(ts)  # string representation of the time series
print(string)  # prints aBbCbCbCbCbCbCbCA

inverse_ts = fabba.inverse_transform(string, ts[0])  # numerical time series reconstruction
```

Plot the time series and its polygonal chain reconstruction:

```python
plt.plot(ts, label='time series')
plt.plot(inverse_ts, label='reconstruction')
plt.legend()
plt.grid(True, axis='y')
plt.show()
```


:star: Load parameters

One can load the parameters via fabba.parameters, e.g., the cluster centers via fabba.parameters.centers.

To explore fABBA further on real datasets, we recommend users start with the UCI Repository and the UCR Archive.

:star: Adaptive polygonal chain approximation

Instead of using fit_transform, which combines the polygonal chain approximation of the time series and the symbolic conversion into one, both steps of fABBA can be performed independently. Here's how to obtain the compression pieces and reconstruct the time series by inversely transforming the pieces:

```python
import numpy as np
from fABBA import compress
from fABBA import inverse_compress

ts = [np.sin(0.05*i) for i in range(1000)]
pieces = compress(ts, tol=0.1)  # pieces is a list of the polygonal chain pieces
inverse_ts = inverse_compress(pieces, ts[0])  # reconstruct polygonal chain from pieces
```

Similarly, the digitization can be performed after the compression step as below:

```python
from fABBA import digitize
from fABBA import inverse_digitize

string, parameters = digitize(pieces, alpha=0.1, sorting='2-norm', scl=1)  # compression of the polygon
print(''.join(string))  # prints aBbCbCbCbCbCbCbCA

inverse_pieces = inverse_digitize(string, parameters)
inverse_ts = inverse_compress(inverse_pieces, ts[0])  # numerical time series reconstruction
```

:star: Alternative ABBA approach

We also provide other clustering-based ABBA methods, which are easy to use with the support of scikit-learn tools. The usage is as follows:

```python
import numpy as np
from sklearn.cluster import KMeans
from fABBA import ABBAbase

ts = [np.sin(0.05*i) for i in range(1000)]  # original time series

# specify 5 symbols using k-means clustering
kmeans = KMeans(n_clusters=5, random_state=0, init='k-means++', n_init='auto', verbose=0)
abba = ABBAbase(tol=0.1, scl=1, clustering=kmeans)
string = abba.fit_transform(ts)  # string representation of the time series
print(string)  # prints BbAaAaAaAaAaAaAaC
inverse_ts = abba.inverse_transform(string)  # reconstruction
```

fABBA is an extensive package that includes all ABBA variants; you can use the original ABBA method via:

```python
from fABBA import ABBA

abba = ABBA(tol=0.1, scl=1, k=5, verbose=0)
string = abba.fit_transform(ts)
print(string)
inverse_ts = abba.inverse_transform(string, ts[0])
```

:star: Transform multiple time series

Load JABBA package and data:

```python
from fABBA import JABBA
from fABBA import loadData

train, test = loadData()
```

The parameter init of JABBA specifies the underlying ABBA method: if set to 'agg', it automatically uses the fABBA method, and if set to 'k-means', it uses the ABBA method. Use a JABBA object to fit and symbolize the train set via the API fit_transform, and reconstruct the time series from the symbolic representation simply by:

```python
jabba = JABBA(tol=0.0005, init='agg', verbose=1)
symbols = jabba.fit_transform(train)
reconst = jabba.inverse_transform(symbols)
```

Note: the function loadData() is a lightweight API for time series dataset loading which only supports part of the data in the UEA or UCR Archive; please refer to the documentation for full usage details. JABBA can process multiple time series as well as multivariate time series, so the input must be 2-dimensional: for example, when loading a UCR dataset such as Beef, use symbols = jabba.fit_transform(train), and when loading a UEA dataset such as BasicMotions, use symbols = jabba.fit_transform(train[0]). For details, we refer to https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.
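The 2-D input requirement can be illustrated without the library itself; a minimal sketch assuming only NumPy (the variable names are illustrative):

```python
import numpy as np

# JABBA-style APIs expect 2-D input: one row per series.
single = np.sin(0.05 * np.arange(200))   # shape (200,) -- a single univariate series
batch = single.reshape(1, -1)            # shape (1, 200) -- a batch of one series
many = np.stack([single, 2.0 * single])  # shape (2, 200) -- multiple series at once
print(batch.shape, many.shape)           # prints (1, 200) (2, 200)
```

The UEA case above (`train[0]`) follows the same logic: indexing the first axis of a 3-D multivariate array yields the required 2-D layout.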

For out-of-sample data, use the function transform to symbolize the test time series, and reconstruct the symbolization via the function inverse_transform:

```python
test_symbols, start_set = jabba.transform(test)  # if UEA time series is used, use jabba.transform(test[0]) instead
test_reconst = jabba.inverse_transform(test_symbols, start_set)
```

:star: For symbolic approximation with quantized ABBA

Load QABBA package and data:

```python
from fABBA import QABBA
from fABBA import loadData

train, test = loadData()
```

The parameter init of QABBA specifies the underlying ABBA method: if set to 'agg', it automatically uses the fABBA method, and if set to 'k-means', it uses the ABBA method. Use a QABBA object to fit and symbolize the train set via the API fit_transform, and reconstruct the time series from the symbolic representation simply by:

```python
qabba = QABBA(tol=0.0005, init='agg', verbose=1, bits_for_len=8, bits_for_inc=12)
symbols = qabba.fit_transform(train)
reconst = qabba.inverse_transform(symbols)
```

For out-of-sample data, use the function transform to symbolize the test time series, and reconstruct the symbolization via the function inverse_transform:

```python
test_symbols, start_set = qabba.transform(test)  # if UEA time series is used, use qabba.transform(test[0]) instead
test_reconst = qabba.inverse_transform(test_symbols, start_set)
```

:star: For symbolic approximation with fixed point ABBA

Load XABBA package and data:

```python
from fABBA import XABBA
from fABBA import loadData

train, test = loadData()
```

XABBA follows the same routine as above.

```python
abba = XABBA(tol=0.0005, init='agg', verbose=1, bits_for_len=8, bits_for_inc=12)
symbols = abba.fit_transform(train)
reconst = abba.inverse_transform(symbols)
```

:star: Image compression

The following example shows how to apply fABBA to image data.

```python
import matplotlib.pyplot as plt
from fABBA.load_datasets import load_images
from fABBA import image_compress
from fABBA import image_decompress
from fABBA import fABBA
from cv2 import resize

img_samples = load_images()  # load test images
img = resize(img_samples[0], (100, 100))  # select the first image for test

fabba = fABBA(tol=0.1, alpha=0.01, sorting='2-norm', scl=1, verbose=1)
string = image_compress(fabba, img)
inverse_img = image_decompress(fabba, string)
```

Plot the original image:

```python
plt.imshow(img)
plt.show()
```


Plot the reconstructed image:

```python
plt.imshow(inverse_img)
plt.show()
```


:art: Experiments

The folder "exp" contains all code required to reproduce the experiments in the manuscript "An efficient aggregation method for the symbolic representation of temporal data".

Some of the experiments also require the UCR Archive 2018 datasets which can be downloaded from UCR Time Series Classification Archive.

There are a number of dependencies listed below. Most of these modules, except perhaps the final ones, are part of any standard Python installation. We list them for completeness:

os, csv, time, pickle, numpy, warnings, matplotlib, math, collections, copy, sklearn, pandas, tqdm, tslearn

Please ensure that these modules are available before running the code. For the experiments, a NumPy version newer than 1.19.0 and less than 1.20 is required.
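Because the experiments pin NumPy to a narrow range (newer than 1.19.0, less than 1.20) while the main module asks for >= 1.22.0, it can help to check the active environment before running. A minimal sketch (plain string parsing, not a packaging-grade comparator; `satisfies_exp_pin` is a hypothetical helper):

```python
def version_tuple(ver):
    """Parse an 'X.Y.Z' version string into a comparable tuple of ints."""
    return tuple(int(p) for p in ver.split('.')[:3])

def satisfies_exp_pin(ver):
    """True if ver is newer than 1.19.0 and older than 1.20 (the experiments' pin)."""
    t = version_tuple(ver)
    return t > (1, 19, 0) and t < (1, 20, 0)

print(satisfies_exp_pin('1.19.5'), satisfies_exp_pin('1.22.0'))  # prints True False
```

In practice one would check `numpy.__version__` against this range; pre-release suffixes (e.g. '1.19.5rc1') would need a proper version parser such as the one in the packaging library.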

It is necessary to compile the Cython files in the experiments folder (though these are already compiled in the main module, the experiments code is kept separate). To compile the Cython extension in "src" use:

```
cd exp/src
python3 setup.py build_ext --inplace
```

or

```
cd exp/src
python setup.py build_ext --inplace
```

:love_letter: Others

We also provide a C++ implementation of fABBA in the repository cabba; it would be nice to give it a try!

Run example:

```
git clone https://github.com/nla-group/fABBA.git
cd fABBA/cpp
g++ -o test runtime.cpp
./test
```

:paperclip: Citation

If you use the fABBA software for your benchmarking, we would appreciate your citing:

```bibtex
@article{Chen2024,
  doi = {10.21105/joss.06294},
  url = {https://doi.org/10.21105/joss.06294},
  year = {2024},
  publisher = {The Open Journal},
  volume = {9},
  number = {95},
  pages = {6294},
  author = {Xinye Chen and Stefan G\"{u}ttel},
  title = {fABBA: A Python library for the fast symbolic approximation of time series},
  journal = {Journal of Open Source Software}
}
```

If you use the fABBA method in a scientific publication, we would appreciate your citing:

```bibtex
@article{10.1145/3532622,
  author = {Chen, Xinye and G\"{u}ttel, Stefan},
  title = {An Efficient Aggregation Method for the Symbolic Representation of Temporal Data},
  year = {2023},
  publisher = {ACM},
  volume = {17},
  number = {1},
  doi = {10.1145/3532622},
  journal = {ACM Transactions on Knowledge Discovery from Data},
  numpages = {22},
}
```

If you use the QABBA method in a scientific publication, we would appreciate your citing:

```bibtex
@misc{2411.15209,
  title = {Quantized symbolic time series approximation},
  author = {Erin Carson and Xinye Chen and Cheng Kang},
  year = {2025},
  eprint = {2411.15209},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2411.15209},
}
```

If you have any questions, please feel free to reach out to us!

License

This project is licensed under the terms of the BSD 3-Clause License.

Owner

  • Name: nla-group
  • Login: nla-group
  • Kind: organization

JOSS Publication

fABBA: A Python library for the fast symbolic approximation of time series
Published
March 30, 2024
Volume 9, Issue 95, Page 6294
Authors
Xinye Chen ORCID
Department of Numerical Mathematics, Charles University Prague, Czech Republic
Stefan Güttel ORCID
Department of Mathematics, The University of Manchester, United Kingdom
Editor
Oskar Laverny ORCID
Tags
time series dimensionality reduction symbolic representation data science

GitHub Events

Total
  • Issues event: 1
  • Watch event: 6
  • Issue comment event: 2
  • Push event: 33
  • Fork event: 2
  • Create event: 1
Last Year
  • Issues event: 1
  • Watch event: 6
  • Issue comment event: 2
  • Push event: 33
  • Fork event: 2
  • Create event: 1

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 523
  • Total Committers: 3
  • Avg Commits per committer: 174.333
  • Development Distribution Score (DDS): 0.017
Past Year
  • Commits: 68
  • Committers: 1
  • Avg Commits per committer: 68.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Null 4****e 514
Stefan Güttel g****l 8
Oskar Laverny o****y@u****r 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 1
  • Average time to close issues: 29 days
  • Average time to close pull requests: less than a minute
  • Total issue authors: 5
  • Total pull request authors: 1
  • Average comments per issue: 2.2
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 3.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Lycher2 (1)
  • nsankar (1)
  • whyupupup (1)
  • lrnv (1)
  • allie-tatarian (1)
Pull Request Authors
  • chenxinye (1)
  • lrnv (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 183 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 127
  • Total maintainers: 1
pypi.org: fabba

An efficient aggregation method for the symbolic representation of temporal data

  • Versions: 124
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 183 Last month
Rankings
Downloads: 9.1%
Dependent packages count: 10.1%
Stargazers count: 12.3%
Average: 14.0%
Forks count: 16.9%
Dependent repos count: 21.5%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: fabba
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Stargazers count: 44.6%
Average: 46.0%
Dependent packages count: 51.2%
Forks count: 54.2%
Last synced: 6 months ago

Dependencies

.github/workflows/draft-pdf.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
doc/requirements.txt pypi
  • sphinx_rtd_theme *
exp/src/setup.py pypi
fABBA.egg-info/requires.txt pypi
  • joblib >=1.1.1
  • matplotlib *
  • numpy >=1.3.0
  • pandas *
  • requests *
  • scikit-learn *
  • scipy >=0.7.0
requirements.txt pypi
  • cython *
  • numpy *
  • pandas *
  • requests *
  • scipy >1.2.1
setup.py pypi