fABBA

fABBA: A Python library for the fast symbolic approximation of time series - Published in JOSS (2024)

https://github.com/nla-group/fabba

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 12 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: arxiv.org, joss.theoj.org, zenodo.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

dimensionality-reduction machine-learning symbolic-aggregate-approximation symbolic-representation time-series time-series-analysis time-series-classification time-series-clustering time-series-forecasting

Scientific Fields

Mathematics Computer Science - 84% confidence
Last synced: 6 months ago · JSON representation

Repository

A Python library for the fast symbolic approximation of time series

Basic Info
  • Host: GitHub
  • Owner: nla-group
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 550 MB
Statistics
  • Stars: 46
  • Watchers: 3
  • Forks: 11
  • Open Issues: 1
  • Releases: 15
Topics
dimensionality-reduction machine-learning symbolic-aggregate-approximation symbolic-representation time-series time-series-analysis time-series-classification time-series-clustering time-series-forecasting
Created over 4 years ago · Last pushed 9 months ago
Metadata Files
Readme License Code of conduct Citation

README.md

fABBA: An efficient symbolic aggregate approximation for temporal data


The ABBA methods provide a fast and accurate symbolic approximation of temporal data, making them well-suited for tasks such as compression, clustering, and classification. The fABBA library is a Python-based implementation designed to efficiently apply ABBA methods. It achieves this by first approximating a time series using a polygonal chain representation and then aggregating these polygonal segments into symbolic groups.

The fABBA library supports multiple ABBA variants, including the original ABBA method and the optimized fABBA approach. Unlike ABBA, fABBA accelerates the aggregation process by sorting polygonal pieces and leveraging early termination conditions, significantly improving computational efficiency. However, this speed-up comes at the cost of slightly reduced approximation accuracy compared to ABBA. A key distinction between fABBA and the ABBA method proposed by Elsworth and Güttel [Data Mining and Knowledge Discovery, 34:1175-1200, 2020] is that fABBA eliminates the need for repeated within-cluster-sum-of-squares computations, thereby reducing its overall computational complexity. Additionally, fABBA is fully tolerance-driven, meaning that users do not need to specify the number of symbols in advance, allowing for adaptive and flexible time series symbolization.
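The sorting-based aggregation idea can be sketched in plain Python. The following is only a simplified illustration, not fABBA's actual implementation, and the helper `greedy_aggregate` is hypothetical: pieces (taken here as 2-D `(length, increment)` tuples) are sorted by 2-norm so that, once the norm gap to a group's starting piece exceeds `alpha`, all remaining pieces can be skipped by the reverse triangle inequality.

```python
import math

def greedy_aggregate(pieces, alpha):
    """Greedily group 2-D pieces whose distance to a group's starting piece
    is at most alpha; sorting by norm enables early termination."""
    # sort piece indices by their 2-norm
    order = sorted(range(len(pieces)), key=lambda i: math.hypot(*pieces[i]))
    labels = [-1] * len(pieces)
    group = 0
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue            # already assigned to an earlier group
        labels[i] = group
        ni = math.hypot(*pieces[i])
        for j in order[pos + 1:]:
            # early termination: if the norm gap exceeds alpha, so does the
            # Euclidean distance (reverse triangle inequality), for all later pieces
            if math.hypot(*pieces[j]) - ni > alpha:
                break
            if labels[j] == -1 and math.dist(pieces[i], pieces[j]) <= alpha:
                labels[j] = group
        group += 1
    return labels

pieces = [(1.0, 0.1), (1.05, 0.12), (3.0, -0.5), (3.02, -0.48)]
print(greedy_aggregate(pieces, alpha=0.2))  # prints [0, 0, 1, 1]
```

A smaller `alpha` splits the pieces into more groups, which is exactly why it controls the number of symbols in the alphabet.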

:rocket: Install

fABBA supports the Linux, Windows, and macOS operating systems.


fABBA has the following essential dependencies for its functionality:

* cython (>= 0.29.7)
* numpy (>= 1.19.5)
* scipy (>=1.2.1)
* requests
* scikit-learn (>=0.17.1)
* threadpoolctl (>= 2.0.0)
* matplotlib

To ensure successful Cython compilation, please update NumPy to version >= 1.22.0.

To install the current release via PIP use:

```
pip install fabba
```

Download this repository:

```
git clone https://github.com/nla-group/fABBA.git
```

It also supports conda-forge installation. To install this package via conda-forge, run:

```
conda install -c conda-forge fabba
```

:checkered_flag: Examples

:star: Compress and reconstruct a time series

The following example approximately transforms a time series into a symbolic string representation (transform) and then converts the string back into a numerical format (inverse_transform). fABBA essentially requires two parameters, tol and alpha. The tolerance tol determines how closely the polygonal chain approximation follows the original time series. The parameter alpha controls how similar time series pieces need to be in order to be represented by the same symbol. A smaller tol means that more polygonal pieces are used and the polygonal chain approximation is more accurate, but it also increases the length of the string representation. A smaller alpha typically results in a larger number of symbols.

The choice of parameters depends on the application, but in practice one often just wants the polygonal chain to mimic the key features in the time series rather than to approximate any noise. In this example the time series is a sine wave and the chosen parameters result in the symbolic representation aBbCbCbCbCbCbCbCA. Note how the periodicity in the time series is nicely reflected in the repetitions in its string representation.
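To make the role of tol concrete, here is a from-scratch sketch of tolerance-driven piecewise-linear compression. It is a simplified stand-in for fABBA's compress step, and the helper `simple_compress` is hypothetical: each piece is extended for as long as the straight line connecting its endpoints stays within tol of the data, so a smaller tol yields more (and shorter) pieces.

```python
import math

def simple_compress(ts, tol):
    """Greedy piecewise-linear approximation: each piece is (length, increment)."""
    pieces = []
    start = 0
    while start < len(ts) - 1:
        end = start + 1
        # try to extend the segment while the connecting line fits within tol
        while end + 1 < len(ts):
            cand = end + 1
            inc = ts[cand] - ts[start]
            n = cand - start
            # maximum deviation of the line from the data at interior points
            err = max(abs(ts[start] + inc * (k / n) - ts[start + k]) for k in range(1, n))
            if err > tol:
                break
            end = cand
        pieces.append((end - start, ts[end] - ts[start]))
        start = end
    return pieces

ts = [math.sin(0.05 * i) for i in range(1000)]
print(len(simple_compress(ts, 0.1)), len(simple_compress(ts, 0.5)))  # smaller tol -> more pieces
```

The piece lengths always sum to `len(ts) - 1`, so the reconstruction covers the whole series; only the second step (digitization) introduces the symbol alphabet.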

```python
import numpy as np
import matplotlib.pyplot as plt
from fABBA import fABBA

ts = [np.sin(0.05*i) for i in range(1000)]  # original time series
fabba = fABBA(tol=0.1, alpha=0.1, sorting='2-norm', scl=1, verbose=0)

string = fabba.fit_transform(ts)  # string representation of the time series
print(string)  # prints aBbCbCbCbCbCbCbCA

inverse_ts = fabba.inverse_transform(string, ts[0])  # numerical time series reconstruction
```

Plot the time series and its polygonal chain reconstruction:

```python
plt.plot(ts, label='time series')
plt.plot(inverse_ts, label='reconstruction')
plt.legend()
plt.grid(True, axis='y')
plt.show()
```


:star: Load parameters

One can load the parameters via fabba.parameters, e.g., the cluster centers via fabba.parameters.centers.

To explore fABBA further on real datasets, we recommend users start with the UCI Repository and the UCR Archive.

:star: Adaptive polygonal chain approximation

Instead of using fit_transform, which combines the polygonal chain approximation of the time series and the symbolic conversion into one, both steps of fABBA can be performed independently. Here's how to obtain the compression pieces and reconstruct the time series by inversely transforming the pieces:

```python
import numpy as np
from fABBA import compress
from fABBA import inverse_compress

ts = [np.sin(0.05*i) for i in range(1000)]
pieces = compress(ts, tol=0.1)  # pieces is a list of the polygonal chain pieces
inverse_ts = inverse_compress(pieces, ts[0])  # reconstruct polygonal chain from pieces
```

Similarly, the digitization can be performed after the compression step as below:

```python
from fABBA import digitize
from fABBA import inverse_digitize

string, parameters = digitize(pieces, alpha=0.1, sorting='2-norm', scl=1)  # compression of the polygon
print(''.join(string))  # prints aBbCbCbCbCbCbCbCA

inverse_pieces = inverse_digitize(string, parameters)
inverse_ts = inverse_compress(inverse_pieces, ts[0])  # numerical time series reconstruction
```

:star: Alternative ABBA approach

We also provide other clustering-based ABBA methods, which are easy to use with the support of scikit-learn tools. The usage is as follows:

```python
import numpy as np
from sklearn.cluster import KMeans
from fABBA import ABBAbase

ts = [np.sin(0.05*i) for i in range(1000)]  # original time series

# specify 5 symbols using k-means clustering
kmeans = KMeans(n_clusters=5, random_state=0, init='k-means++', n_init='auto', verbose=0)
abba = ABBAbase(tol=0.1, scl=1, clustering=kmeans)
string = abba.fit_transform(ts)  # string representation of the time series
print(string)  # prints BbAaAaAaAaAaAaAaC
inverse_ts = abba.inverse_transform(string)  # reconstruction
```

fABBA is an extensive package that includes all ABBA variants; you can use the original ABBA method via:

```python
from fABBA import ABBA

abba = ABBA(tol=0.1, scl=1, k=5, verbose=0)
string = abba.fit_transform(ts)
print(string)
inverse_ts = abba.inverse_transform(string, ts[0])
```

:star: Transform multiple time series

Load JABBA package and data:

```python
from fABBA import JABBA
from fABBA import loadData

train, test = loadData()
```

The parameter init of JABBA specifies the underlying ABBA method: if set to 'agg', it automatically uses the fABBA method, and if set to 'k-means', it uses the ABBA method. Use a JABBA object to fit and symbolize the train set via the API fit_transform, and reconstruct the time series from the symbolic representation simply by:

```python
jabba = JABBA(tol=0.0005, init='agg', verbose=1)
symbols = jabba.fit_transform(train)
reconst = jabba.inverse_transform(symbols)
```

Note: the function loadData() is a lightweight API for time series dataset loading which only supports part of the data in the UEA or UCR Archive; please refer to the documentation for full usage details. JABBA can process multiple time series as well as multivariate time series, so the input must be 2-dimensional: for example, when loading a UCR dataset such as Beef, use symbols = jabba.fit_transform(train), and when loading a UEA dataset such as BasicMotions, use symbols = jabba.fit_transform(train[0]). For details, we refer to https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.
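The 2-D input requirement can be illustrated without the library itself; a minimal sketch assuming only NumPy (the variable names are illustrative):

```python
import numpy as np

# JABBA-style APIs expect 2-D input: one row per series.
single = np.sin(0.05 * np.arange(200))   # shape (200,) -- a single univariate series
batch = single.reshape(1, -1)            # shape (1, 200) -- a batch of one series
many = np.stack([single, 2.0 * single])  # shape (2, 200) -- multiple series at once
print(batch.shape, many.shape)           # prints (1, 200) (2, 200)
```

The UEA case above (`train[0]`) follows the same logic: indexing the first axis of a 3-D multivariate array yields the required 2-D layout.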

For out-of-sample data, use the function transform to symbolize the test time series, and reconstruct the symbolization via the function inverse_transform:

```python
test_symbols, start_set = jabba.transform(test)  # if UEA time series is used, use jabba.transform(test[0]) instead
test_reconst = jabba.inverse_transform(test_symbols, start_set)
```

:star: For symbolic approximation with quantized ABBA

Load QABBA package and data:

```python
from fABBA import QABBA
from fABBA import loadData

train, test = loadData()
```

The parameter init of QABBA specifies the underlying ABBA method: if set to 'agg', it automatically uses the fABBA method, and if set to 'k-means', it uses the ABBA method. Use a QABBA object to fit and symbolize the train set via the API fit_transform, and reconstruct the time series from the symbolic representation simply by:

```python
qabba = QABBA(tol=0.0005, init='agg', verbose=1, bits_for_len=8, bits_for_inc=12)
symbols = qabba.fit_transform(train)
reconst = qabba.inverse_transform(symbols)
```

For out-of-sample data, use the function transform to symbolize the test time series, and reconstruct the symbolization via the function inverse_transform:

```python
test_symbols, start_set = qabba.transform(test)  # if UEA time series is used, use qabba.transform(test[0]) instead
test_reconst = qabba.inverse_transform(test_symbols, start_set)
```

:star: For symbolic approximation with fixed point ABBA

Load XABBA package and data:

```python
from fABBA import XABBA
from fABBA import loadData

train, test = loadData()
```

XABBA follows the same routine as above.

```python
abba = XABBA(tol=0.0005, init='agg', verbose=1, bits_for_len=8, bits_for_inc=12)
symbols = abba.fit_transform(train)
reconst = abba.inverse_transform(symbols)
```

:star: Image compression

The following example shows how to apply fABBA to image data.

```python
import matplotlib.pyplot as plt
from fABBA.load_datasets import load_images
from fABBA import image_compress
from fABBA import image_decompress
from fABBA import fABBA
from cv2 import resize

img_samples = load_images()  # load test images
img = resize(img_samples[0], (100, 100))  # select the first image for test

fabba = fABBA(tol=0.1, alpha=0.01, sorting='2-norm', scl=1, verbose=1)
string = image_compress(fabba, img)
inverse_img = image_decompress(fabba, string)
```

Plot the original image:

```python
plt.imshow(img)
plt.show()
```


Plot the reconstructed image:

```python
plt.imshow(inverse_img)
plt.show()
```


:art: Experiments

The folder "exp" contains all code required to reproduce the experiments in the manuscript "An efficient aggregation method for the symbolic representation of temporal data".

Some of the experiments also require the UCR Archive 2018 datasets which can be downloaded from UCR Time Series Classification Archive.

There are a number of dependencies listed below. Most of these modules, except perhaps the final ones, are part of any standard Python installation. We list them for completeness:

os, csv, time, pickle, numpy, warnings, matplotlib, math, collections, copy, sklearn, pandas, tqdm, tslearn

Please ensure that these modules are available before running the code. For the experiments, a NumPy version newer than 1.19.0 and less than 1.20 is required.
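Because the experiments pin NumPy to a narrow range (newer than 1.19.0, less than 1.20) while the main module asks for >= 1.22.0, it can help to check the active environment before running. A minimal sketch (plain string parsing, not a packaging-grade comparator; `satisfies_exp_pin` is a hypothetical helper):

```python
def version_tuple(ver):
    """Parse an 'X.Y.Z' version string into a comparable tuple of ints."""
    return tuple(int(p) for p in ver.split('.')[:3])

def satisfies_exp_pin(ver):
    """True if ver is newer than 1.19.0 and older than 1.20 (the experiments' pin)."""
    t = version_tuple(ver)
    return t > (1, 19, 0) and t < (1, 20, 0)

print(satisfies_exp_pin('1.19.5'), satisfies_exp_pin('1.22.0'))  # prints True False
```

In practice one would check `numpy.__version__` against this range; pre-release suffixes (e.g. '1.19.5rc1') would need a proper version parser such as the one in the packaging library.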

It is necessary to compile the Cython files in the experiments folder (though these are already compiled in the main module, the experiments code is kept separate). To compile the Cython extension in "src" use:

```
cd exp/src
python3 setup.py build_ext --inplace
```

or

```
cd exp/src
python setup.py build_ext --inplace
```

:love_letter: Others

We also provide a C++ implementation of fABBA in the repository cabba; it would be nice to give it a try!

Run example:

```
git clone https://github.com/nla-group/fABBA.git
cd fABBA/cpp
g++ -o test runtime.cpp
./test
```

:paperclip: Citation

If you use the fABBA software for your benchmarking, we would appreciate your citing:

```bibtex
@article{Chen2024,
  doi = {10.21105/joss.06294},
  url = {https://doi.org/10.21105/joss.06294},
  year = {2024},
  publisher = {The Open Journal},
  volume = {9},
  number = {95},
  pages = {6294},
  author = {Xinye Chen and Stefan G\"{u}ttel},
  title = {fABBA: A Python library for the fast symbolic approximation of time series},
  journal = {Journal of Open Source Software}
}
```

If you use the fABBA method in a scientific publication, we would appreciate your citing:

```bibtex
@article{10.1145/3532622,
  author = {Chen, Xinye and G\"{u}ttel, Stefan},
  title = {An Efficient Aggregation Method for the Symbolic Representation of Temporal Data},
  year = {2023},
  publisher = {ACM},
  volume = {17},
  number = {1},
  doi = {10.1145/3532622},
  journal = {ACM Transactions on Knowledge Discovery from Data},
  numpages = {22},
}
```

If you use the QABBA method in a scientific publication, we would appreciate your citing:

```bibtex
@misc{2411.15209,
  title = {Quantized symbolic time series approximation},
  author = {Erin Carson and Xinye Chen and Cheng Kang},
  year = {2025},
  eprint = {2411.15209},
  archivePrefix = {arXiv},
  primaryClass = {cs.LG},
  url = {https://arxiv.org/abs/2411.15209},
}
```

If you have any questions, please feel free to reach out to us!

License

This project is licensed under the terms of the BSD 3-Clause License.

Owner

  • Name: nla-group
  • Login: nla-group
  • Kind: organization

JOSS Publication

fABBA: A Python library for the fast symbolic approximation of time series
Published
March 30, 2024
Volume 9, Issue 95, Page 6294
Authors
Xinye Chen ORCID
Department of Numerical Mathematics, Charles University Prague, Czech Republic
Stefan Güttel ORCID
Department of Mathematics, The University of Manchester, United Kingdom
Editor
Oskar Laverny ORCID
Tags
time series dimensionality reduction symbolic representation data science

GitHub Events

Total
  • Issues event: 1
  • Watch event: 6
  • Issue comment event: 2
  • Push event: 33
  • Fork event: 2
  • Create event: 1
Last Year
  • Issues event: 1
  • Watch event: 6
  • Issue comment event: 2
  • Push event: 33
  • Fork event: 2
  • Create event: 1

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 523
  • Total Committers: 3
  • Avg Commits per committer: 174.333
  • Development Distribution Score (DDS): 0.017
Past Year
  • Commits: 68
  • Committers: 1
  • Avg Commits per committer: 68.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Null 4****e 514
Stefan Güttel g****l 8
Oskar Laverny o****y@u****r 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 1
  • Average time to close issues: 29 days
  • Average time to close pull requests: less than a minute
  • Total issue authors: 5
  • Total pull request authors: 1
  • Average comments per issue: 2.2
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 3.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Lycher2 (1)
  • nsankar (1)
  • whyupupup (1)
  • lrnv (1)
  • allie-tatarian (1)
Pull Request Authors
  • chenxinye (1)
  • lrnv (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 183 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 127
  • Total maintainers: 1
pypi.org: fabba

An efficient aggregation method for the symbolic representation of temporal data

  • Versions: 124
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 183 Last month
Rankings
Downloads: 9.1%
Dependent packages count: 10.1%
Stargazers count: 12.3%
Average: 14.0%
Forks count: 16.9%
Dependent repos count: 21.5%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: fabba
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Stargazers count: 44.6%
Average: 46.0%
Dependent packages count: 51.2%
Forks count: 54.2%
Last synced: 6 months ago

Dependencies

.github/workflows/draft-pdf.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
doc/requirements.txt pypi
  • sphinx_rtd_theme *
exp/src/setup.py pypi
fABBA.egg-info/requires.txt pypi
  • joblib >=1.1.1
  • matplotlib *
  • numpy >=1.3.0
  • pandas *
  • requests *
  • scikit-learn *
  • scipy >=0.7.0
requirements.txt pypi
  • cython *
  • numpy *
  • pandas *
  • requests *
  • scipy >1.2.1
setup.py pypi