https://github.com/alan-turing-institute/paqarin

Python package for the generation and evaluation of synthetic time-series data.

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: acm.org
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
    Organization alan-turing-institute has institutional domain (turing.ac.uk)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.8%) to scientific vocabulary

Keywords

synthetic-data time-series
Last synced: 9 months ago

Repository

Python package for the generation and evaluation of synthetic time-series data.

Basic Info
  • Host: GitHub
  • Owner: alan-turing-institute
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 2.05 MB
Statistics
  • Stars: 1
  • Watchers: 4
  • Forks: 1
  • Open Issues: 1
  • Releases: 0
Topics
synthetic-data time-series
Created about 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Codeowners

README.md

Paqarin

A library for the generation of synthetic time series data.

Installation

Paqarin was tested with Python 3.10. We strongly suggest creating a virtual environment to run this code. To create the .venv environment, run:

python -m venv .venv

Paqarin relies on AutoGluon for utility evaluation, so we expect OpenMP to be installed on your system. Installing LightGBM can also be problematic on a MacBook with an M1 chip.

Once the virtual environment is created, activate it and install the Paqarin package using the install_paqarin.bat script:

```bash
.\.venv\Scripts\activate
install_paqarin.bat <INDEX_URL> <PROVIDER_FLAGS>
```

Here, <INDEX_URL> is the URL of the Python Package Index you want to use, and <PROVIDER_FLAGS> selects the provider libraries you want to install. Currently, we support the following libraries:

Installation can take several minutes. Depending on your connectivity, you might need several runs to have all the dependencies in place.

To verify that the installation succeeded, try running one of our examples, such as the one that uses YData's implementation of the DoppelGANger algorithm:

```bash
cd examples
python doppleganger_example.py
```
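If the example script fails, it can help to first confirm that the heavier dependencies are importable at all. The following is a minimal sketch that uses only the standard library. The helper `find_missing_modules` is hypothetical, and the module names in the usage lines are assumptions based on the dependency list, not names confirmed by the Paqarin documentation.

```python
from importlib.util import find_spec


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if find_spec(name) is None:
            missing.append(name)
    return missing


# Assumed import names; adjust them to the providers you installed.
candidates = ["autogluon", "tensorflow", "ydata_synthetic"]
print("Missing modules:", find_missing_modules(candidates))
```

An empty list suggests the packages are at least discoverable; it does not guarantee that they import cleanly or that their versions are compatible.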

Usage

Paqarin exposes multiple synthetic time series generation algorithms, along with metrics to evaluate their performance. Using Paqarin, you can compare these techniques and select the one that best fits your use case.

For example, to use the DoppelGANger algorithm, as implemented by ydata-synthetic, we do the following:

```python
doppleganger_generator: DoppleGangerGenerator = DoppleGangerGenerator(
    provider="ydata",
    generator_parameters=DoppleGanGerParameters(
        batch_size=512,
        learning_rate=0.001,
        latent_dimension=20,
        exponential_decay_rates=(0.2, 0.9),
        wgan_weight=2,
        packing_degree=1,
        epochs=100,
        sequence_length=56,
        sample_length=8,
        steps_per_batch=1,
        numerical_columns=["traffic_byte_counter", "ping_loss_rate"],
        measurement_columns=["traffic_byte_counter", "ping_loss_rate"],
        categorical_columns=["isp", "technology", "state"],
        filename="doppleganger_generator",
    ),
)
```
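To give some intuition for `sequence_length=56` and `sample_length=8`, the sketch below cuts a toy series into fixed-length sequences and then into chunks of consecutive steps. It rests on our understanding that DoppelGANger emits several time steps per recurrent step, so the sequence length is expected to be a multiple of the sample length; this is an assumption, not something the README states. The helper names and the toy data are hypothetical, and this is not how Paqarin or ydata-synthetic prepare data internally.

```python
SEQUENCE_LENGTH = 56  # value taken from the configuration above
SAMPLE_LENGTH = 8     # value taken from the configuration above


def split_into_sequences(rows, sequence_length):
    """Cut a list of rows into non-overlapping sequences, dropping the tail."""
    last_start = len(rows) - sequence_length
    return [
        rows[start:start + sequence_length]
        for start in range(0, last_start + 1, sequence_length)
    ]


def split_into_samples(sequence, sample_length):
    """Group one sequence into chunks of sample_length consecutive steps."""
    if len(sequence) % sample_length != 0:
        raise ValueError("sequence length must be a multiple of sample_length")
    return [
        sequence[start:start + sample_length]
        for start in range(0, len(sequence), sample_length)
    ]


# Toy data: 120 time steps with two measurements each (illustrative only).
toy_rows = [(float(step), float(step) * 0.5) for step in range(120)]

sequences = split_into_sequences(toy_rows, SEQUENCE_LENGTH)
samples = split_into_samples(sequences[0], SAMPLE_LENGTH)
print(len(sequences), len(samples), len(samples[0]))
```

With these values, each sequence of 56 steps breaks into 7 chunks of 8 steps.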

Then, to calculate the predictive score by training a forecasting model over multiple iterations, we do the following:

```python
import pandas as pd

evaluation_pipeline: EvaluationPipeline = EvaluationPipeline(
    generator_map={"doppleganger": doppleganger_generator},
    scoring=PredictiveScorer(
        lstm_units=12,
        iterations=3,
        scorer_epochs=100,
        scorer_batch_size=128,
        number_of_features=2,
        numerical_columns=["traffic_byte_counter", "ping_loss_rate"],
        sequence_length=56,
        metric_value_key="mean_absolute_error",
    ),
)

evaluation_pipeline.fit(pd.read_csv("fcc_mba.csv"))
```

This will calculate the mean absolute error over multiple iterations, both for a forecasting model trained on real data and for one trained on synthetic data. For additional details, please refer to our examples.
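The idea behind this comparison can be sketched without any deep learning dependency. The example below is not Paqarin's `PredictiveScorer`: it swaps the LSTM forecaster for a one-parameter autoregressive model fitted by least squares, and all names and data are hypothetical. It trains one model on real data and one on synthetic data, then evaluates both on held-out real data with the mean absolute error. Because this toy model is deterministic, the repeated iterations are omitted.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a * y[t-1]; returns the coefficient a."""
    previous, current = series[:-1], series[1:]
    numerator = sum(x * y for x, y in zip(previous, current))
    denominator = sum(x * x for x in previous)
    return numerator / denominator


def mean_absolute_error(coefficient, series):
    """MAE of one-step-ahead forecasts made with the fitted coefficient."""
    previous, current = series[:-1], series[1:]
    errors = [abs(y - coefficient * x) for x, y in zip(previous, current)]
    return sum(errors) / len(errors)


# Toy series (illustrative only): the "synthetic" data decays too quickly.
real_train = [1.0 * 0.9 ** step for step in range(20)]
real_test = [2.0 * 0.9 ** step for step in range(20)]
synthetic_train = [1.0 * 0.8 ** step for step in range(20)]

score_real = mean_absolute_error(fit_ar1(real_train), real_test)
score_synthetic = mean_absolute_error(fit_ar1(synthetic_train), real_test)

print(f"MAE, trained on real data:      {score_real:.4f}")
print(f"MAE, trained on synthetic data: {score_synthetic:.4f}")
```

The closer the second error is to the first, the more useful the synthetic data is as a stand-in for the real data on this forecasting task.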

Maturity

Paqarin should be considered experimental. It comes with no support, but we are keen to receive feedback and suggestions on how to improve it. Paqarin is not meant to be used in production environments, and the risks of its deployment are unknown.

Owner

  • Name: The Alan Turing Institute
  • Login: alan-turing-institute
  • Kind: organization
  • Email: info@turing.ac.uk

The UK's national institute for data science and artificial intelligence.

GitHub Events

Total
  • Watch event: 2
  • Issue comment event: 1
  • Push event: 4
  • Pull request event: 2
Last Year
  • Watch event: 2
  • Issue comment event: 1
  • Push event: 4
  • Pull request event: 2

Committers

Last synced: 12 months ago

All Time
  • Total Commits: 12
  • Total Committers: 4
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.417
Past Year
  • Commits: 4
  • Committers: 2
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.25
Top Committers
Name                      Email            Commits
Carlos Gavidia-Calderon   c****n@t****k    7
andeElliott               a****t@g****m    3
Peter Grantham            p****m@h****m    1
45345266                  c****n@n****m    1
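
The all-time figures reported above can be reproduced from the commit counts in this table. The sketch below assumes the commonly used definition of the Development Distribution Score, namely one minus the share of commits made by the most active committer; the page itself does not state the formula, and the function name is hypothetical.

```python
def development_distribution_score(commit_counts):
    """Return 1 - (commits by the most active committer / total commits)."""
    total = sum(commit_counts)
    if total == 0:
        raise ValueError("at least one commit is required")
    return 1 - max(commit_counts) / total


# Commit counts from the Top Committers table.
all_time_commits = [7, 3, 1, 1]

dds = development_distribution_score(all_time_commits)
average = sum(all_time_commits) / len(all_time_commits)

print(round(dds, 3))  # 0.417
print(average)        # 3.0
```

Under the same assumption, a Past Year score of 0.25 over 4 commits would correspond to a 3/1 split between the two committers.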
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 1
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • andeElliott (2)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

install-requirements.txt pypi
  • autogluon ==1.0.0
  • autogluon.common ==1.0.0
  • autogluon.core ==1.0.0
  • autogluon.features ==1.0.0
  • autogluon.tabular ==1.0.0
  • dask ==2023.12.0
  • tensorflow ==2.12.0
  • tensorflow-probability ==0.19.0
optional-requirements.txt pypi
  • black *
  • flake8 *
  • flake8-docstrings *
  • isort *
  • mypy *
  • notebook *
  • pandas-stubs *
  • pytest *
  • pytest-cov *
  • types-PyYAML *
  • types-Pygments *
  • types-colorama *
  • types-decorator *
  • types-jsonschema *
  • types-psutil *
  • types-pycurl *
  • types-setuptools *
  • types-six *
requirements.txt pypi
  • autogluon *
  • autogluon.tabular *
  • autogluon.tabular ==1.0.0
  • autogluon.timeseries *
  • dask *
  • gluonts *
  • lightning *
  • mlforecast *
  • networkx *
  • pyarrow *
  • pydantic ==1.10.13
  • scikit-learn *
  • sdv *
  • seaborn *
  • setuptools_scm *
  • statsforecast *
  • statsmodels *
  • synthcity *
  • tensorflow *
  • torch *
  • websockets *
  • ydata-synthetic *
setup.py pypi