https://github.com/sdv-dev/deepecho

Synthetic Data Generation for mixed-type, multivariate time series.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 12 committers (8.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.0%) to scientific vocabulary

Keywords

data-generation deep-learning generative-adversarial-network sdv synthetic-data synthetic-data-generation time-series

Keywords from Contributors

gan generative-ai generative-model generativeai multi-table relational-datasets
Last synced: 6 months ago

Repository

Synthetic Data Generation for mixed-type, multivariate time series.

Basic Info
  • Host: GitHub
  • Owner: sdv-dev
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 767 KB
Statistics
  • Stars: 116
  • Watchers: 8
  • Forks: 16
  • Open Issues: 6
  • Releases: 17
Topics
data-generation deep-learning generative-adversarial-network sdv synthetic-data synthetic-data-generation time-series
Created over 5 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Codeowners Authors

README.md


This repository is part of The Synthetic Data Vault Project, a project from DataCebo.

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha) [![PyPi Shield](https://img.shields.io/pypi/v/deepecho.svg)](https://pypi.python.org/pypi/deepecho) [![Tests](https://github.com/sdv-dev/DeepEcho/workflows/Run%20Tests/badge.svg)](https://github.com/sdv-dev/DeepEcho/actions?query=workflow%3A%22Run+Tests%22+branch%3Amain) [![Downloads](https://pepy.tech/badge/deepecho)](https://pepy.tech/project/deepecho) [![Coverage Status](https://codecov.io/gh/sdv-dev/DeepEcho/branch/main/graph/badge.svg)](https://codecov.io/gh/sdv-dev/DeepEcho) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sdv-dev/DeepEcho/main?filepath=tutorials/timeseries_data) [![Slack](https://img.shields.io/badge/Slack%20Workspace-Join%20now!-36C5F0?logo=slack)](https://bit.ly/sdv-slack-invite)

Overview

DeepEcho is a Synthetic Data Generation Python library for mixed-type, multivariate time series. It provides:

  1. Multiple models based both on classical statistical modeling of time series and the latest in Deep Learning techniques.
  2. A robust benchmarking framework for evaluating these methods on multiple datasets and with multiple metrics.
  3. Ability for Machine Learning researchers to submit new methods following our model and sample API and get evaluated.

| Important Links | |
| --------------------------------------------- | -------------------------------------------------------------------- |
| :computer: Website | Check out the SDV Website for more information about the project. |
| :orange_book: SDV Blog | Regular publishing of useful content about Synthetic Data Generation. |
| :book: Documentation | Quickstarts, User and Development Guides, and API Reference. |
| :octocat: Repository | The link to the GitHub Repository of this library. |
| :keyboard: Development Status | This software is in its Pre-Alpha stage. |
| Community | Join our Slack Workspace for announcements and discussions. |
| Tutorials | Run the SDV Tutorials in a Binder environment. |

Install

DeepEcho is part of the SDV project and is automatically installed alongside it. For details about this process, please visit the SDV Installation Guide.

Optionally, DeepEcho can also be installed as a standalone library using the following commands:

Using pip:

```bash
pip install deepecho
```

Using conda:

```bash
conda install -c pytorch -c conda-forge deepecho
```

For more installation options, please visit the DeepEcho Installation Guide.
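After either install command, a quick way to confirm that the package is visible to the current interpreter is to ask the standard library for its version. This is only a minimal sketch: `installed_version` is a hypothetical helper name, not part of DeepEcho.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string when DeepEcho is installed, otherwise None.
print(installed_version('deepecho'))
```

If this prints `None`, the package was most likely installed into a different environment than the one running the script.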

Quickstart

DeepEcho is included as part of SDV to model and sample synthetic time series. In most cases, usage through SDV is recommended, since it provides additional functionality that is not available here. For more details about how to use DeepEcho within SDV, please visit the corresponding User Guide.

Standalone usage

DeepEcho can also be used as a standalone library.

In this short quickstart, we show how to learn a mixed-type multivariate time series dataset and then generate synthetic data that resembles it.

We will start by loading the data and preparing the instance of our model.

```python3
from deepecho import PARModel
from deepecho.demo import load_demo

# Load demo data
data = load_demo()

# Define data types for all the columns
data_types = {
    'region': 'categorical',
    'day_of_week': 'categorical',
    'total_sales': 'continuous',
    'nb_customers': 'count',
}

model = PARModel(cuda=False)
```
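Before fitting, it can help to confirm that every modeled column has a declared type. The sketch below is a hypothetical pre-flight check, not part of DeepEcho. It assumes the demo column names listed in `columns`, and it assumes that the identifier and index columns need no declared type, as in the quickstart. `QUICKSTART_TYPES` contains only the type names used in this README; the library may accept others.

```python
# Only the type names that appear in this quickstart; DeepEcho may accept others.
QUICKSTART_TYPES = {'categorical', 'continuous', 'count'}


def check_data_types(columns, data_types, skip=()):
    """Return (columns lacking a declared type, declared types not in QUICKSTART_TYPES)."""
    untyped = [col for col in columns if col not in data_types and col not in skip]
    unexpected = {col: kind for col, kind in data_types.items()
                  if kind not in QUICKSTART_TYPES}
    return untyped, unexpected


# Assumed column names for the demo table (illustrative).
columns = ['store_id', 'date', 'region', 'day_of_week', 'total_sales', 'nb_customers']
data_types = {
    'region': 'categorical',
    'day_of_week': 'categorical',
    'total_sales': 'continuous',
    'nb_customers': 'count',
}

untyped, unexpected = check_data_types(columns, data_types, skip=('store_id', 'date'))
print(untyped, unexpected)
```

Both results are empty when every modeled column has one of the expected types.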

If we want to use different settings for our model, like increasing the number of epochs or enabling CUDA, we can pass the arguments when creating the model:

```python
# keep this as python (without the 3) to avoid using it in test-readme
model = PARModel(epochs=1024, cuda=True)
```

Note that for small datasets like the one used in this demo, the overhead of using CUDA outweighs the gains from parallelization, so training is more efficient without CUDA even when it is available.
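One way to act on this is to enable CUDA only when a device is present and the dataset is large enough to be worth it. The sketch below is a hypothetical heuristic: `should_use_cuda` and the `min_rows` threshold are assumptions for illustration, not DeepEcho settings, and the right threshold depends on your hardware and data.

```python
def cuda_available():
    """Report whether PyTorch can see a CUDA device; False if torch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def should_use_cuda(num_rows, min_rows=100_000, available=None):
    """Enable CUDA only for datasets with at least `min_rows` rows (assumed threshold)."""
    if available is None:
        available = cuda_available()
    return available and num_rows >= min_rows


# Intended use with the quickstart objects:
# model = PARModel(cuda=should_use_cuda(len(data)))
print(should_use_cuda(500, available=True))
```

Passing `available` explicitly makes the decision easy to test on machines without a GPU.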

Once we have created our instance, we are ready to learn the data and generate new synthetic data that resembles it:

```python3
# Learn a model from the data
model.fit(
    data=data,
    entity_columns=['store_id'],
    context_columns=['region'],
    data_types=data_types,
    sequence_index='date'
)

# Sample new data
model.sample(num_entities=5)
```

The output will be a table of synthetic time series data with the same properties as the demo data that we used as input.
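To make the roles of the entity columns, context columns and sequence index concrete, the sketch below groups a small long-format table into one ordered sequence per entity. It is a plain-Python illustration of the data layout, not DeepEcho's internal implementation. The `split_sequences` helper and the sample rows are hypothetical.

```python
from collections import defaultdict


def split_sequences(rows, entity_columns, context_columns, sequence_index):
    """Group long-format rows into one time-ordered sequence per entity."""
    grouped = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in entity_columns)
        grouped[key].append(row)

    skip = set(entity_columns) | set(context_columns) | {sequence_index}
    sequences = {}
    for key, entity_rows in grouped.items():
        # Order each entity's rows by the sequence index.
        entity_rows.sort(key=lambda r: r[sequence_index])
        # Context values are constant per entity, so take them from the first row.
        context = {col: entity_rows[0][col] for col in context_columns}
        # The remaining columns vary over time.
        steps = [{c: v for c, v in r.items() if c not in skip} for r in entity_rows]
        sequences[key] = {'context': context, 'steps': steps}
    return sequences


# Hypothetical rows in the same shape as the demo table.
rows = [
    {'store_id': 1, 'date': '2020-01-02', 'region': 'A', 'total_sales': 12.5, 'nb_customers': 4},
    {'store_id': 2, 'date': '2020-01-01', 'region': 'B', 'total_sales': 7.0, 'nb_customers': 2},
    {'store_id': 1, 'date': '2020-01-01', 'region': 'A', 'total_sales': 10.0, 'nb_customers': 3},
]

sequences = split_sequences(rows, ['store_id'], ['region'], 'date')
print(sequences[(1,)])
```

Each entity ends up with a single context record and a list of time-ordered steps. When sampling with `num_entities=5`, five such sequences would be generated.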

What's next?

For more details about DeepEcho and all its possibilities and features, please check and run the tutorials.

If you want to see how we evaluate the performance and quality of our models, please have a look at the SDGym Benchmarking framework.

Also, please feel welcome to visit our contributing guide in order to help us develop new features or cool ideas!




The Synthetic Data Vault Project was first created at MIT's Data to AI Lab in 2016. After 4 years of research and traction with enterprise, we created DataCebo in 2020 with the goal of growing the project. Today, DataCebo is the proud developer of SDV, the largest ecosystem for synthetic data generation & evaluation. It is home to multiple libraries that support synthetic data, including:

  • 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
  • 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular, multi table and time series data.
  • 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data generation models.

Get started using the SDV package -- a fully integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries for specific needs.

Owner

  • Name: The Synthetic Data Vault Project
  • Login: sdv-dev
  • Kind: organization
  • Email: sdv@sdv.dev

GitHub Events

Total
  • Create event: 27
  • Release event: 2
  • Issues event: 16
  • Watch event: 14
  • Delete event: 22
  • Issue comment event: 14
  • Push event: 41
  • Pull request review comment event: 5
  • Pull request review event: 29
  • Pull request event: 40
  • Fork event: 2
Last Year
  • Create event: 27
  • Release event: 2
  • Issues event: 16
  • Watch event: 14
  • Delete event: 22
  • Issue comment event: 14
  • Push event: 41
  • Pull request review comment event: 5
  • Pull request review event: 29
  • Pull request event: 40
  • Fork event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 281
  • Total Committers: 12
  • Avg Commits per committer: 23.417
  • Development Distribution Score (DDS): 0.488
Past Year
  • Commits: 36
  • Committers: 7
  • Avg Commits per committer: 5.143
  • Development Distribution Score (DDS): 0.556
Top Committers
Name Email Commits
Carles Sala c****s@p****m 144
Andrew Montanez a****w@s****v 34
Kevin Alex Zhang k****z@m****u 21
SDV Team 9****m 20
Felipe Alex Hofmann f****o@g****m 17
R-Palazzo 1****o 11
Katharine Xiao 2****o 10
Plamen Valentinov Kolev 4****r 10
lajohn4747 j****n@d****m 7
Gaurav Sheni g****i@g****m 3
Frances Hartwell f****9@g****m 3
Roy Wedge r****e@g****m 1
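
The All Time figures can be reproduced from the commit counts in the table above. The sketch assumes that the Development Distribution Score is one minus the top committer's share of all commits. That definition is an assumption, but it matches the reported value.

```python
# Commit counts from the Top Committers table, in the order listed.
commits = [144, 34, 21, 20, 17, 11, 10, 10, 7, 3, 3, 1]

total = sum(commits)
average = total / len(commits)
# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits) / total

print(total, round(average, 3), round(dds, 3))
# → 281 23.417 0.488
```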
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 46
  • Total pull requests: 116
  • Average time to close issues: 4 months
  • Average time to close pull requests: 5 days
  • Total issue authors: 15
  • Total pull request authors: 13
  • Average comments per issue: 0.13
  • Average comments per pull request: 0.58
  • Merged pull requests: 105
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 39
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 2 days
  • Issue authors: 4
  • Pull request authors: 8
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.64
  • Merged pull requests: 35
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • amontanez24 (14)
  • fealho (8)
  • R-Palazzo (4)
  • npatki (4)
  • gsheni (3)
  • pvk-developer (3)
  • Ale0x78 (2)
  • sarahmish (2)
  • csala (1)
  • alextopology (1)
  • Myprojectjoy (1)
  • HarisNaveed17 (1)
  • joanvaquer (1)
  • frances-h (1)
  • marketneutral (1)
Pull Request Authors
  • sdv-team (52)
  • R-Palazzo (16)
  • csala (15)
  • pvk-developer (15)
  • fealho (14)
  • amontanez24 (13)
  • gsheni (10)
  • frances-h (5)
  • k15z (5)
  • lajohn4747 (4)
  • katxiao (2)
  • rwedge (2)
  • CristianCuadrado (1)
Top Labels
Issue Labels
maintenance (21) bug (8) internal (7) feature request (4) question (2) documentation (1) resolution:resolved (1)
Pull Request Labels

Packages

  • Total packages: 5
  • Total downloads:
    • pypi 121,248 last-month
  • Total docker downloads: 34,514
  • Total dependent packages: 5
    (may contain duplicates)
  • Total dependent repositories: 13
    (may contain duplicates)
  • Total versions: 65
  • Total maintainers: 6
pypi.org: deepecho

Create sequential synthetic data of mixed types using a GAN.

  • Versions: 28
  • Dependent Packages: 3
  • Dependent Repositories: 12
  • Downloads: 121,248 Last month
  • Docker Downloads: 34,514
Rankings
Docker downloads count: 1.2%
Downloads: 1.4%
Dependent packages count: 1.6%
Dependent repos count: 4.2%
Average: 4.5%
Stargazers count: 8.1%
Forks count: 10.2%
Last synced: 6 months ago
proxy.golang.org: github.com/sdv-dev/deepecho
  • Versions: 17
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.8%
Last synced: 6 months ago
proxy.golang.org: github.com/sdv-dev/DeepEcho
  • Versions: 17
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.8%
Last synced: 6 months ago
spack.io: py-deepecho

DeepEcho is a Synthetic Data Generation Python library for mixed-type, multivariate time series.

  • Versions: 1
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Average: 18.6%
Stargazers count: 21.2%
Forks count: 25.1%
Dependent packages count: 28.1%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: deepecho
  • Versions: 2
  • Dependent Packages: 1
  • Dependent Repositories: 1
Rankings
Dependent repos count: 24.1%
Dependent packages count: 29.0%
Average: 34.2%
Stargazers count: 39.0%
Forks count: 44.8%
Last synced: 6 months ago

Dependencies

setup.py pypi
  • numpy >=1.18.0,<1.20.0
  • numpy >=1.20.0,<2
  • pandas >=1.1.3,<2
  • torch >=1.8.0,<2
  • tqdm >=4.15,<5
.github/workflows/integration.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v2 composite
.github/workflows/lint.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v2 composite
.github/workflows/minimum.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v2 composite
.github/workflows/readme.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v2 composite
.github/workflows/unit.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v2 composite