tsdb

A Python toolbox that loads 172 public time-series datasets for machine/deep learning with a single line of code. The datasets come from multiple domains, including healthcare, finance, power, traffic, and weather.

https://github.com/wenjiedu/tsdb

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, scholar.google
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary

Keywords

classification data-mining database deep-learning forecasting imputation machine-learning partially-observed-time-series time-series time-series-analysis time-series-database time-series-datasets
Last synced: 4 months ago

Repository


Basic Info
Statistics
  • Stars: 211
  • Watchers: 7
  • Forks: 19
  • Open Issues: 2
  • Releases: 14
Created almost 4 years ago · Last pushed 5 months ago
Metadata Files
Readme License Code of conduct Citation

README.md

Welcome to TSDB

load 172 public time-series datasets with a single line of code ;-)


📣 TSDB now supports a total of 1️⃣7️⃣2️⃣ time-series datasets ‼️

TSDB is part of PyPOTS (a Python toolbox for data mining on Partially-Observed Time Series) and was separated from PyPOTS to decouple datasets from learning algorithms.

TSDB was created to spare researchers and engineers the chore of collecting and downloading data, so they can focus on data processing details. TSDB provides one-stop convenience for downloading and loading open-source time-series datasets (the available datasets are listed below).

❗️Please note that because people have very different data-processing requirements, the data-loading functions in TSDB only perform the most general steps (e.g. removing invalid samples) and do not otherwise process the data (they do not even normalize it). So, no worries, TSDB won't interfere with your own preprocessing. If you only want the raw datasets, TSDB can download and save them for you as well (see Usage Examples below).
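Since TSDB deliberately leaves normalization to you, a common next step is per-feature standardization fitted on the training split. The sketch below is not part of TSDB: `fit_stats` and `standardize` are hypothetical helpers, and they assume you have already converted a loaded dataset into rows of floats with `float("nan")` marking missing values (no particular structure of the object returned by `tsdb.load()` is assumed). Means and population standard deviations are computed from observed values only, so missing entries stay missing instead of being silently filled.

```python
import math
from typing import List, Tuple

Row = List[float]


def fit_stats(rows: List[Row]) -> List[Tuple[float, float]]:
    """Compute (mean, population std) per feature, ignoring NaN (missing) values."""
    n_features = len(rows[0])
    stats = []
    for j in range(n_features):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        if not observed:
            stats.append((0.0, 1.0))  # nothing observed: leave this column unscaled
            continue
        mean = sum(observed) / len(observed)
        var = sum((v - mean) ** 2 for v in observed) / len(observed)
        std = math.sqrt(var) or 1.0  # constant column: avoid division by zero
        stats.append((mean, std))
    return stats


def standardize(rows: List[Row], stats: List[Tuple[float, float]]) -> List[Row]:
    """Apply z-score scaling with precomputed stats; NaN entries remain NaN."""
    return [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in rows]


# Usage on a tiny hypothetical training split with one missing value.
train = [[1.0, float("nan")], [3.0, 10.0], [5.0, 20.0]]
stats = fit_stats(train)  # fit on the training rows only
scaled = standardize(train, stats)
```

Reusing the `stats` fitted on the training rows for validation and test rows avoids leaking information from held-out data into preprocessing.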

🤝 If you need TSDB to integrate an open-source dataset, or want to add one to TSDB yourself, feel free to request it by creating an issue or to open a PR with your code.

🤗 Please star this repo to help others notice TSDB if you think it is a useful toolkit. Please properly cite TSDB and PyPOTS in your publications if it helps with your research. This really means a lot to our open-source research. Thank you!

❖ Usage Examples

❗️ Important: TSDB is available on both PyPI and conda-forge.

Install via pip:

pip install tsdb

or install from source code:

pip install https://github.com/WenjieDu/TSDB/archive/main.zip

or install via conda:

conda install tsdb -c conda-forge

```python
import tsdb

# list all available datasets in TSDB
tsdb.list()
# ['physionet_2012',
#  'physionet_2019',
#  'electricity_load_diagrams',
#  'beijing_multisite_air_quality',
#  'italy_air_quality',
#  'vessel_ais',
#  'electricity_transformer_temperature',
#  'pems_traffic',
#  'solar_alabama',
#  'ucr_uea_ACSF1',
#  'ucr_uea_Adiac',
#  ...

# select the dataset you need and load it; TSDB will download, extract, and process it automatically
data = tsdb.load('physionet_2012')

# if you need the raw data, use download_and_extract()
tsdb.download_and_extract('physionet_2012', './save_it_here')

# datasets you have loaded once are cached; check them with list_cache()
tsdb.list_cache()

# delete only one specific dataset's pickled cache
tsdb.delete_cache(dataset_name='physionet_2012', only_pickle=True)

# delete only one specific dataset's raw files and preserve the others
tsdb.delete_cache(dataset_name='physionet_2012')

# or delete the whole cache with delete_cache() to free disk space
tsdb.delete_cache()

# The default cache directory is ~/.pypots/tsdb under the user's home directory.
# To avoid taking up too much space when downloading many datasets,
# the TSDB cache directory can be migrated to an external disk
tsdb.migrate_cache("/mnt/external_disk/TSDB_cache")
```
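If many datasets have been downloaded, it can help to check how much space the cache uses before migrating it. The sketch below uses only the Python standard library; `cache_size_bytes` is a hypothetical helper, not a TSDB function, and its default path simply mirrors the `~/.pypots/tsdb` location stated in the usage example (pass another path if the cache has been migrated).

```python
import os
from pathlib import Path

# Default cache location as described in the usage example; adjust if migrated.
DEFAULT_CACHE_DIR = Path.home() / ".pypots" / "tsdb"


def cache_size_bytes(cache_dir: Path = DEFAULT_CACHE_DIR) -> int:
    """Sum the sizes of regular files under cache_dir (0 if the directory is absent)."""
    if not cache_dir.is_dir():
        return 0
    total = 0
    for root, _dirs, files in os.walk(cache_dir):  # does not descend into symlinked dirs
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):  # skip symlinked files so targets aren't counted
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    size = cache_size_bytes()
    print(f"TSDB cache at {DEFAULT_CACHE_DIR}: {size / 1024**3:.2f} GiB")
```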

That's all. Simple and efficient. Enjoy it! 😃

❖ List of Available Datasets

| Name | Main Tasks |
|---------------------------------------------|-----------------------------------------|
| PhysioNet Challenge 2012 | Forecasting, Imputation, Classification |
| PhysioNet Challenge 2019 | Forecasting, Imputation, Classification |
| Beijing Multi-Site Air-Quality | Forecasting, Imputation |
| Italy Air Quality | Forecasting, Imputation |
| Electricity Load Diagrams | Forecasting, Imputation |
| Electricity Transformer Temperature (ETT) | Forecasting, Imputation |
| Vessel AIS | Forecasting, Imputation, Classification |
| PeMS Traffic | Forecasting, Imputation |
| Solar Alabama | Forecasting, Imputation |
| UCR & UEA Datasets (all 163 datasets) | Classification |
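The table can also be read as a small task lookup. The sketch below hand-copies the nine individually listed datasets, using identifiers assumed to match those returned by `tsdb.list()`, and counts the UCR & UEA collection as a single number; it then checks that 9 + 163 gives the advertised total of 172. The `TASKS` dictionary and the `datasets_for` helper are illustrative only and are not exported by TSDB.

```python
from typing import Dict, List, Set

FORECAST, IMPUTE, CLASSIFY = "forecasting", "imputation", "classification"

# Hand-copied from the table above (identifiers assumed to match tsdb.list()).
TASKS: Dict[str, Set[str]] = {
    "physionet_2012": {FORECAST, IMPUTE, CLASSIFY},
    "physionet_2019": {FORECAST, IMPUTE, CLASSIFY},
    "beijing_multisite_air_quality": {FORECAST, IMPUTE},
    "italy_air_quality": {FORECAST, IMPUTE},
    "electricity_load_diagrams": {FORECAST, IMPUTE},
    "electricity_transformer_temperature": {FORECAST, IMPUTE},
    "vessel_ais": {FORECAST, IMPUTE, CLASSIFY},
    "pems_traffic": {FORECAST, IMPUTE},
    "solar_alabama": {FORECAST, IMPUTE},
}

# The UCR & UEA collection contributes 163 classification datasets.
N_UCR_UEA = 163


def datasets_for(task: str) -> List[str]:
    """Return the individually listed datasets whose main tasks include `task`."""
    return sorted(name for name, tasks in TASKS.items() if task in tasks)


total = len(TASKS) + N_UCR_UEA
print(total)                   # 172
print(datasets_for(CLASSIFY))  # ['physionet_2012', 'physionet_2019', 'vessel_ais']
```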

❖ Citing TSDB/PyPOTS

The paper introducing PyPOTS is available on arXiv, and a short version of it was accepted by the 9th SIGKDD International Workshop on Mining and Learning from Time Series (MiLeTS'23). Additionally, PyPOTS has been included as a PyTorch Ecosystem project. We are working to publish it in prestigious academic venues, e.g. JMLR (track for Machine Learning Open Source Software). If you use PyPOTS in your work, please cite it as below and 🌟star this repository to help others notice this library. 🤗

Some scientific research projects use PyPOTS and reference it in their papers. Here is an incomplete list of them.

BibTeX:

@article{du2023pypots,
  title={{PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series}},
  author={Wenjie Du},
  journal={arXiv preprint arXiv:2305.18811},
  year={2023},
}

or

Wenjie Du. PyPOTS: a Python toolbox for data mining on Partially-Observed Time Series. arXiv, abs/2305.18811, 2023.

Owner

  • Name: Wenjie Du
  • Login: WenjieDu
  • Kind: user
  • Location: where time series is observed & valued
  • Company: @TimeSeries-AI

AI researcher on time series

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use PyPOTS, please cite it as below."
authors:
- family-names: "Du"
  given-names: "Wenjie"
  orcid: "https://orcid.org/0000-0003-3046-7835"

title: "PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series"

preferred-citation:
  type: article
  authors:
  - family-names: "Du"
    given-names: "Wenjie"
    orcid: "https://orcid.org/0000-0003-3046-7835"
  doi: "10.48550/arXiv.2305.18811"
  journal: "arXiv"
  title: "PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series"
  url: https://arxiv.org/abs/2305.18811

GitHub Events

Total
  • Create event: 13
  • Release event: 1
  • Issues event: 5
  • Watch event: 48
  • Delete event: 12
  • Issue comment event: 19
  • Push event: 21
  • Pull request review event: 1
  • Pull request event: 23
  • Fork event: 3
Last Year
  • Create event: 13
  • Release event: 1
  • Issues event: 5
  • Watch event: 48
  • Delete event: 12
  • Issue comment event: 19
  • Push event: 21
  • Pull request review event: 1
  • Pull request event: 23
  • Fork event: 3

Issues and Pull Requests

Last synced: 5 months ago

All Time
  • Total issues: 26
  • Total pull requests: 66
  • Average time to close issues: 5 days
  • Average time to close pull requests: about 21 hours
  • Total issue authors: 7
  • Total pull request authors: 5
  • Average comments per issue: 0.69
  • Average comments per pull request: 0.73
  • Merged pull requests: 66
  • Bot issues: 0
  • Bot pull requests: 5
Past Year
  • Issues: 5
  • Pull requests: 14
  • Average time to close issues: 6 days
  • Average time to close pull requests: 3 days
  • Issue authors: 1
  • Pull request authors: 3
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.57
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 5
Top Authors
Issue Authors
  • WenjieDu (19)
  • RushiBhatt007 (1)
  • michaelalb (1)
  • metsumesquita (1)
  • fengtingle615 (1)
  • c0syfeng (1)
  • HANSZJH (1)
Pull Request Authors
  • WenjieDu (71)
  • dependabot[bot] (12)
  • IncubatorShokuhou (2)
  • eroell (2)
  • GrgicevicLukaNTNU (1)
Top Labels
Issue Labels
bug (8) enhancement (8) question (6) new feature (3) new dataset (2) dependencies (1)
Pull Request Labels
dependencies (12) github_actions (12) stale (2)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 88,771 last-month
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 4
    (may contain duplicates)
  • Total versions: 37
  • Total maintainers: 1
proxy.golang.org: github.com/WenjieDu/TSDB
  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.8%
Last synced: 5 months ago
proxy.golang.org: github.com/wenjiedu/tsdb
  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.8%
Last synced: 5 months ago
pypi.org: tsdb

TSDB (Time Series Data Beans): a Python toolbox that helps load 172 open-source time-series datasets

  • Homepage: https://pypots.com
  • Documentation: https://docs.pypots.com
  • License: Copyright (c) 2023-present, Wenjie Du All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • Latest release: 0.7.1
    published 10 months ago
  • Versions: 21
  • Dependent Packages: 1
  • Dependent Repositories: 4
  • Downloads: 88,771 Last month
Rankings
Downloads: 2.8%
Dependent packages count: 4.7%
Average: 6.7%
Dependent repos count: 7.5%
Stargazers count: 7.7%
Forks count: 10.9%
Maintainers (1)
Last synced: 5 months ago

Dependencies

requirements.txt pypi
  • numpy *
  • pandas *
  • scikit_learn *
  • scipy *
setup.py pypi
  • numpy *
  • pandas *
  • scikit_learn *
  • scipy *
.github/workflows/greetings.yml actions
  • actions/first-interaction v1 composite
.github/workflows/linting.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/publish_to_PyPI.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • pypa/gh-action-pypi-publish v1.8.7 composite
.github/workflows/stale.yml actions
  • actions/stale v8 composite
.github/workflows/testing_ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • coverallsapp/github-action master composite
.github/workflows/testing_daily.yml actions
  • actions/checkout v3 composite
  • conda-incubator/setup-miniconda v2 composite
  • coverallsapp/github-action master composite
docs/requirements.txt pypi
  • docutils *
  • furo *
  • numpy *
  • pandas *
  • scikit_learn *
  • scipy *
  • sphinx *
  • sphinx-autodoc-typehints *
  • sphinxcontrib-bibtex *