torchtime

Benchmark time series data sets for PyTorch

https://github.com/philipdarke/torchtime

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 23 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

classification datasets physionet pytorch supervised-learning time-series
Last synced: 4 months ago

Repository

Benchmark time series data sets for PyTorch

Basic Info
Statistics
  • Stars: 36
  • Watchers: 1
  • Forks: 5
  • Open Issues: 2
  • Releases: 9
Topics
classification datasets physionet pytorch supervised-learning time-series
Created almost 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License Citation

README.md

Benchmark time series data sets for PyTorch


PyTorch data sets for supervised time series classification and prediction problems, including:

  • All UEA/UCR classification repository data sets
  • PhysioNet Challenge 2012 (in-hospital mortality)
  • PhysioNet Challenge 2019 (sepsis prediction)
  • A binary prediction variant of the 2019 PhysioNet Challenge

Why use torchtime?

  1. Saves time. You don't have to write your own PyTorch data classes.
  2. Better research. Use common, reproducible implementations of data sets for a level playing field when evaluating models.

Installation

Install PyTorch followed by torchtime:

```bash
$ pip install torchtime
```

or

```bash
$ conda install torchtime -c conda-forge
```

There is currently no Windows build for conda. Feedback is welcome from conda users in particular.

Getting started

Data classes have a common API. The split argument determines whether training ("train"), validation ("val") or test ("test") data are returned. The sizes of the splits are controlled with the train_prop and (optional) val_prop arguments.
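As an illustration of how these proportions interact, the helper below is hypothetical (not part of the torchtime API) and assumes train_prop and val_prop are both proportions of the whole data set:

```python
def split_sizes(n, train_prop, val_prop=None):
    """Hypothetical helper showing how many sequences land in each split.

    Assumes train_prop and val_prop are proportions of the whole data set;
    whatever remains after the training (and optional validation) split
    forms the test set. Not part of the torchtime API.
    """
    n_train = int(n * train_prop)
    n_val = int(n * val_prop) if val_prop is not None else 0
    return n_train, n_val, n - n_train - n_val

split_sizes(1000, 0.7)       # (700, 0, 300) - no validation set
split_sizes(1000, 0.7, 0.2)  # (700, 200, 100)
```

Omitting val_prop therefore yields a two-way train/test split; supplying it carves the validation set out of what would otherwise be test data.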

PhysioNet data sets

Three PhysioNet data sets are currently supported: the 2012 challenge, the 2019 challenge and the binary prediction variant of the 2019 challenge.

For example, to load training data for the 2012 challenge with a 70/30% training/validation split and create a DataLoader for model training:

```python
from torch.utils.data import DataLoader
from torchtime.data import PhysioNet2012

physionet2012 = PhysioNet2012(
    split="train",
    train_prop=0.7,
)
dataloader = DataLoader(physionet2012, batch_size=32)
```

UEA/UCR repository data sets

The torchtime.data.UEA class returns the UEA/UCR repository data set specified by the dataset argument, for example:

```python
from torch.utils.data import DataLoader
from torchtime.data import UEA

arrowhead = UEA(
    dataset="ArrowHead",
    split="train",
    train_prop=0.7,
)
dataloader = DataLoader(arrowhead, batch_size=32)
```

Using the DataLoader

Batches are dictionaries of tensors X, y and length:

  • X are the time series data. The package follows the batch first convention therefore X has shape (n, s, c) where n is batch size, s is (longest) trajectory length and c is the number of channels. By default, the first channel is a time stamp.
  • y are one-hot encoded labels of shape (n, l) where l is the number of classes.
  • length is the length of each trajectory (before padding, if sequences are of irregular length), i.e. a tensor of shape (n).

For example, ArrowHead is a univariate time series therefore X has two channels, the time stamp followed by the time series (c = 2). Each series has 251 observations (s = 251) and there are three classes (l = 3). For a batch size of 32:

```python
next_batch = next(iter(dataloader))
next_batch["X"].shape       # torch.Size([32, 251, 2])
next_batch["y"].shape       # torch.Size([32, 3])
next_batch["length"].shape  # torch.Size([32])
```
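Because X is padded to the longest trajectory in the batch, length can be used to build a boolean padding mask, and the one-hot y can be collapsed to class indices for losses such as torch.nn.CrossEntropyLoss. A sketch using synthetic tensors in place of a real batch (shapes follow the ArrowHead example, with a batch size of 4):

```python
import torch

# Synthetic stand-ins for next_batch["X"], ["y"] and ["length"].
X = torch.randn(4, 251, 2)                      # (n, s, c)
y = torch.eye(3)[torch.tensor([0, 2, 1, 0])]    # one-hot labels, (n, l)
length = torch.tensor([251, 200, 180, 251])     # (n,)

# Padding mask: True where an observation is real, False where padded.
steps = torch.arange(X.size(1))                  # (s,)
mask = steps.unsqueeze(0) < length.unsqueeze(1)  # (n, s)

# CrossEntropyLoss expects class indices rather than one-hot vectors.
labels = y.argmax(dim=1)                         # (n,)
```

length can likewise be passed to torch.nn.utils.rnn.pack_padded_sequence (with batch_first=True) when training recurrent models.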

See Using DataLoaders for more information.

Advanced options

  • Missing data can be imputed by setting impute to mean (replace with training data channel means) or forward (replace with previous observation). Alternatively a custom imputation function can be passed to the impute argument.
  • A time stamp (added by default), missing data mask and the time since previous observation can be appended with the boolean arguments time, mask and delta respectively.
  • Time series data can be standardised using the boolean standardise argument.
  • The location of cached data can be changed with the path argument, for example to share a single cache location across projects.
  • For reproducibility, an optional random seed can be specified.
  • Missing data can be simulated using the missing argument to drop data at random from UEA/UCR data sets.
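Of the impute options above, forward replaces each missing value with the previous observation in the same channel. A minimal sketch of that idea in plain PyTorch (the forward_fill helper is illustrative, not torchtime's implementation):

```python
import torch

def forward_fill(x):
    """Illustrative forward imputation for one (s, c) series with NaNs.

    Each missing value is replaced by the most recent observation in the
    same channel; NaNs before the first observation are left untouched.
    A sketch of the idea behind impute="forward", not torchtime's code.
    """
    x = x.clone()
    for t in range(1, x.size(0)):
        missing = torch.isnan(x[t])
        x[t][missing] = x[t - 1][missing]
    return x

series = torch.tensor([[1.0, float("nan")],
                       [float("nan"), 2.0],
                       [float("nan"), float("nan")]])
filled = forward_fill(series)  # rows: [1., nan], [1., 2.], [1., 2.]
```

A custom callable with a similar interface could likewise be passed to the impute argument.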

See the tutorials and API for more information.

Other resources

If you're looking for the TensorFlow equivalent for PhysioNet data sets, try medical_ts_datasets.

Acknowledgements

torchtime uses some of the data processing ideas in Kidger et al, 2020 [1] and Che et al, 2018 [2].

This work is supported by the Engineering and Physical Sciences Research Council, Centre for Doctoral Training in Cloud Computing for Big Data, Newcastle University (grant number EP/L015358/1).

Citing torchtime

If you use this software, please cite the paper:

```bibtex
@software{darke_torchtime_2022,
  author    = {Darke, Philip and Missier, Paolo and Bacardit, Jaume},
  title     = {Benchmark time series data sets for {PyTorch} - the torchtime package},
  month     = {July},
  year      = {2022},
  publisher = {arXiv},
  doi       = {10.48550/arXiv.2207.12503},
  url       = {https://doi.org/10.48550/arXiv.2207.12503},
}
```

DOIs are also available for each version of the package here.

References

  1. Kidger, P, Morrill, J, Foster, J, et al. Neural Controlled Differential Equations for Irregular Time Series. arXiv 2005.08926 (2020). [arXiv]

  2. Che, Z, Purushotham, S, Cho, K, et al. Recurrent Neural Networks for Multivariate Time Series with Missing Values. Sci Rep 8, 6085 (2018). [doi]

  3. Silva, I, Moody, G, Scott, DJ, et al. Predicting In-Hospital Mortality of ICU Patients: The PhysioNet/Computing in Cardiology Challenge 2012. Comput Cardiol 2012;39:245-248 (2012). [hdl]

  4. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis From Clinical Data: The PhysioNet/Computing in Cardiology Challenge. Critical Care Medicine 48 2: 210-217 (2019). [doi]

  5. Reyna, M, Josef, C, Jeter, R, et al. Early Prediction of Sepsis from Clinical Data: The PhysioNet/Computing in Cardiology Challenge 2019 (version 1.0.0). PhysioNet (2019). [doi]

  6. Goldberger, A, Amaral, L, Glass, L, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101 (23), pp. e215–e220 (2000). [doi]

  7. Löning, M, Bagnall, A, Ganesh, S, et al. sktime: A Unified Interface for Machine Learning with Time Series. Workshop on Systems for ML at NeurIPS 2019 (2019). [doi]

  8. Löning, M, Bagnall, A, Middlehurst, M, et al. alan-turing-institute/sktime: v0.10.1 (v0.10.1). Zenodo (2022). [doi]

License

Released under the MIT license.

Owner

  • Name: Philip Darke
  • Login: philipdarke
  • Kind: user
  • Company: Newcastle University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Darke"
  given-names: "Philip"
  orcid: "https://orcid.org/0000-0002-9033-2767"
title: "Benchmark time series data sets for PyTorch - the torchtime package"
doi: 10.48550/arXiv.2207.12503
date-released: 2022-03-31
url: "https://doi.org/10.48550/arXiv.2207.12503"

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 33
  • Total Committers: 1
  • Avg Commits per committer: 33.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Philip Darke 4****e 33

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 3
  • Total pull requests: 2
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 3 months
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 1.67
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mauricekraus (1)
  • Zrealshadow (1)
  • philipdarke (1)
Pull Request Authors
  • mrkeaton1 (2)
  • mauricekraus (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

pyproject.toml pypi
  • Pygments ^2.11.2 develop
  • Sphinx ^4.4.0 develop
  • black ^22.1.0 develop
  • flake8 ^4.0.1 develop
  • genbadge ^1.0.6 develop
  • isort ^5.10.1 develop
  • myst-parser ^0.17.0 develop
  • pre-commit ^2.17.0 develop
  • pytest ^7.1.1 develop
  • pytest-cov ^3.0.0 develop
  • sphinx-autodoc-typehints ^1.17.0 develop
  • sphinx-copybutton ^0.5.0 develop
  • sphinx-rtd-theme ^1.0.0 develop
  • numpy ^1.21.0
  • python >=3.8,<3.10
  • requests ^2.27.1
  • scikit-learn ^1.1.1
  • sktime ^0.12.1
  • torch ^1.11.0
  • tqdm ^4.64.0
.github/workflows/build.yml actions
  • actions/checkout v2 composite
  • actions/download-artifact master composite
  • actions/setup-python v2 composite
  • actions/upload-artifact master composite
  • docker://antonyurchenko/git-release latest composite
  • peaceiris/actions-gh-pages v3.7.3 composite
  • pypa/gh-action-pypi-publish v1.5.0 composite
recipes/meta.yml conda