autots

Automated Time Series Forecasting

https://github.com/winedarksea/autots

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (19.6%) to scientific vocabulary

Keywords

automl autots deep-learning feature-engineering forecasting machine-learning preprocessing time-series
Last synced: 6 months ago

Repository

Automated Time Series Forecasting

Basic Info
  • Host: GitHub
  • Owner: winedarksea
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 47 MB
Statistics
  • Stars: 1,314
  • Watchers: 25
  • Forks: 114
  • Open Issues: 23
  • Releases: 64
Topics
automl autots deep-learning feature-engineering forecasting machine-learning preprocessing time-series
Created about 6 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct

README.md

AutoTS

AutoTS is a time series package for Python designed for rapidly deploying high-accuracy forecasts at scale.

In 2023, AutoTS won the M6 forecasting competition, delivering the highest-performing investment decisions across 12 months of stock market forecasting.

There are dozens of forecasting models usable in the sklearn style of .fit() and .predict(). These include naive, statistical, machine learning, and deep learning models. Additionally, there are over 30 time-series-specific transforms usable in the sklearn style of .fit(), .transform(), and .inverse_transform(). All of these operate directly on pandas DataFrames, without the need for conversion to proprietary objects.

All models support forecasting multivariate (multiple time series) outputs and also support probabilistic (upper/lower bound) forecasts. Most models can readily scale to tens and even hundreds of thousands of input series. Many models also support passing in user-defined exogenous regressors.

These models are all designed for integration in an AutoML feature search which automatically finds the best models, preprocessing, and ensembling for a given dataset through genetic algorithms.

Horizontal and mosaic style ensembles are the flagship ensembling types, allowing each series to receive the most accurate possible models while still maintaining scalability.

A combination of metrics and cross-validation options, the ability to apply subsets and weighting, regressor generation tools, simulation forecasting mode, event risk forecasting, live datasets, template import and export, plotting, and a collection of data shaping parameters round out the available feature set.


Installation

pip install autots

This includes dependencies for basic models, but additional packages are required for some models and methods.

Be advised there are several other projects that have chosen similar names, so make sure you are on the right AutoTS code, papers, and documentation.

Basic Use

Input data for AutoTS is expected to come in either a long or a wide format:

  • The wide format is a pandas.DataFrame with a pandas.DatetimeIndex and each column a distinct series.
  • The long format has three columns:
    • Date (ideally already in pandas-recognized datetime format)
    • Series ID. For a single time series, series_id can be set to None.
    • Value
  • For long data, the column name for each of these is passed to .fit() as date_col, id_col, and value_col. No parameters are needed for wide data.

Lower-level functions are only designed for wide style data.
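
As an illustration (not from the project docs), here is a minimal sketch of the two shapes built with plain pandas; the series names and column labels below are arbitrary placeholders:

```python
import pandas as pd

# wide format: a DatetimeIndex with one column per series
wide_df = pd.DataFrame(
    {"sales_store_a": [10, 12, 13], "sales_store_b": [7, 9, 8]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)

# long format: one row per (date, series, value); the column names are arbitrary
# and are passed to .fit() as date_col, id_col, and value_col
long_df = (
    wide_df.reset_index()
    .melt(id_vars="index", var_name="series_id", value_name="value")
    .rename(columns={"index": "datetime"})
)
```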

```python
# also load: _hourly, _monthly, _weekly, _yearly, or _live_daily
from autots import AutoTS, load_daily

# sample datasets can be used in either of the long or wide import shapes
long = False
df = load_daily(long=long)

model = AutoTS(
    forecast_length=21,
    frequency="infer",
    prediction_interval=0.9,
    ensemble=None,
    model_list="superfast",  # "fast", "default", "fast_parallel"
    transformer_list="fast",  # "superfast",
    drop_most_recent=1,
    max_generations=4,
    num_validations=2,
    validation_method="backwards",
)
model = model.fit(
    df,
    date_col="datetime" if long else None,
    value_col="value" if long else None,
    id_col="series_id" if long else None,
)

prediction = model.predict()

# plot a sample
prediction.plot(
    model.df_wide_numeric,
    series=model.df_wide_numeric.columns[0],
    start_date="2019-01-01",
)

# print the details of the best model
print(model)

# point forecasts dataframe
forecasts_df = prediction.forecast

# upper and lower forecasts
forecasts_up, forecasts_low = prediction.upper_forecast, prediction.lower_forecast

# accuracy of all tried model results
model_results = model.results()

# and aggregated from cross validation
validation_results = model.results("validation")
```
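
The intro above also mentions passing user-defined exogenous regressors. As a hedged sketch: the future_regressor arguments of .fit() and .predict() and the create_regressor helper exist in recent versions, but the specific keyword arguments shown here are illustrative, so check the extended tutorial for the current signature:

```python
from autots import AutoTS, create_regressor, load_daily

df = load_daily(long=False)
forecast_length = 21

# build a simple regressor for the training and forecast windows;
# the keyword arguments below are assumptions based on recent versions
regressor_train, regressor_forecast = create_regressor(
    df,
    forecast_length=forecast_length,
    frequency="infer",
    scale=True,
    summarize="auto",
    backfill="bfill",
    fill_na="spline",
)

model = AutoTS(forecast_length=forecast_length, frequency="infer", model_list="fast")
model = model.fit(df, future_regressor=regressor_train)
prediction = model.predict(future_regressor=regressor_forecast)
```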

The lower-level API, in particular the large section of time series transformers in the scikit-learn style, can also be utilized independently from the AutoML framework.
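
For instance, a single transformer chain can be fit, applied, and inverted directly on a wide DataFrame. A minimal sketch, assuming the GeneralTransformer class in autots.tools.transform; the transformation names and parameters below are illustrative only, see that module for the full set of options:

```python
from autots import load_daily
from autots.tools.transform import GeneralTransformer

df = load_daily(long=False)  # wide-format DataFrame

# illustrative chain: fill NaNs, difference, then scale; each step is keyed
# by order ("0", "1", ...) with a matching entry in transformation_params
transformer = GeneralTransformer(
    fillna="ffill",
    transformations={"0": "DifferencedTransformer", "1": "MinMaxScaler"},
    transformation_params={"0": {}, "1": {}},
)

df_transformed = transformer.fit_transform(df)
df_restored = transformer.inverse_transform(df_transformed)
```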

Check out extended_tutorial.md for a more detailed guide to features.

Also take a look at the production_example.py.

Tips for Speed and Large Data:

  • Use appropriate model lists, especially the predefined lists:
    • superfast (simple naive models) and fast (more complex but still faster models, optimized for many series)
    • fast_parallel (a combination of fast and parallel) or parallel, given many CPU cores are available
      • n_jobs='auto' usually gets pretty close to optimal, but adjust as necessary for the environment
    • 'scalable' is the best list to avoid crashing when many series are present. There is also a transformer_list = 'scalable'
    • see a dict of predefined lists (some defined for internal use) with from autots.models.model_list import model_lists
  • Use the subset parameter when there are many similar series; subset=100 will often generalize well for tens of thousands of similar series (see the sketch after this list).
    • if using subset, passing weights for series will weight subset selection towards higher-priority series.
    • if limited by RAM, the search can be distributed by running multiple instances of AutoTS on different batches of data, having first imported a pretrained template as a starting point for all.
  • Set model_interrupt=True, which skips over the current model when a KeyboardInterrupt (i.e. ctrl+c) is pressed (although if the interrupt falls between generations it will stop the entire training).
  • Use the result_file parameter of .fit(), which saves progress after each generation - helpful if a long training run is being done. Use import_results to recover.
  • While Transformations are pretty fast, setting transformer_max_depth to a lower number (say, 2) will increase speed. Also utilize transformer_list = 'fast' or 'superfast'.
  • Check out this example of using AutoTS with pandas UDF.
  • Ensembles are obviously slower to predict because they run many models: 'distance' ensembles are roughly 2x slower, and 'simple' ensembles 3x-5x slower.
    • ensemble='horizontal-max' with model_list='no_shared_fast' can scale relatively well given many cpu cores because each model is only run on the series it is needed for.
  • Reducing num_validations and models_to_validate will decrease runtime but may lead to poorer model selections.
  • For datasets with many records, upsampling (for example, from daily to monthly frequency forecasts) can reduce training time if appropriate.
    • this can be done by adjusting frequency and aggfunc but is probably best done before passing data into AutoTS.
  • It will be faster if NaNs are already filled. If a search for the optimal NaN fill method is not required, then fill any NaNs with a satisfactory method before passing data to the class.
  • Set runtime_weighting in metric_weighting to a higher value. This will guide the search towards faster models, although it may come at the expense of accuracy.
  • Memory shortage is the most common cause of random process/kernel crashes. Try testing a data subset and a different model list if issues occur. Please also report crashes if they appear linked to a specific set of model parameters (not AutoTS parameters, but the underlying forecasting model params). Crashes also vary significantly by setup, such as the underlying LINPACK/BLAS, so differences between environments can be expected.
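
Pulling a few of these tips together, here is a hedged sketch of a speed-oriented configuration; subset, weights, model_interrupt, and result_file are documented parameters, but the specific values and file name below are illustrative only:

```python
from autots import AutoTS, load_daily

df = load_daily(long=False)

model = AutoTS(
    forecast_length=21,
    frequency="infer",
    model_list="fast",           # or "superfast" / "scalable" for very many series
    transformer_list="fast",
    transformer_max_depth=2,     # shallower transformer chains run faster
    max_generations=4,
    num_validations=1,           # fewer validations run faster but risk worse selection
    subset=100,                  # evaluate models on a sample of the series
    model_interrupt=True,        # ctrl+c skips the current model rather than the run
    n_jobs="auto",
)

model = model.fit(
    df,
    # illustrative: weight one series higher so subset selection favors it
    weights={df.columns[0]: 10},
    result_file="autots_progress.csv",  # save results after each generation
)
```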

How to Contribute:

  • Give feedback on where you find the documentation confusing
  • Use AutoTS and...
    • Report errors and request features by adding Issues on GitHub
    • Post the top model templates for your data (to help improve the starting templates)
    • Feel free to recommend different search grid parameters for your favorite models
  • And, of course, contribute to the codebase directly on GitHub.

AutoTS Process

```mermaid
flowchart TD
A[Initiate AutoTS Model] --> B[Import Template]
B --> C[Load Data]
C --> D[Split Data Into Initial Train/Test Holdout]
D --> E[Run Initial Template Models]
E --> F[Evaluate Accuracy Metrics on Results]
F --> G[Generate Score from Accuracy Metrics]
G --> H{Max Generations Reached or Timeout?}

H -->|No| I[Evaluate All Previous Templates]
I --> J[Genetic Algorithm Combines Best Results and New Random Parameters into New Template]
J --> K[Run New Template Models and Evaluate]
K --> G

H -->|Yes| L[Select Best Models by Score for Validation Template]
L --> M[Run Validation Template on Additional Holdouts]
M --> N[Evaluate and Score Validation Results]
N --> O{Create Ensembles?}

O -->|Yes| P[Generate Ensembles from Validation Results]
P --> Q[Run Ensembles Through Validation]
Q --> N

O -->|No| R[Export Best Models Template]
R --> S[Select Single Best Model]
S --> T[Generate Future Time Forecast]
T --> U[Visualize Results]

R --> B[Import Best Models Template]

```

Also known as Project CATS (Catlin's Automated Time Series), hence the logo.

Owner

  • Name: Colin Catlin
  • Login: winedarksea
  • Kind: user
  • Location: Minnesota

Data Scientist ----- 'Come let us drag one of our dark ships to the bright salt sea'

GitHub Events

Total
  • Release event: 5
  • Watch event: 196
  • Delete event: 3
  • Issue comment event: 1
  • Push event: 151
  • Pull request event: 12
  • Fork event: 13
  • Create event: 8
Last Year
  • Release event: 5
  • Watch event: 196
  • Delete event: 3
  • Issue comment event: 1
  • Push event: 151
  • Pull request event: 12
  • Fork event: 13
  • Create event: 8

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 986
  • Total Committers: 1
  • Avg Commits per committer: 986.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 274
  • Committers: 1
  • Avg Commits per committer: 274.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Colin Catlin c****n@g****m 986

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 86
  • Total pull requests: 72
  • Average time to close issues: 2 months
  • Average time to close pull requests: 5 days
  • Total issue authors: 49
  • Total pull request authors: 8
  • Average comments per issue: 2.94
  • Average comments per pull request: 0.28
  • Merged pull requests: 64
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 10
  • Average time to close issues: about 3 hours
  • Average time to close pull requests: about 1 hour
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.1
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • winedarksea (22)
  • emobs (6)
  • r-matsuzaka (5)
  • catchlui (3)
  • El-Don-Quijote (2)
  • yvadalia (2)
  • carterrees-entrata (2)
  • vinitkothari24 (2)
  • taogeanton2 (1)
  • Louis24 (1)
  • faridelya (1)
  • ghost (1)
  • gabrielefantini (1)
  • sebros-sandvik (1)
  • govarsha (1)
Pull Request Authors
  • winedarksea (74)
  • lewuyou (2)
  • eftalgezer (1)
  • jxtrbtk (1)
  • B8ni (1)
  • pauljones0 (1)
  • adai183 (1)
  • TheOafidian (1)
Top Labels
Issue Labels
Pull Request Labels
codex (2)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi: 37,001 last month
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 12
    (may contain duplicates)
  • Total versions: 75
  • Total maintainers: 1
pypi.org: autots

Automated Time Series Forecasting

  • Versions: 69
  • Dependent Packages: 1
  • Dependent Repositories: 12
  • Downloads: 37,001 Last month
Rankings
Downloads: 1.3%
Stargazers count: 2.1%
Average: 4.0%
Dependent repos count: 4.2%
Forks count: 5.0%
Dependent packages count: 7.3%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: autots

AutoTS is a time series package for Python designed for rapidly deploying high-accuracy forecasts at scale.

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 14.6%
Forks count: 21.7%
Average: 30.4%
Dependent repos count: 34.0%
Dependent packages count: 51.2%
Last synced: 6 months ago

Dependencies

.github/workflows/codeql-analysis.yml actions
  • actions/checkout v2 composite
  • github/codeql-action/analyze v2 composite
  • github/codeql-action/autobuild v2 composite
  • github/codeql-action/init v2 composite