https://github.com/mikekeith52/scalecast

The practitioner's forecasting library

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary

Keywords

auto-ml data-science deep-learning easy-to-use forecasting keras lstm machine-learning mase msis pandas python recurrent-neural-networks scikit-learn scikit-learn-python smape time-series vecm
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: mikekeith52
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.16 GB
Statistics
  • Stars: 341
  • Watchers: 5
  • Forks: 40
  • Open Issues: 189
  • Releases: 5
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
  • Readme
  • Contributing
  • License
  • Code of conduct

README.md

Scalecast

About

Scalecast helps you forecast time series. Here is how to initiate its main object:

```python
from scalecast.Forecaster import Forecaster

f = Forecaster(
    y = array_of_values,
    current_dates = array_of_dates,
    future_dates = fcst_horizon_length,
    test_length = 0, # do you want to test all models? if so, on how many or what percent of observations?
    cis = False, # evaluate conformal confidence intervals for all models?
    metrics = ['rmse','mape','mae','r2'], # what metrics to evaluate over the validation/test sets?
)
```

Uniform ML modeling (with models from a diverse set of libraries, including scikit-learn, statsmodels, and tensorflow), reporting, and data visualizations are offered through the `Forecaster` and `MVForecaster` interfaces. Data storage and processing then become easy, as all applicable data, predictions, and many derived metrics are contained in a few objects, with much customization available through different modules. Feature requests and issue reporting are welcome! Don't forget to leave a star!⭐
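The `metrics` argument in the `Forecaster` call above names four error metrics. As a rough illustration of what those names conventionally mean, the plain-Python sketch below computes them for a pair of actual and predicted sequences. The function names and sample numbers are hypothetical, and this is not scalecast's own implementation; in particular, returning MAPE as a fraction rather than a percentage is an assumption.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction (assumption). Undefined if any actual is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative numbers only.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [98.0, 112.0, 119.0, 133.0]

scores = {
    "rmse": rmse(actual, predicted),
    "mape": mape(actual, predicted),
    "mae": mae(actual, predicted),
    "r2": r2(actual, predicted),
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

RMSE and MAE are in the units of the series, while MAPE and R² are scale-free, which is one reason to track several of them on the same test set.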

Documentation

Popular Features

  1. **Easy LSTM Modeling:** setting up an LSTM model for time series using tensorflow is hard. Using scalecast, it's easy. Many tutorials and Kaggle notebooks that are designed for those getting to know the model use scalecast (see the article).
     ```python
     f.set_estimator('lstm')
     f.manual_forecast(
         lags=36,
         batch_size=32,
         epochs=15,
         validation_split=.2,
         activation='tanh',
         optimizer='Adam',
         learning_rate=0.001,
         lstm_layer_sizes=(100,)*3,
         dropout=(0,)*3,
     )
     ```
  2. **Auto lag, trend, and seasonality selection:**
     ```python
     f.auto_Xvar_select( # iterate through different combinations of covariates
         estimator = 'lasso', # what estimator?
         alpha = .2, # estimator hyperparams?
         monitor = 'ValidationMetricValue', # what metric to monitor to make decisions?
         cross_validate = True, # cross validate
         cvkwargs = {'k':3}, # 3 folds
     )
     ```
  3. **Hyperparameter tuning using grid search and time series cross validation:**
     ```python
     from scalecast import GridGenerator

     GridGenerator.get_example_grids()
     models = ['ridge','lasso','xgboost','lightgbm','knn']
     f.tune_test_forecast(
         models,
         limit_grid_size = .2,
         feature_importance = True, # save pfi feature importance for each model?
         cross_validate = True, # cross validate? if False, uses a separate validation set that the user can specify
         rolling = True, # rolling time series cross validation?
         k = 3, # how many folds?
     )
     ```
  4. **Plotting results:** plot test predictions, forecasts, fitted values, and more.
     ```python
     import matplotlib.pyplot as plt

     fig, ax = plt.subplots(2,1, figsize = (12,6))
     f.plot_test_set(models=models,order_by='TestSetRMSE',ax=ax[0])
     f.plot(models=models,order_by='TestSetRMSE',ax=ax[1])
     plt.show()
     ```
  5. **Pipelines that include transformations, reverting, and backtesting:**
     ```python
     from scalecast import GridGenerator
     from scalecast.Pipeline import Transformer, Reverter, Pipeline
     from scalecast.util import find_optimal_transformation, backtest_metrics

     def forecaster(f):
         models = ['ridge','lasso','xgboost','lightgbm','knn']
         f.tune_test_forecast(
             models,
             limit_grid_size = .2, # randomized grid search on 20% of original grid sizes
             feature_importance = True, # save pfi feature importance for each model?
             cross_validate = True, # cross validate? if False, uses a separate validation set that the user can specify
             rolling = True, # rolling time series cross validation?
             k = 3, # how many folds?
         )

     transformer, reverter = find_optimal_transformation(f) # just one of several ways to select transformations for your series

     pipeline = Pipeline(
         steps = [
             ('Transform',transformer),
             ('Forecast',forecaster),
             ('Revert',reverter),
         ]
     )

     f = pipeline.fit_predict(f)
     backtest_results = pipeline.backtest(f)
     metrics = backtest_metrics(backtest_results)
     ```
  6. **Model stacking:** There are two ways to stack models with scalecast: with the [`StackingRegressor`](https://medium.com/towards-data-science/expand-your-time-series-arsenal-with-these-models-10c807d37558) from scikit-learn or using [its own stacking procedure](https://medium.com/p/7977c6667d29).
     ```python
     from scalecast.auxmodels import auto_arima

     f.set_estimator('lstm')
     f.manual_forecast(
         lags=36,
         batch_size=32,
         epochs=15,
         validation_split=.2,
         activation='tanh',
         optimizer='Adam',
         learning_rate=0.001,
         lstm_layer_sizes=(100,)*3,
         dropout=(0,)*3,
     )

     f.set_estimator('prophet')
     f.manual_forecast()

     auto_arima(f)

     # stack previously evaluated models
     f.add_signals(['lstm','prophet','arima'])
     f.set_estimator('catboost')
     f.manual_forecast()
     ```
  7. **Multivariate modeling and multivariate pipelines:**
     ```python
     from scalecast.MVForecaster import MVForecaster
     from scalecast.Pipeline import MVPipeline
     from scalecast.util import find_optimal_transformation, backtest_metrics
     from scalecast import GridGenerator

     GridGenerator.get_mv_grids()

     def mvforecaster(mvf):
         models = ['ridge','lasso','xgboost','lightgbm','knn']
         mvf.tune_test_forecast(
             models,
             limit_grid_size = .2, # randomized grid search on 20% of original grid sizes
             cross_validate = True, # cross validate? if False, uses a separate validation set that the user can specify
             rolling = True, # rolling time series cross validation?
             k = 3, # how many folds?
         )

     mvf = MVForecaster(f1,f2,f3) # can take N Forecaster objects

     transformer1, reverter1 = find_optimal_transformation(f1)
     transformer2, reverter2 = find_optimal_transformation(f2)
     transformer3, reverter3 = find_optimal_transformation(f3)

     pipeline = MVPipeline(
         steps = [
             ('Transform',[transformer1,transformer2,transformer3]),
             ('Forecast',mvforecaster),
             ('Revert',[reverter1,reverter2,reverter3])
         ]
     )

     f1, f2, f3 = pipeline.fit_predict(f1, f2, f3)
     backtest_results = pipeline.backtest(f1, f2, f3)
     metrics = backtest_metrics(backtest_results)
     ```
  8. **Transfer Learning (new with 0.19.0):** Train a model in one `Forecaster` object and use that model to make predictions on the data in a separate `Forecaster` object.
     ```python
     from scalecast.util import infer_apply_Xvar_selection

     f = Forecaster(...)
     f.auto_Xvar_select()
     f.set_estimator('xgboost')
     f.cross_validate()
     f.auto_forecast()

     f_new = Forecaster(...) # different series than f
     f_new = infer_apply_Xvar_selection(infer_from=f,apply_to=f_new)
     f_new.transfer_predict(transfer_from=f,model='xgboost') # transfers the xgboost model from f to f_new
     ```
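To make the tuning options in the feature list more concrete, here is a minimal standard-library sketch of the two ideas behind them: keeping a random 20% of a hyperparameter grid, and scoring each candidate over 3 rolling folds. It illustrates the general technique only and is not scalecast's implementation. The helper names (`sample_grid`, `rolling_folds`, `smooth_and_forecast`, `tune`), the toy exponential-smoothing model, and the fold layout with equal-length training windows are all assumptions made for the example.

```python
import itertools
import random


def sample_grid(grid, fraction, seed=0):
    """Expand a dict of candidate values and keep a random fraction of the combinations."""
    keys = list(grid)
    combos = [dict(zip(keys, values)) for values in itertools.product(*(grid[key] for key in keys))]
    n_keep = max(1, round(len(combos) * fraction))
    return random.Random(seed).sample(combos, n_keep)


def rolling_folds(n_obs, k, val_length):
    """Return k (train_start, val_start, val_end) triples with equal-length training windows."""
    train_length = n_obs - k * val_length
    if train_length <= 0:
        raise ValueError("series is too short for the requested folds")
    folds = []
    for i in range(k):
        val_end = n_obs - i * val_length
        val_start = val_end - val_length
        folds.append((val_start - train_length, val_start, val_end))
    return folds


def smooth_and_forecast(train, alpha, init, horizon):
    """Toy model: simple exponential smoothing followed by a flat forecast."""
    level = train[0] if init == "first" else sum(train) / len(train)
    for value in train:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


def tune(series, grid, fraction, k, val_length, seed=0):
    """Pick the sampled candidate with the lowest mean absolute error across folds."""
    best_params, best_score = None, float("inf")
    for params in sample_grid(grid, fraction, seed):
        fold_errors = []
        for train_start, val_start, val_end in rolling_folds(len(series), k, val_length):
            actual = series[val_start:val_end]
            predicted = smooth_and_forecast(series[train_start:val_start], horizon=len(actual), **params)
            fold_errors.append(sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual))
        score = sum(fold_errors) / len(fold_errors)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Synthetic trending series with a period-4 pattern (illustrative data only).
series = [10 + (t % 4) + 0.5 * t for t in range(40)]
grid = {"alpha": [0.1, 0.3, 0.5, 0.7, 0.9], "init": ["first", "mean"]}

best_params, best_score = tune(series, grid, fraction=0.2, k=3, val_length=4)
print(best_params, round(best_score, 3))
```

Each validation window sits strictly after its training window, so no candidate is ever scored on data that precedes what it was fitted on. Sampling the grid trades some search coverage for a proportional cut in fitting time.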

Installation

  • Only the base package is needed to get started:
    • `pip install --upgrade scalecast`
  • Optional add-ons:
    • `pip install tensorflow` (for RNN/LSTM on Windows) or `pip install tensorflow-macos` (for Mac/M1)
    • `pip install darts`
    • `pip install prophet`
    • `pip install greykite` (for the silverkite model)
    • `pip install kats` (changepoint detection)
    • `pip install pmdarima` (auto arima)
    • `pip install tqdm` (progress bar for notebook)
    • `pip install ipython` (widgets for notebook)
    • `pip install ipywidgets` (widgets for notebook)
    • `jupyter nbextension enable --py widgetsnbextension` (widgets for notebook)
    • `jupyter labextension install @jupyter-widgets/jupyterlab-manager` (widgets for Lab)
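
Since the add-ons are optional, it can help to check which ones are importable in the current environment before selecting a model that depends on them. The sketch below uses only the standard library. The `OPTIONAL_ADDONS` mapping and the `find_missing` helper are hypothetical names for this example, and the import names are assumed to match the pip package names listed above (with `ipython` importing as `IPython`).

```python
import importlib.util

# Import name -> purpose as given in the installation list.
# Assumption: each package is importable under the name shown here.
OPTIONAL_ADDONS = {
    "tensorflow": "RNN/LSTM",
    "darts": "optional add-on",
    "prophet": "optional add-on",
    "greykite": "silverkite model",
    "kats": "changepoint detection",
    "pmdarima": "auto arima",
    "tqdm": "progress bar for notebook",
    "IPython": "widgets for notebook",
    "ipywidgets": "widgets for notebook",
}


def find_missing(module_names):
    """Return the top-level module names that cannot be found, without importing them."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


for name in find_missing(OPTIONAL_ADDONS):
    print(f"not installed: {name} ({OPTIONAL_ADDONS[name]})")
```

`importlib.util.find_spec` only locates a module, so the check stays fast even for heavy packages such as tensorflow.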

Papers that use scalecast

Udemy Course

Scalecast: Machine Learning & Deep Learning

Blog posts and notebooks

Forecasting with Different Model Types

Transforming and Reverting

Confidence Intervals

Dynamic Validation

Model Input Selection

Scaled Forecasting on Many Series

Transfer Learning

Anomaly Detection

Contributing

How to cite scalecast

```
@misc{scalecast,
  title = {{scalecast}},
  author = {Michael Keith},
  year = {2024},
  version = {<your version>},
  url = {https://scalecast.readthedocs.io/en/latest/},
}
```

Owner

  • Name: Michael Keith
  • Login: mikekeith52
  • Kind: user
  • Location: Salt Lake City, UT

Data Scientist and Python Developer

GitHub Events

Total
  • Issues event: 1
  • Watch event: 11
  • Issue comment event: 1
  • Push event: 18
  • Pull request event: 15
  • Fork event: 1
  • Create event: 17
Last Year
  • Issues event: 1
  • Watch event: 11
  • Issue comment event: 1
  • Push event: 18
  • Pull request event: 15
  • Fork event: 1
  • Create event: 17

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 523
  • Total Committers: 3
  • Avg Commits per committer: 174.333
  • Development Distribution Score (DDS): 0.216
Past Year
  • Commits: 8
  • Committers: 1
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Michael Keith | m****h@u****v | 410 |
| Michael Keith | m****2@g****m | 112 |
| snyk-bot | s****t@s****o | 1 |
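
The Development Distribution Score values reported above are consistent with the usual definition: one minus the share of commits made by the most active committer. The sketch below assumes that definition and uses the commit counts listed under Top Committers, where each committer email is counted separately. The function name is hypothetical.

```python
def development_distribution_score(commit_counts):
    """1 - (commits by the top committer / total commits); 0.0 means a single committer."""
    total = sum(commit_counts)
    if total == 0:
        raise ValueError("no commits to score")
    return 1 - max(commit_counts) / total


all_time = [410, 112, 1]  # commits per committer email, all time
past_year = [8]           # one committer with 8 commits

print(round(development_distribution_score(all_time), 3))  # 0.216
print(development_distribution_score(past_year))           # 0.0
print(round(sum(all_time) / len(all_time), 3))             # 174.333 (average commits per committer)
```

A score near 0 means almost all commits come from one identity, and a higher score means the work is spread across more committers.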
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 50
  • Total pull requests: 208
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 33
  • Total pull request authors: 3
  • Average comments per issue: 2.96
  • Average comments per pull request: 0.03
  • Merged pull requests: 18
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 39
  • Average time to close issues: 13 days
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mikekeith52 (5)
  • jroy12345 (4)
  • amengjiao (3)
  • callmegar (3)
  • Mehul-Sanghvi (3)
  • raedbsili1991 (3)
  • ahmad-shahi (2)
  • fstayco (2)
  • fcekalovic (1)
  • pmudgal-Intel (1)
  • John-Miller12 (1)
  • justicedarko1000 (1)
  • Jansza (1)
  • bhishanpdl (1)
  • ricardobarroslourenco (1)
Pull Request Authors
  • mikekeith52 (322)
  • snyk-bot (3)
  • michellebaugraczyk (1)
Top Labels
Issue Labels
  • bug (16)
  • question (10)
  • enhancement (6)
  • documentation (1)
  • good first issue (1)
Pull Request Labels
  • enhancement (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 832 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 4
    (may contain duplicates)
  • Total versions: 195
  • Total maintainers: 1
pypi.org: scalecast

The practitioner's time series forecasting library

  • Versions: 191
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 818 Last month
Rankings
Stargazers count: 4.0%
Forks count: 6.4%
Downloads: 6.5%
Average: 7.2%
Dependent repos count: 8.9%
Dependent packages count: 10.1%
Maintainers (1)
Last synced: 6 months ago
pypi.org: scalecastdev
  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 14 Last month
Rankings
Stargazers count: 4.0%
Forks count: 6.4%
Dependent packages count: 10.1%
Average: 21.4%
Dependent repos count: 21.5%
Downloads: 64.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • autodocsumm *
  • ipywidgets *
  • myst_parser *
  • nbsphinx *
  • numpydoc *
  • pandoc *
  • pdflatex *
  • pyyaml *
  • scalecast *
  • sphinx *
  • sphinx_rtd_theme *
  • sphinxcontrib-confluencebuilder *
  • sphinxcontrib-napoleon *
  • tqdm *
setup.py pypi
  • eli5 *
  • lightgbm *
  • matplotlib *
  • numpy *
  • openpyxl *
  • pandas *
  • pandas-datareader *
  • scikit-learn *
  • scipy *
  • seaborn *
  • statsmodels *
  • xgboost *
src/SCALECAST.egg-info/requires.txt pypi
  • eli5 *
  • lightgbm *
  • matplotlib *
  • numpy *
  • openpyxl *
  • pandas *
  • pandas-datareader *
  • scikit-learn *
  • scipy *
  • seaborn *
  • statsmodels *
  • xgboost *
test/requirements.txt pypi
  • darts * test
  • greykite * test
  • kats * test
  • pmdarima * test
  • prophet * test
  • scalecast * test
  • shap * test
  • tensorflow * test