https://github.com/mikekeith52/scalecast
The practitioner's forecasting library
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to springer.com
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.2%) to scientific vocabulary
Repository
The practitioner's forecasting library
Basic Info
Statistics
- Stars: 341
- Watchers: 5
- Forks: 40
- Open Issues: 189
- Releases: 5
Topics
Metadata Files
README.md
Scalecast
About
Scalecast helps you forecast time series. Here is how to initialize its main object:

```python
from scalecast.Forecaster import Forecaster

f = Forecaster(
    y = array_of_values,
    current_dates = array_of_dates,
    future_dates = fcst_horizon_length,
    test_length = 0, # do you want to test all models? if so, on how many or what percent of observations?
    cis = False, # evaluate conformal confidence intervals for all models?
    metrics = ['rmse','mape','mae','r2'], # what metrics to evaluate over the validation/test sets?
)
```
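The `metrics` argument above names four common error measures. As a rough illustration of what those names usually mean, here is a minimal standard-library sketch using their textbook definitions. The function names and the sample numbers are made up for this example, and scalecast's own implementations may differ in details such as edge-case handling.

```python
import math

def rmse(actual, predicted):
    # root mean squared error
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    # mean absolute error
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # mean absolute percentage error, as a fraction; undefined if any actual value is 0
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r2(actual, predicted):
    # coefficient of determination: 1 - residual sum of squares / total sum of squares
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# hypothetical test-set values and predictions
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [98.0, 112.0, 118.0, 133.0]

for name, fn in [("rmse", rmse), ("mape", mape), ("mae", mae), ("r2", r2)]:
    print(name, round(fn(actual, predicted), 4))
```

Lower is better for the first three measures, while an `r2` closer to 1 indicates a better fit.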
Uniform ML modeling (with models from a diverse set of libraries, including scikit-learn, statsmodels, and tensorflow), reporting, and data visualizations are offered through the `Forecaster` and `MVForecaster` interfaces. Data storage and processing then become easy, as all applicable data, predictions, and many derived metrics are contained in a few objects, with much customization available through different modules. Feature requests and issue reporting are welcome! Don't forget to leave a star!⭐
Documentation
Popular Features
1. **Easy LSTM Modeling:** setting up an LSTM model for time series using tensorflow is hard. Using scalecast, it's easy. Many tutorials and Kaggle notebooks designed for those getting to know the model use scalecast (see the article).

```python
f.set_estimator('lstm')
f.manual_forecast(
    lags=36,
    batch_size=32,
    epochs=15,
    validation_split=.2,
    activation='tanh',
    optimizer='Adam',
    learning_rate=0.001,
    lstm_layer_sizes=(100,)*3,
    dropout=(0,)*3,
)
```

2. **Auto lag, trend, and seasonality selection:**

```python
f.auto_Xvar_select( # iterate through different combinations of covariates
    estimator = 'lasso', # what estimator?
    alpha = .2, # estimator hyperparams?
    monitor = 'ValidationMetricValue', # what metric to monitor to make decisions?
    cross_validate = True, # cross validate
    cvkwargs = {'k':3}, # 3 folds
)
```

3. **Hyperparameter tuning using grid search and time series cross validation:**

```python
from scalecast import GridGenerator

GridGenerator.get_example_grids()
models = ['ridge','lasso','xgboost','lightgbm','knn']
f.tune_test_forecast(
    models,
    limit_grid_size = .2, # randomized grid search on 20% of original grid sizes
    feature_importance = True, # save pfi feature importance for each model?
    cross_validate = True, # cross validate? if False, using a separate validation set that the user can specify
    rolling = True, # rolling time series cross validation?
    k = 3, # how many folds?
)
```
4. **Plotting results:** plot test predictions, forecasts, fitted values, and more.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2,1, figsize = (12,6))
f.plot_test_set(models=models,order_by='TestSetRMSE',ax=ax[0])
f.plot(models=models,order_by='TestSetRMSE',ax=ax[1])
plt.show()
```
5. **Pipelines that include transformations, reverting, and backtesting:**

```python
from scalecast import GridGenerator
from scalecast.Pipeline import Transformer, Reverter, Pipeline
from scalecast.util import find_optimal_transformation, backtest_metrics

def forecaster(f):
    models = ['ridge','lasso','xgboost','lightgbm','knn']
    f.tune_test_forecast(
        models,
        limit_grid_size = .2, # randomized grid search on 20% of original grid sizes
        feature_importance = True, # save pfi feature importance for each model?
        cross_validate = True, # cross validate? if False, using a separate validation set that the user can specify
        rolling = True, # rolling time series cross validation?
        k = 3, # how many folds?
    )

transformer, reverter = find_optimal_transformation(f) # just one of several ways to select transformations for your series

pipeline = Pipeline(
    steps = [
        ('Transform',transformer),
        ('Forecast',forecaster),
        ('Revert',reverter),
    ]
)

f = pipeline.fit_predict(f)
backtest_results = pipeline.backtest(f)
metrics = backtest_metrics(backtest_results)
```
6. **Model stacking:** There are two ways to stack models with scalecast: with the [`StackingRegressor`](https://medium.com/towards-data-science/expand-your-time-series-arsenal-with-these-models-10c807d37558) from scikit-learn or using [its own stacking procedure](https://medium.com/p/7977c6667d29).

```python
from scalecast.auxmodels import auto_arima

f.set_estimator('lstm')
f.manual_forecast(
    lags=36,
    batch_size=32,
    epochs=15,
    validation_split=.2,
    activation='tanh',
    optimizer='Adam',
    learning_rate=0.001,
    lstm_layer_sizes=(100,)*3,
    dropout=(0,)*3,
)

f.set_estimator('prophet')
f.manual_forecast()

auto_arima(f)

# stack previously evaluated models
f.add_signals(['lstm','prophet','arima'])
f.set_estimator('catboost')
f.manual_forecast()
```
7. **Multivariate modeling and multivariate pipelines:**

```python
from scalecast.MVForecaster import MVForecaster
from scalecast.Pipeline import MVPipeline
from scalecast.util import find_optimal_transformation, backtest_metrics
from scalecast import GridGenerator

GridGenerator.get_mv_grids()

def mvforecaster(mvf):
    models = ['ridge','lasso','xgboost','lightgbm','knn']
    mvf.tune_test_forecast(
        models,
        limit_grid_size = .2, # randomized grid search on 20% of original grid sizes
        cross_validate = True, # cross validate? if False, using a separate validation set that the user can specify
        rolling = True, # rolling time series cross validation?
        k = 3, # how many folds?
    )

mvf = MVForecaster(f1,f2,f3) # can take N Forecaster objects

transformer1, reverter1 = find_optimal_transformation(f1)
transformer2, reverter2 = find_optimal_transformation(f2)
transformer3, reverter3 = find_optimal_transformation(f3)

pipeline = MVPipeline(
    steps = [
        ('Transform',[transformer1,transformer2,transformer3]),
        ('Forecast',mvforecaster),
        ('Revert',[reverter1,reverter2,reverter3])
    ]
)

f1, f2, f3 = pipeline.fit_predict(f1, f2, f3)
backtest_results = pipeline.backtest(f1, f2, f3)
metrics = backtest_metrics(backtest_results)
```
8. **Transfer Learning (new with 0.19.0):** Train a model in one `Forecaster` object and use that model to make predictions on the data in a separate `Forecaster` object.

```python
from scalecast.Forecaster import Forecaster
from scalecast.util import infer_apply_Xvar_selection

f = Forecaster(...)
f.auto_Xvar_select()
f.set_estimator('xgboost')
f.cross_validate()
f.auto_forecast()

f_new = Forecaster(...) # different series than f
f_new = infer_apply_Xvar_selection(infer_from=f,apply_to=f_new)
f_new.transfer_predict(transfer_from=f,model='xgboost') # transfers the xgboost model from f to f_new
```
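The hyperparameter tuning feature above mentions a randomized grid search on 20% of the grid and rolling time series cross validation with 3 folds. The sketch below shows one common way to build both in plain Python. It is an illustration only: `sample_grid`, `rolling_folds`, and the example grid are hypothetical names, and scalecast's internal fold construction and sampling may differ.

```python
import itertools
import random

def sample_grid(grid, fraction, seed=0):
    """Expand a {param: [values]} grid and keep a random fraction of the combinations."""
    combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
    n_keep = max(1, round(len(combos) * fraction))
    return random.Random(seed).sample(combos, n_keep)

def rolling_folds(n_obs, k, val_length):
    """Return k (train, validation) index ranges, most recent fold first.

    Every fold uses a training window of the same length that ends right
    where its validation window begins, so no future data leaks into training.
    """
    train_length = n_obs - k * val_length
    if train_length <= 0:
        raise ValueError("not enough observations for the requested folds")
    folds = []
    for i in range(k):
        val_end = n_obs - i * val_length
        val_start = val_end - val_length
        train_start = val_start - train_length
        folds.append((range(train_start, val_start), range(val_start, val_end)))
    return folds

# hypothetical grid with 5 * 2 = 10 combinations; 20% keeps 2 of them
example_grid = {"alpha": [0.1, 0.2, 0.5, 1.0, 2.0], "max_iter": [100, 1000]}
candidates = sample_grid(example_grid, fraction=0.2)

# 24 observations, 3 folds, 4 validation observations per fold
folds = rolling_folds(n_obs=24, k=3, val_length=4)
for train, val in folds:
    print(f"train {train.start}-{train.stop - 1}, validate {val.start}-{val.stop - 1}")
```

Each candidate from the sampled grid would then be scored on every fold, and the combination with the best average validation metric kept.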
Installation
- Only the base package is needed to get started:
  `pip install --upgrade scalecast`
- Optional add-ons:
  - `pip install tensorflow` (for RNN/LSTM on Windows) or `pip install tensorflow-macos` (for MAC/M1)
  - `pip install darts`
  - `pip install prophet`
  - `pip install greykite` (for the silverkite model)
  - `pip install kats` (changepoint detection)
  - `pip install pmdarima` (auto arima)
  - `pip install tqdm` (progress bar for notebook)
  - `pip install ipython` (widgets for notebook)
  - `pip install ipywidgets` (widgets for notebook)
  - `jupyter nbextension enable --py widgetsnbextension` (widgets for notebook)
  - `jupyter labextension install @jupyter-widgets/jupyterlab-manager` (widgets for Lab)
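Because the add-ons are optional, it can help to check which ones are importable before selecting a model that needs them. The following is a small standard-library sketch, not part of scalecast: the `missing_addons` helper and the `OPTIONAL_ADDONS` mapping are hypothetical, and the mapping assumes each package's import name matches its pip name.

```python
from importlib.util import find_spec

# import name -> what the installation notes say it is used for
OPTIONAL_ADDONS = {
    "tensorflow": "RNN/LSTM models",
    "darts": "darts",
    "prophet": "prophet",
    "greykite": "silverkite model",
    "kats": "changepoint detection",
    "pmdarima": "auto arima",
    "tqdm": "progress bar for notebook",
    "ipywidgets": "widgets for notebook",
}

def missing_addons(addons=OPTIONAL_ADDONS):
    """Return the subset of add-ons that cannot be found in the current environment."""
    return {name: purpose for name, purpose in addons.items() if find_spec(name) is None}

for name, purpose in missing_addons().items():
    print(f"{name} is not installed ({purpose})")
```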
Papers that use scalecast
- Post-covid customer service behavior forecasting using machine learning techniques
- Application of ANN and traditional ML algorithms in modelling compost production under different climatic conditions
- Reservoir Computing Solutions for Streamflow Modeling and Prediction in Real World Scenarios
- LSTM-based recurrent neural network provides effective short term flu forecasting
- IMPLEMENTING AN ENERGY TRADING STRATEGY USING FORECASTING OF ENERGY PRICES AND PRODUCTION
- Modelamiento predictivo del número de visitantes en un centro comercial
Udemy Course
Scalecast: Machine Learning & Deep Learning
Blog posts and notebooks
Forecasting with Different Model Types
- Sklearn Univariate
- Sklearn Multivariate
- RNN
- ARIMA
- Theta
- VECM
- Stacking
- Other Notebooks
Transforming and Reverting
Confidence Intervals
- Easy Distribution-Free Conformal Intervals for Time Series
- Dynamic Conformal Intervals for any Time Series Model
- Notebook 1
- Notebook 2
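For readers new to the topic of the posts above, the basic idea behind a distribution-free conformal interval can be sketched in a few lines. This is a generic split-conformal illustration under the assumption that a held-out test set with actuals and predictions is available. It is not scalecast's implementation; the function name and numbers are invented for the example.

```python
import math

def naive_conformal_interval(actual, predicted, forecast, alpha=0.25):
    """Build symmetric intervals around a forecast from held-out absolute residuals.

    The half-width is the ceil((n + 1) * (1 - alpha))-th smallest absolute
    residual, the usual finite-sample rank used in split conformal prediction.
    """
    residuals = sorted(abs(a - p) for a, p in zip(actual, predicted))
    n = len(residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        raise ValueError("too few held-out observations for this alpha")
    half_width = residuals[rank - 1]
    return [(point - half_width, point + half_width) for point in forecast]

# hypothetical held-out actuals and predictions (absolute residuals are 1 through 7)
actual = [10, 12, 14, 16, 18, 20, 22]
predicted = [11, 10, 17, 12, 23, 14, 29]
forecast = [24.0, 26.0]

intervals = naive_conformal_interval(actual, predicted, forecast, alpha=0.25)
print(intervals)
```

A smaller `alpha` asks for higher coverage, which needs more held-out observations and produces wider intervals.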
Dynamic Validation
Model Input Selection
- Variable Reduction Techniques for Time Series
- Auto Model Specification with ML Techniques for Time Series
- Notebook 1
- Notebook 2
Scaled Forecasting on Many Series
Transfer Learning
Anomaly Detection
Contributing
- Contributing.md
- Want something that's not listed? Open an issue!
How to cite scalecast
@misc{scalecast,
title = {{scalecast}},
author = {Michael Keith},
year = {2024},
version = {<your version>},
url = {https://scalecast.readthedocs.io/en/latest/},
}
Owner
- Name: Michael Keith
- Login: mikekeith52
- Kind: user
- Location: Salt Lake City, UT
- Repositories: 2
- Profile: https://github.com/mikekeith52
Data Scientist and Python Developer
GitHub Events
Total
- Issues event: 1
- Watch event: 11
- Issue comment event: 1
- Push event: 18
- Pull request event: 15
- Fork event: 1
- Create event: 17
Last Year
- Issues event: 1
- Watch event: 11
- Issue comment event: 1
- Push event: 18
- Pull request event: 15
- Fork event: 1
- Create event: 17
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Michael Keith | m****h@u****v | 410 |
| Michael Keith | m****2@g****m | 112 |
| snyk-bot | s****t@s****o | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 50
- Total pull requests: 208
- Average time to close issues: about 2 months
- Average time to close pull requests: about 1 hour
- Total issue authors: 33
- Total pull request authors: 3
- Average comments per issue: 2.96
- Average comments per pull request: 0.03
- Merged pull requests: 18
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 39
- Average time to close issues: 13 days
- Average time to close pull requests: N/A
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mikekeith52 (5)
- jroy12345 (4)
- amengjiao (3)
- callmegar (3)
- Mehul-Sanghvi (3)
- raedbsili1991 (3)
- ahmad-shahi (2)
- fstayco (2)
- fcekalovic (1)
- pmudgal-Intel (1)
- John-Miller12 (1)
- justicedarko1000 (1)
- Jansza (1)
- bhishanpdl (1)
- ricardobarroslourenco (1)
Pull Request Authors
- mikekeith52 (322)
- snyk-bot (3)
- michellebaugraczyk (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 2
- Total downloads:
  - pypi 832 last-month
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 4 (may contain duplicates)
- Total versions: 195
- Total maintainers: 1
pypi.org: scalecast
The practitioner's time series forecasting library
- Homepage: https://github.com/mikekeith52/scalecast
- Documentation: https://scalecast.readthedocs.io/
- License: MIT
- Latest release: 0.19.10 (published over 1 year ago)
Rankings
Maintainers (1)
pypi.org: scalecastdev
- Homepage: https://github.com/mikekeith52/scalecast
- Documentation: https://scalecastdev.readthedocs.io/
- License: MIT
- Latest release: 0.2.0 (published over 4 years ago)
Rankings
Maintainers (1)
Dependencies
- autodocsumm *
- ipywidgets *
- myst_parser *
- nbsphinx *
- numpydoc *
- pandoc *
- pdflatex *
- pyyaml *
- scalecast *
- sphinx *
- sphinx_rtd_theme *
- sphinxcontrib-confluencebuilder *
- sphinxcontrib-napoleon *
- tqdm *
- eli5 *
- lightgbm *
- matplotlib *
- numpy *
- openpyxl *
- pandas *
- pandas-datareader *
- scikit-learn *
- scipy *
- seaborn *
- statsmodels *
- xgboost *
- darts * test
- greykite * test
- kats * test
- pmdarima * test
- prophet * test
- scalecast * test
- shap * test
- tensorflow * test