https://github.com/nixtla/mlforecast
Scalable machine 🤖 learning for time series forecasting.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- CITATION.cff file
- codemeta.json file: found codemeta.json file
- .zenodo.json file: found .zenodo.json file
- DOI references
- Academic publication links
- Committers with academic emails
- Institutional organization owner
- JOSS paper metadata
- Scientific vocabulary similarity: low similarity (14.2%) to scientific vocabulary
Repository
Scalable machine 🤖 learning for time series forecasting.
Basic Info
- Host: GitHub
- Owner: Nixtla
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://nixtlaverse.nixtla.io/mlforecast
- Size: 29.7 MB
Statistics
- Stars: 1,057
- Watchers: 10
- Forks: 102
- Open Issues: 33
- Releases: 41
Topics
Metadata Files
README.md
mlforecast

Machine Learning 🤖 Forecast
Scalable machine learning for time series forecasting
**mlforecast** is a framework to perform time series forecasting using machine learning models, with the option to scale to massive amounts of data using remote clusters.
Install
PyPI
pip install mlforecast
conda-forge
conda install -c conda-forge mlforecast
For more detailed instructions you can refer to the installation page.
Quick Start
Get Started with this quick guide.
Follow this end-to-end walkthrough for best practices.
Videos
Sample notebooks
Why?
Current Python alternatives for machine learning models are slow, inaccurate and don't scale well. So we created a library that can be used to forecast in production environments.
MLForecast includes efficient feature engineering to train any machine learning model (with fit and predict methods, such as sklearn models) on millions of time series.
Features
- Fastest implementations of feature engineering for time series forecasting in Python.
- Out-of-the-box compatibility with pandas, polars, spark, dask, and ray.
- Probabilistic Forecasting with Conformal Prediction.
- Support for exogenous variables and static covariates.
- Familiar sklearn syntax: .fit and .predict.
Missing something? Please open an issue or write us in

Examples and Guides
- End to End Walkthrough: model training, evaluation and selection for multiple time series.
- Probabilistic Forecasting: use Conformal Prediction to produce prediction intervals.
- Cross Validation: robust evaluation of a model's performance.
- Predict Demand Peaks: electricity load forecasting for detecting daily peaks and reducing electric bills.
- Transfer Learning: pretrain a model using a set of time series and then predict another one using that pretrained model.
- Distributed Training: use a Dask, Ray or Spark cluster to train models at scale.
How to use
The following provides a very basic overview; for a more detailed description, see the documentation.
Data setup
Store your time series in a pandas dataframe in long format, that is, each row represents an observation for a specific series and timestamp.
```python
from mlforecast.utils import generate_daily_series

series = generate_daily_series(
    n_series=20,
    max_length=100,
    n_static_features=1,
    static_as_categorical=False,
    with_trend=True,
)
series.head()
```

|     | unique_id | ds         | y          | static_0 |
|-----|-----------|------------|------------|----------|
| 0   | id_00     | 2000-01-01 | 17.519167  | 72       |
| 1   | id_00     | 2000-01-02 | 87.799695  | 72       |
| 2   | id_00     | 2000-01-03 | 177.442975 | 72       |
| 3   | id_00     | 2000-01-04 | 232.704110 | 72       |
| 4   | id_00     | 2000-01-05 | 317.510474 | 72       |
Note: The unique_id column serves as an identifier for each distinct time series in your dataset. If your dataset contains only a single time series, set this column to a constant value.
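As a minimal sketch of the long format described above, the following builds such a frame by hand with pandas. The column names unique_id, ds and y follow the table above; the values and the series names store_1 and store_2 are made up for illustration.

```python
import pandas as pd

# Hypothetical single series: 5 daily observations with made-up values.
single = pd.DataFrame(
    {
        "ds": pd.date_range("2000-01-01", periods=5, freq="D"),
        "y": [10.0, 12.5, 11.0, 13.2, 14.1],
    }
)
# With only one series, the identifier column is a constant.
single.insert(0, "unique_id", "store_1")

# Wide data (one column per series) can be reshaped to long format with melt.
wide = pd.DataFrame(
    {
        "ds": pd.date_range("2000-01-01", periods=3, freq="D"),
        "store_1": [10.0, 12.5, 11.0],
        "store_2": [3.0, 4.0, 5.0],
    }
)
long_df = wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
long_df = (
    long_df[["unique_id", "ds", "y"]]
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)
```

After the reshape, each row of long_df holds one observation for one series and one timestamp, which is the layout the rest of this guide assumes.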
Models
Next, define your models; each one will be trained on all series. These can be any regressor that follows the scikit-learn API.
```python
import lightgbm as lgb
from sklearn.linear_model import LinearRegression

models = [
    lgb.LGBMRegressor(random_state=0, verbosity=-1),
    LinearRegression(),
]
```
Forecast object
Now instantiate an MLForecast object with the models and the features that you want to use. The features can be lags, transformations on the lags and date features. You can also define transformations to apply to the target before fitting, which will be restored when predicting.
```python
from mlforecast import MLForecast
from mlforecast.lag_transforms import ExpandingMean, RollingMean
from mlforecast.target_transforms import Differences

fcst = MLForecast(
    models=models,
    freq='D',
    lags=[7, 14],
    lag_transforms={
        1: [ExpandingMean()],
        7: [RollingMean(window_size=28)]
    },
    date_features=['dayofweek'],
    target_transforms=[Differences([1])],
)
```
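To illustrate what "restored when predicting" means for a first-difference target transform, here is a sketch in plain NumPy. It is not the library's implementation; the training values and the stand-in predicted differences are invented for the example.

```python
import numpy as np

y = np.array([10.0, 13.0, 15.0, 20.0])  # made-up training target

# First difference: the model is trained on changes rather than levels.
diffs = np.diff(y)  # [3., 2., 5.]

# Suppose a model predicted these future differences (made-up values).
predicted_diffs = np.array([4.0, 1.0, 2.0])

# Restoring: cumulatively add the predicted changes to the last observed level.
restored = y[-1] + np.cumsum(predicted_diffs)
print(restored)  # [24. 25. 27.]
```

The forecasts returned to the user are on the original scale of y, even though the model only ever saw differences.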
Training
To compute the features and train the models call fit on your
Forecast object.
```python
fcst.fit(series)
```

```
MLForecast(models=[LGBMRegressor, LinearRegression], freq=D, lag_features=['lag7', 'lag14', 'expanding_mean_lag1', 'rolling_mean_lag7_window_size28'], date_features=['dayofweek'], num_threads=1)
```
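The feature names in the output above can be read as follows: lag7 is the (transformed) target shifted by 7 steps within each series, and expanding_mean_lag1 is the expanding mean of the target shifted by 1 step. A rough pandas equivalent is sketched below on a made-up frame. It only illustrates the definitions, ignores the target transform, and is not how the library computes them.

```python
import pandas as pd

# Made-up long-format data: two series with 10 daily observations each.
df = pd.DataFrame(
    {
        "unique_id": ["a"] * 10 + ["b"] * 10,
        "ds": list(pd.date_range("2000-01-01", periods=10, freq="D")) * 2,
        "y": [float(i) for i in range(10)] + [float(10 * i) for i in range(10)],
    }
)

grouped = df.groupby("unique_id")["y"]

# lag7: the value observed 7 steps earlier in the same series.
df["lag7"] = grouped.shift(7)

# expanding_mean_lag1: mean of all values up to and including the previous step.
df["expanding_mean_lag1"] = grouped.transform(
    lambda s: s.shift(1).expanding().mean()
)

# Date feature taken from the timestamp (Monday=0, ..., Sunday=6).
df["dayofweek"] = df["ds"].dt.dayofweek
```

Grouping by unique_id keeps the shifts from leaking values across series, which is why the first rows of each series have missing lag values.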
Predicting
To get the forecasts for the next n days call predict(n) on the
forecast object. This will automatically handle the updates required by
the features using a recursive strategy.
```python
predictions = fcst.predict(14)
predictions
```

|     | unique_id | ds         | LGBMRegressor | LinearRegression |
|-----|-----------|------------|---------------|------------------|
| 0   | id_00     | 2000-04-04 | 299.923771    | 311.432371       |
| 1   | id_00     | 2000-04-05 | 365.424147    | 379.466214       |
| 2   | id_00     | 2000-04-06 | 432.562441    | 460.234028       |
| 3   | id_00     | 2000-04-07 | 495.628000    | 524.278924       |
| 4   | id_00     | 2000-04-08 | 60.786223     | 79.828767        |
| ... | ...       | ...        | ...           | ...              |
| 275 | id_19     | 2000-03-23 | 36.266780     | 28.333215        |
| 276 | id_19     | 2000-03-24 | 44.370984     | 33.368228        |
| 277 | id_19     | 2000-03-25 | 50.746222     | 38.613001        |
| 278 | id_19     | 2000-03-26 | 58.906524     | 43.447398        |
| 279 | id_19     | 2000-03-27 | 63.073949     | 48.666783        |

280 rows × 4 columns
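The recursive strategy mentioned above can be sketched as follows: predict one step, append the prediction to the history, rebuild the lag features from the extended history, and repeat. The model here is a hypothetical stand-in (the average of lag 1 and lag 2), not an mlforecast model, and the function names are invented for this example.

```python
def predict_one_step(lag1: float, lag2: float) -> float:
    """Hypothetical stand-in for a fitted model: average of the two latest values."""
    return 0.5 * (lag1 + lag2)


def recursive_forecast(history: list[float], horizon: int) -> list[float]:
    """Forecast `horizon` steps by feeding each prediction back in as a lag."""
    values = list(history)  # copy so the caller's data is not modified
    forecasts = []
    for _ in range(horizon):
        # Lag features are rebuilt from the latest values, including past predictions.
        next_value = predict_one_step(values[-1], values[-2])
        forecasts.append(next_value)
        values.append(next_value)
    return forecasts


print(recursive_forecast([2.0, 4.0], horizon=3))  # [3.0, 3.5, 3.25]
```

Because later steps depend on earlier predictions, errors can compound over the horizon; that is the usual trade-off of a recursive approach compared with training one model per step.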
Visualize results
```python
from utilsforecast.plotting import plot_series

fig = plot_series(series, predictions, max_ids=4, plot_random=False)
```

How to contribute
See CONTRIBUTING.md.
Owner
- Name: Nixtla
- Login: Nixtla
- Kind: organization
- Email: ops@nixtla.io
- Location: United States of America
- Website: https://www.nixtla.io/
- Twitter: nixtlainc
- Repositories: 13
- Profile: https://github.com/Nixtla
Open Source Time Series Ecosystem
GitHub Events
Total
- Create event: 35
- Release event: 7
- Issues event: 51
- Watch event: 168
- Delete event: 26
- Member event: 1
- Issue comment event: 139
- Push event: 114
- Pull request review event: 7
- Pull request review comment event: 4
- Pull request event: 55
- Fork event: 15
Last Year
- Create event: 35
- Release event: 7
- Issues event: 51
- Watch event: 168
- Delete event: 26
- Member event: 1
- Issue comment event: 139
- Push event: 114
- Pull request review event: 7
- Pull request review comment event: 4
- Pull request event: 55
- Fork event: 15
Committers
Last synced: almost 3 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| José Morales | j****2@g****m | 84 |
| José Morales | j****s@g****m | 26 |
| fede | f****z@g****m | 21 |
| capybara | 6****z@u****m | 11 |
| dependabot[bot] | 4****]@u****m | 3 |
| Max Mergenthaler | m****m@g****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 170
- Total pull requests: 251
- Average time to close issues: 19 days
- Average time to close pull requests: 6 days
- Total issue authors: 114
- Total pull request authors: 17
- Average comments per issue: 2.6
- Average comments per pull request: 1.42
- Merged pull requests: 221
- Bot issues: 0
- Bot pull requests: 27
Past Year
- Issues: 42
- Pull requests: 50
- Average time to close issues: 14 days
- Average time to close pull requests: 3 days
- Issue authors: 36
- Pull request authors: 6
- Average comments per issue: 1.88
- Average comments per pull request: 1.36
- Merged pull requests: 40
- Bot issues: 0
- Bot pull requests: 24
Top Authors
Issue Authors
- jmoralez (14)
- iamyihwa (8)
- kkckk1110 (8)
- pst2154 (4)
- FedericoGarza (4)
- matsuobasho (3)
- ncooder (3)
- adriaanvh1 (3)
- braaannigan (2)
- MrTangsai (2)
- NudnikShpilkis (2)
- DsDev1 (2)
- Sandy4321 (2)
- tblume1992 (2)
- SyedKumailHussainNaqvi (2)
Pull Request Authors
- jmoralez (202)
- dependabot[bot] (39)
- FedericoGarza (32)
- Naren8520 (12)
- adriaanvh1 (4)
- deven367 (3)
- Ammar-Azman (2)
- tblume1992 (2)
- rpmccarter (2)
- tracykteal (2)
- mergenthaler (1)
- MarcoGorelli (1)
- hahnbeelee (1)
- christian-adam (1)
- PierD86 (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- mamba-org/provision-with-micromamba main composite
- fastai/workflows/quarto-ghp master composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- pypa/gh-action-pypi-publish master composite
- dask <2023.1.1
- holidays <0.21
- lightgbm
- matplotlib
- nbformat
- numba
- pandas
- pip
- prophet
- pyspark >=3.3
- scikit-learn
- shap
- statsmodels
- window-ops
- xgboost
- lee-dohm/no-response v0.5.0 composite
- release-drafter/release-drafter v5 composite