mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation

https://github.com/mljar/mljar-supervised

Keywords

automated-machine-learning automl automl-api automl-python catboost data-science decision-tree ensemble feature-engineering hyper-parameters hyperparameter-optimization lightgbm machine-learning mljar neural-network random-forest scikit-learn xgboost

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: mljar
License: mit
Language: Python
Default Branch: master
Homepage: https://mljar.com
Size: 9.34 MB

Statistics

Stars: 3,193
Watchers: 53
Forks: 423
Open Issues: 144
Releases: 65

Created over 7 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

README.md

MLJAR Automated Machine Learning for Humans

mljar AutoML

Documentation: https://supervised.mljar.com/

Source Code: https://github.com/mljar/mljar-supervised

Looking for commercial support: Please contact us by email for details

Watch full AutoML training in Python under 2 minutes. The training is done in MLJAR Studio.

Automated Machine Learning

The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for a data scientist. It abstracts the common way to preprocess the data, construct the machine learning models, and perform hyper-parameters tuning to find the best model :trophy:. It is no black box, as you can see exactly how the ML pipeline is constructed (with a detailed Markdown report for each ML model).

The mljar-supervised will help you with:
- explaining and understanding your data (Automatic Exploratory Data Analysis),
- trying many different machine learning models (Algorithm Selection and Hyper-Parameters tuning),
- creating Markdown reports from analysis with details about all models (Automatic-Documentation),
- saving, re-running, and loading the analysis and ML models.

It has four built-in modes of work:
- Explain mode, which is ideal for explaining and understanding the data, with many data explanations, like decision trees visualization, linear models coefficients display, permutation importance, and SHAP explanations of data,
- Perform mode, for building ML pipelines to use in production,
- Compete mode, which trains highly-tuned ML models with ensembling and stacking, intended for use in ML competitions,
- Optuna mode, which searches for highly-tuned ML models and should be used when performance is the most important and computation time is not limited (it is available from version 0.10.0).

Of course, you can further customize the details of each mode to meet the requirements.

What's good in it?

It uses many algorithms: Baseline, Linear, Random Forest, Extra Trees, LightGBM, Xgboost, CatBoost, Neural Networks, and Nearest Neighbors.
It can compute Ensemble based on a greedy algorithm from Caruana paper.
It can stack models to build a level 2 ensemble (available in Compete mode or after setting the stack_models parameter).
It can do features preprocessing, like missing values imputation and converting categoricals. What is more, it can also handle target values preprocessing.
It can do advanced features engineering, like Golden Features, Features Selection, Text and Time Transformations.
It can tune hyper-parameters with a not-so-random-search algorithm (random-search over a defined set of values) and hill climbing to fine-tune final models.
It can compute the Baseline for your data so that you will know if you need Machine Learning or not!
It has extensive explanations. The package trains simple Decision Trees with max_depth <= 5, so you can easily visualize them with the amazing dtreeviz to better understand your data.
The mljar-supervised uses simple linear regression and includes its coefficients in the summary report, so you can check which features are used the most in the linear model.
It cares about the explainability of models: for every algorithm, the feature importance is computed based on permutation. Additionally, for every algorithm, the SHAP explanations are computed: feature importance, dependence plots, and decision plots (explanations can be switched off with the explain_level parameter).
There is automatic documentation for every ML experiment run with AutoML. The mljar-supervised creates markdown reports from AutoML training full of ML details, metrics, and charts.
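The greedy ensembling idea mentioned in the list above can be sketched in a few lines. This is a minimal illustration of forward selection with replacement in the spirit of the Caruana paper, not the package's own implementation; the function names, the toy predictions, and the use of mean squared error are assumptions made for the example.

```python
def mse(y_true, y_pred):
    """Mean squared error; lower is better."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_ensemble(predictions, y_true, max_steps=10):
    """Greedy forward selection with replacement over model predictions.

    predictions: dict mapping a model name to its list of validation predictions.
    Returns the selected model names (repeats act as weights) and the final loss.
    """
    selected = []
    running_sum = [0.0] * len(y_true)
    best_loss = float("inf")
    for _ in range(max_steps):
        step_best = None
        for name, preds in predictions.items():
            # Loss of the averaged ensemble if this model were added once more.
            n = len(selected) + 1
            candidate = [(s + p) / n for s, p in zip(running_sum, preds)]
            loss = mse(y_true, candidate)
            if step_best is None or loss < step_best[1]:
                step_best = (name, loss)
        if step_best[1] >= best_loss:
            break  # no candidate improves the ensemble, stop early
        best_loss = step_best[1]
        selected.append(step_best[0])
        chosen = predictions[step_best[0]]
        running_sum = [s + p for s, p in zip(running_sum, chosen)]
    return selected, best_loss


# Toy validation data (made up for illustration).
y_valid = [1.0, 2.0, 3.0, 4.0]
model_preds = {
    "model_a": [1.5, 2.5, 3.5, 4.5],  # always predicts too high
    "model_b": [0.5, 1.5, 2.5, 3.5],  # always predicts too low
    "model_c": [2.0, 2.0, 2.0, 2.0],  # constant baseline
}
members, loss = greedy_ensemble(model_preds, y_valid)
print(members, loss)
```

In this toy case the two biased models cancel each other out, so the averaged ensemble beats every single model.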

AutoML Web App with User Interface

We created a Web App with GUI, so you don't need to write any code 🐍. Just upload your data. Please check the Web App at github.com/mljar/automl-app. You can run this Web App locally on your computer, so your data is safe and secure :cat:

Automatic Documentation

The AutoML Report

The report from running AutoML will contain the table with information about each model score and the time needed to train the model. There is a link for each model, which you can click to see the model's details. The performance of all ML models is presented as scatter and box plots so you can visually inspect which algorithms perform the best :trophy:.

AutoML leaderboard

The `Decision Tree` Report

The example for Decision Tree summary with trees visualization. For classification tasks, additional metrics are provided:
- confusion matrix
- threshold (optimized in the case of binary classification task)
- F1 score
- Accuracy
- Precision, Recall, MCC
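To make the metrics listed above concrete, here is a small pure-Python sketch that derives them from a confusion matrix and picks a decision threshold. The README does not say which criterion the threshold optimization uses, so maximizing F1 here is an assumption, and the function names and toy data are hypothetical.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix counts and derived metrics at a given threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision, "recall": recall, "f1": f1, "mcc": mcc,
    }


def best_threshold(y_true, y_prob, metric="f1"):
    """Try every predicted probability as a threshold and keep the best one."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: binary_metrics(y_true, y_prob, t)[metric])


# Toy labels and predicted probabilities (made up for illustration).
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
threshold = best_threshold(y_true, y_prob)
print(threshold, binary_metrics(y_true, y_prob, threshold))
```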

Decision Tree summary

The `LightGBM` Report

The example for LightGBM summary:

LightGBM summary

Available Modes

In the docs you can find details about AutoML modes that are presented in the table.

Explain

    automl = AutoML(mode="Explain")

It is aimed to be used when the user wants to explain and understand the data.
- It uses a 75%/25% train/test split.
- It uses: Baseline, Linear, Decision Tree, Random Forest, Xgboost, Neural Network algorithms, and ensemble.
- It has full explanations: learning curves, importance plots, and SHAP plots.

Perform

    automl = AutoML(mode="Perform")

It should be used when the user wants to train a model that will be used in real-life use cases.
- It uses a 5-fold CV.
- It uses: Linear, Random Forest, LightGBM, Xgboost, CatBoost, and Neural Network. It uses ensembling.
- It has learning curves and importance plots in reports.

Compete

    automl = AutoML(mode="Compete")

It should be used for machine learning competitions.
- It adapts the validation strategy depending on dataset size and total_time_limit. It can be: a train/test split (80/20), 5-fold CV or 10-fold CV.
- It uses: Linear, Decision Tree, Random Forest, Extra Trees, LightGBM, Xgboost, CatBoost, Neural Network, and Nearest Neighbors. It uses ensemble and stacking.
- It has only learning curves in the reports.

Optuna

    automl = AutoML(mode="Optuna", optuna_time_budget=3600)

It should be used when the performance is the most important and time is not limited.
- It uses a 10-fold CV.
- It uses: Random Forest, Extra Trees, LightGBM, Xgboost, and CatBoost. Each of those algorithms is tuned by the Optuna framework for optuna_time_budget seconds. Algorithms are tuned with original data, without advanced feature engineering.
- It uses advanced feature engineering, stacking and ensembling. The hyperparameters found for original data are reused with those steps.
- It produces learning curves in the reports.
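The four mode descriptions above can be condensed into a small lookup table, which is handy when deciding which mode covers a given algorithm. This is only a summary of the text above written as plain Python data; `MODES` and `modes_with` are hypothetical names and not part of the mljar-supervised API.

```python
# Summary of the mode descriptions above (not an API of the package).
MODES = {
    "Explain": {
        "validation": "75%/25% train/test split",
        "algorithms": ["Baseline", "Linear", "Decision Tree", "Random Forest",
                       "Xgboost", "Neural Network"],
        "stacking": False,
    },
    "Perform": {
        "validation": "5-fold CV",
        "algorithms": ["Linear", "Random Forest", "LightGBM", "Xgboost",
                       "CatBoost", "Neural Network"],
        "stacking": False,
    },
    "Compete": {
        "validation": "80/20 split, 5-fold CV or 10-fold CV (adaptive)",
        "algorithms": ["Linear", "Decision Tree", "Random Forest", "Extra Trees",
                       "LightGBM", "Xgboost", "CatBoost", "Neural Network",
                       "Nearest Neighbors"],
        "stacking": True,
    },
    "Optuna": {
        "validation": "10-fold CV",
        "algorithms": ["Random Forest", "Extra Trees", "LightGBM", "Xgboost",
                       "CatBoost"],
        "stacking": True,
    },
}


def modes_with(algorithm):
    """Return the names of the modes whose description lists the algorithm."""
    return [name for name, cfg in MODES.items() if algorithm in cfg["algorithms"]]


print(modes_with("CatBoost"))
print(modes_with("Baseline"))
```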

How to save and load AutoML?

All models in the AutoML are saved and loaded automatically. No need to call save() or load().

Example:

Train AutoML

    automl = AutoML(results_path="AutoML_classifier")
    automl.fit(X, y)

You will have all models saved in the AutoML_classifier directory. Each model will have a separate directory with the README.md file with all details from the training.

Compute predictions

    automl = AutoML(results_path="AutoML_classifier")
    automl.predict(X)

The AutoML automatically loads models from the results_path directory. If you call fit() on an already trained AutoML, you will get a warning message that AutoML is already fitted.

Why do you automatically save all models?

All models are automatically saved to be able to restore the training after interruption. For example, you are training AutoML for 48 hours, and after 47 hours, there is some unexpected interruption. In MLJAR AutoML you just call the same training code after the interruption and AutoML reloads already trained models and finishes the training.
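The resume-after-interruption behaviour described above boils down to "skip whatever already has a saved result". The sketch below shows that pattern with the standard library only; it is not how mljar-supervised stores its models, and the `result.json` marker file, the helper names, and the fake training function are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def train_missing_models(results_path, model_names, train_fn):
    """Train only the models that have no saved result yet."""
    results_dir = Path(results_path)
    results_dir.mkdir(parents=True, exist_ok=True)
    trained_now = []
    for name in model_names:
        marker = results_dir / name / "result.json"  # hypothetical file name
        if marker.exists():
            continue  # finished in an earlier run, reuse it
        score = train_fn(name)
        marker.parent.mkdir(exist_ok=True)
        marker.write_text(json.dumps({"score": score}))
        trained_now.append(name)
    return trained_now


def fake_train(name):
    """Stand-in for real training; returns a dummy score."""
    return float(len(name))


all_models = ["1_Baseline", "2_DecisionTree", "3_Linear"]
with tempfile.TemporaryDirectory() as tmp:
    # The first run is "interrupted" after two models.
    first_run = train_missing_models(tmp, all_models[:2], fake_train)
    # The second run repeats the same call with the full list.
    second_run = train_missing_models(tmp, all_models, fake_train)

print(first_run)
print(second_run)
```

The second call only trains the model that was missing, which mirrors calling the same training code again after an interruption.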

Supported evaluation metrics (`eval_metric` argument in `AutoML()`)

for binary classification: logloss, auc, f1, average_precision, accuracy - default is logloss
for multiclass classification: logloss, f1, accuracy - default is logloss
for regression: rmse, mse, mae, r2, mape, spearman, pearson - default is rmse

If you don't find the eval_metric that you need, please add a new issue. We will add it.
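For readers unfamiliar with the two default metrics, the following sketch computes binary logloss and rmse from their textbook definitions. It is a plain-Python illustration, not the package's implementation; the clipping constant `eps` and the toy numbers are assumptions.

```python
import math


def logloss(y_true, y_prob, eps=1e-15):
    """Binary log loss: mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


print(logloss([1, 0, 1], [0.9, 0.2, 0.6]))
print(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
```

Both metrics are "lower is better", which is why the training log reports decreasing values for stronger models.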

Fairness Aware Training

Starting from version 1.0.0 AutoML can optimize the Machine Learning pipeline with sensitive features. There are the following fairness related arguments in the AutoML constructor:
- fairness_metric - metric which will be used to decide if the model is fair,
- fairness_threshold - threshold used in decision about model fairness,
- privileged_groups - privileged groups used in fairness metrics computation,
- underprivileged_groups - underprivileged groups used in fairness metrics computation.

The fit() method accepts sensitive_features. When sensitive features are passed to AutoML, the best model will be selected among fair models only. In the AutoML reports, additional information about fairness metrics will be added. The MLJAR AutoML supports two methods for bias mitigation:
- Sample Weighting - assigns weights to samples to treat samples equally,
- Smart Grid Search - similar to Sample Weighting, where different weights are checked to optimize fairness metric.

The fair ML building can be used with all algorithms, including Ensemble and Stacked Ensemble. We support three Machine Learning tasks:
- binary classification,
- multiclass classification,
- regression.

Example code:

```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml
from supervised.automl import AutoML

# load the data from OpenML and binarize the target
data = fetch_openml(data_id=1590, as_frame=True)
X = data.data
y = (data.target == ">50K") * 1
sensitive_features = X[["sex"]]

X_train, X_test, y_train, y_test, S_train, S_test = train_test_split(
    X, y, sensitive_features, stratify=y, test_size=0.75, random_state=42
)

automl = AutoML(
    algorithms=["Xgboost"],
    train_ensemble=False,
    fairness_metric="demographic_parity_ratio",
    fairness_threshold=0.8,
    privileged_groups=[{"sex": "Male"}],
    underprivileged_groups=[{"sex": "Female"}],
)

automl.fit(X_train, y_train, sensitive_features=S_train)
```

You can read more about fairness aware AutoML training in our article https://mljar.com/blog/fairness-machine-learning/
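To give a feel for what a metric such as `demographic_parity_ratio` measures, here is a minimal sketch using the common definition: the lowest group selection rate divided by the highest. How MLJAR computes it internally (for example with respect to the privileged and underprivileged groups) is not stated here, so treat the functions and the toy data below as assumptions.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Share of positive predictions within each sensitive group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += pred
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def demographic_parity_ratio(y_pred, groups):
    """Lowest selection rate divided by the highest one (1.0 means parity)."""
    rates = selection_rates(y_pred, groups)
    highest = max(rates.values())
    # Assumption: if no group receives positive predictions, treat it as parity.
    return min(rates.values()) / highest if highest else 1.0


# Toy predictions (made up): 3 of 4 "Male" rows and 1 of 4 "Female" rows are positive.
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["Male"] * 4 + ["Female"] * 4
ratio = demographic_parity_ratio(y_pred, groups)
is_fair = ratio >= 0.8  # same threshold value as fairness_threshold above
print(ratio, is_fair)
```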

Fairness aware AutoML

Examples

:point_right: Binary Classification Example

There is a simple interface available with fit and predict methods.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from supervised.automl import AutoML

df = pd.read_csv(
    "https://raw.githubusercontent.com/pplonski/datasets-for-start/master/adult/data.csv",
    skipinitialspace=True,
)
X_train, X_test, y_train, y_test = train_test_split(
    df[df.columns[:-1]], df["income"], test_size=0.25
)

automl = AutoML()
automl.fit(X_train, y_train)

predictions = automl.predict(X_test)
```

AutoML fit will print:

    Create directory AutoML_1
    AutoML task to be solved: binary_classification
    AutoML will use algorithms: ['Baseline', 'Linear', 'Decision Tree', 'Random Forest', 'Xgboost', 'Neural Network']
    AutoML will optimize for metric: logloss
    1_Baseline final logloss 0.5519845471086654 time 0.08 seconds
    2_DecisionTree final logloss 0.3655910192804364 time 10.28 seconds
    3_Linear final logloss 0.38139916864708445 time 3.19 seconds
    4_Default_RandomForest final logloss 0.2975204390214936 time 79.19 seconds
    5_Default_Xgboost final logloss 0.2731086827200411 time 5.17 seconds
    6_Default_NeuralNetwork final logloss 0.319812276905242 time 21.19 seconds
    Ensemble final logloss 0.2731086821194617 time 1.43 seconds

the AutoML results in Markdown report
the Xgboost Markdown report, please take a look at amazing dependence plots produced by SHAP package :sparkling_heart:
the Decision Tree Markdown report, please take a look at beautiful tree visualization :sparkles:
the Logistic Regression Markdown report, please take a look at coefficients table, and you can compare the SHAP plots between (Xgboost, Decision Tree and Logistic Regression) :coffee:
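The training output printed by `fit()` in this example can be turned into a small leaderboard with a few lines of parsing. This sketch only works on the log format shown in this example; the regular expression and helper name are assumptions, and the package writes its own leaderboard in the Markdown report.

```python
import re

# Model lines copied from the example output in this section.
LOG = """\
1_Baseline final logloss 0.5519845471086654 time 0.08 seconds
2_DecisionTree final logloss 0.3655910192804364 time 10.28 seconds
3_Linear final logloss 0.38139916864708445 time 3.19 seconds
4_Default_RandomForest final logloss 0.2975204390214936 time 79.19 seconds
5_Default_Xgboost final logloss 0.2731086827200411 time 5.17 seconds
6_Default_NeuralNetwork final logloss 0.319812276905242 time 21.19 seconds
Ensemble final logloss 0.2731086821194617 time 1.43 seconds
"""

LINE = re.compile(r"^(\S+) final (\w+) ([\d.]+) time ([\d.]+) seconds$")


def parse_leaderboard(text):
    """Parse '<model> final <metric> <score> time <t> seconds' lines, best first."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append({
                "model": match.group(1),
                "metric": match.group(2),
                "score": float(match.group(3)),
                "time": float(match.group(4)),
            })
    # logloss is minimized, so the lowest score comes first
    return sorted(rows, key=lambda row: row["score"])


leaderboard = parse_leaderboard(LOG)
for row in leaderboard:
    print(f'{row["model"]:<26} {row["score"]:.6f} {row["time"]:>7.2f}s')
```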

:point_right: Multi-Class Classification Example

The example code for classification of the optical recognition of handwritten digits dataset. Running this code in less than 30 minutes will result in test accuracy ~98%.

```python
import pandas as pd

# scikit learn utilities
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# mljar-supervised package
from supervised.automl import AutoML

# load the data
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    pd.DataFrame(digits.data), digits.target, stratify=digits.target, test_size=0.25,
    random_state=123
)

# train models with AutoML
automl = AutoML(mode="Perform")
automl.fit(X_train, y_train)

# compute the accuracy on test data
predictions = automl.predict_all(X_test)
print(predictions.head())
print("Test accuracy:", accuracy_score(y_test, predictions["label"].astype(int)))
```

:point_right: Regression Example

Regression example on California Housing house prices data.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from supervised.automl import AutoML # mljar-supervised

# Load the data
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    pd.DataFrame(housing.data, columns=housing.feature_names),
    housing.target,
    test_size=0.25,
    random_state=123,
)

# train models with AutoML
automl = AutoML(mode="Explain")
automl.fit(X_train, y_train)

# compute the MSE on test data
predictions = automl.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, predictions))
```

:point_right: More Examples

Income classification - it is a binary classification task on census data
Iris classification - it is a multiclass classification on Iris flowers data
House price regression - it is a regression task on Boston houses data

FAQ

What method is used for hyperparameters optimization?

- For the `Explain`, `Perform`, and `Compete` modes, a random search method combined with hill climbing is used. In this approach, all checked models are saved and used for building the Ensemble.
- For the `Optuna` mode, the Optuna framework is used. It uses the TPE sampler for tuning. Models checked during the Optuna hyperparameters search are not saved; only the best model is saved (the final model from tuning). You can check the details about the hyperparameters checked by Optuna in the study files in the `optuna` directory in your AutoML `results_path`.
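The combination of random search over a defined set of values followed by hill climbing can be sketched as below. This is a toy illustration of the general technique, not the package's tuner: the search space, the `toy_loss` stand-in for a validation score, and all function names are assumptions.

```python
import random

# Hypothetical search space: each hyperparameter has a defined set of values.
SEARCH_SPACE = {
    "max_depth": [2, 3, 4, 5, 6, 7, 8],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
}


def toy_loss(params):
    """Stand-in for a validation loss; smallest at max_depth=6, learning_rate=0.1."""
    return (params["max_depth"] - 6) ** 2 + 100 * (params["learning_rate"] - 0.1) ** 2


def random_search(space, loss_fn, n_trials, rng):
    """Sample configurations from the defined value sets and keep the best one."""
    trials = [{key: rng.choice(values) for key, values in space.items()}
              for _ in range(n_trials)]
    return min(trials, key=loss_fn)


def neighbors(params, space):
    """Configurations that differ by one step in exactly one hyperparameter."""
    for key, values in space.items():
        index = values.index(params[key])
        for step in (-1, 1):
            if 0 <= index + step < len(values):
                yield {**params, key: values[index + step]}


def hill_climb(start, space, loss_fn):
    """Move to the best neighbor until no neighbor improves the loss."""
    current = start
    while True:
        best = min(neighbors(current, space), key=loss_fn)
        if loss_fn(best) >= loss_fn(current):
            return current
        current = best


rng = random.Random(42)
start = random_search(SEARCH_SPACE, toy_loss, n_trials=3, rng=rng)
tuned = hill_climb(start, SEARCH_SPACE, toy_loss)
print(start, "->", tuned)
```

Random search gives a reasonable starting point cheaply, and hill climbing then refines it with small local moves.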

How to save and load AutoML?

Saving and loading of AutoML models is automatic. All models created during AutoML training are saved in the directory set in `results_path` (argument of the `AutoML()` constructor). If no `results_path` is set, the directory is created based on the following naming convention: `AutoML_{number}`, where `number` is a number from 1 to 1000 (depending on which directory name is free).

Example save and load:

```python
automl = AutoML(results_path='AutoML_1')
automl.fit(X, y)
```

All models from AutoML are saved in the `AutoML_1` directory. To load models:

```python
automl = AutoML(results_path='AutoML_1')
automl.predict(X)
```
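The `AutoML_{number}` naming convention described above amounts to picking the first free directory name. The helper below is a hypothetical sketch of that idea using only the standard library; it is not the package's code, and the name `next_results_path` is an assumption.

```python
import tempfile
from pathlib import Path


def next_results_path(base_dir, prefix="AutoML_", max_number=1000):
    """Return the first path '<prefix><number>' under base_dir that does not exist."""
    for number in range(1, max_number + 1):
        candidate = Path(base_dir) / f"{prefix}{number}"
        if not candidate.exists():
            return candidate
    raise RuntimeError(f"No free directory name up to {prefix}{max_number}")


with tempfile.TemporaryDirectory() as tmp:
    # Pretend two earlier trainings already created their directories.
    (Path(tmp) / "AutoML_1").mkdir()
    (Path(tmp) / "AutoML_2").mkdir()
    chosen_name = next_results_path(tmp).name

print(chosen_name)
```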

How to set ML task (select between classification or regression)?

The MLJAR AutoML can work with:
- binary classification
- multi-class classification
- regression

The ML task is detected automatically based on the target values. If you want to force AutoML to use a specific ML task, set the `ml_task` parameter. It can be set to `'binary_classification'`, `'multiclass_classification'`, or `'regression'`.

Example:

```python
automl = AutoML(ml_task='regression')
automl.fit(X, y)
```

In the above example the regression model will be fitted.
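Automatic task detection from target values can be pictured with a simple heuristic like the one below. The exact rules used by mljar-supervised are not described here, so this is only an assumed illustration: the function name, the `max_classes=20` cut-off, and the integer check are all hypothetical.

```python
def guess_ml_task(y, max_classes=20):
    """Guess the ML task from target values (illustrative heuristic only)."""
    unique = set(y)
    if len(unique) == 2:
        return "binary_classification"
    # Strings or whole numbers with few distinct values look like class labels.
    label_like = all(isinstance(v, str) or float(v).is_integer() for v in unique)
    if label_like and len(unique) <= max_classes:
        return "multiclass_classification"
    return "regression"


print(guess_ml_task([0, 1, 1, 0]))
print(guess_ml_task(["setosa", "versicolor", "virginica"]))
print(guess_ml_task([0.5, 1.7, 2.2, 3.9]))
```

A heuristic like this can guess wrong, for example on a regression target with few distinct integer values, which is the kind of case where setting `ml_task` explicitly helps.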

How to reuse Optuna hyperparameters?

You can reuse Optuna hyperparameters that were found in another AutoML training. You need to pass them in the `optuna_init_params` argument. All hyperparameters found during Optuna tuning are saved in the `optuna/optuna.json` file (inside the `results_path` directory).

Example:

```python
import json

# load hyperparameters found in a previous AutoML training
with open('previous_AutoML_training/optuna/optuna.json') as f:
    optuna_init = json.load(f)

automl = AutoML(
    mode='Optuna',
    optuna_init_params=optuna_init
)
automl.fit(X, y)
```

When reusing Optuna hyperparameters, the Optuna tuning is simply skipped. The model will be trained with the hyperparameters set in `optuna_init_params`. Right now there is no option to continue Optuna tuning with seed parameters.

How to know the order of classes for binary or multiclass problem when using predict_proba?

To get predicted probabilities with information about the class label, please use the `predict_all()` method. It returns a pandas DataFrame with class names in the columns. The order of predicted columns is the same in the `predict_proba()` and `predict_all()` methods. The `predict_all()` method will additionally have a column with the predicted class label.

Documentation

For details please check mljar-supervised docs.

Installation

From PyPi repository:

    pip install mljar-supervised

To install this package with conda, run:

    conda install -c conda-forge mljar-supervised

From source code:

    git clone https://github.com/mljar/mljar-supervised.git
    cd mljar-supervised
    python setup.py install

Installation for development:

    git clone https://github.com/mljar/mljar-supervised.git
    virtualenv venv --python=python3.6
    source venv/bin/activate
    pip install -r requirements.txt
    pip install -r requirements_dev.txt

Running in Docker:

    FROM python:3.7-slim-buster
    RUN apt-get update && apt-get -y update
    RUN apt-get install -y build-essential python3-pip python3-dev
    RUN pip3 -q install pip --upgrade
    RUN pip3 install mljar-supervised jupyter
    CMD ["jupyter", "notebook", "--port=8888", "--no-browser", "--ip=0.0.0.0", "--allow-root"]

Install from GitHub with pip:

    pip install -q -U git+https://github.com/mljar/mljar-supervised.git@master

Demo

In the demo GIF below you will see:
- MLJAR AutoML trained in Jupyter Notebook on the Titanic dataset
- overview of created files
- a showcase of selected plots created during AutoML training
- algorithm comparison report along with their plots
- example of README file and CSV file with results

Contributing

To get started take a look at our Contribution Guide for information about our process and where you can fit in!

Contributors

Cite

Would you like to cite MLJAR? Great! :)

You can cite MLJAR as follows:

@misc{mljar,
  author    = {Aleksandra P\l{}o\'{n}ska and Piotr P\l{}o\'{n}ski},
  year      = {2021},
  publisher = {MLJAR},
  address   = {\L{}apy, Poland},
  title     = {MLJAR: State-of-the-art Automated Machine Learning Framework for Tabular Data. Version 0.10.3},
  url       = {https://github.com/mljar/mljar-supervised}
}

We would love to hear from you about how you have used MLJAR AutoML in your project. Please feel free to let us know at

License

The mljar-supervised is provided with MIT license.

Commercial support

Looking for commercial support? Do you need new feature implementation? Please contact us by email for details.

MLJAR

The mljar-supervised is an open-source project created by MLJAR. We care about ease of use in Machine Learning. The mljar.com provides a beautiful and simple user interface for building machine learning models.

Owner

Name: MLJAR
Login: mljar
Kind: organization
Email: contact@mljar.com
Location: Poland

Website: https://mljar.com
Twitter: MLJARofficial
Repositories: 30
Profile: https://github.com/mljar

Outstanding Data Science Tools

Citation (CITATION)

@misc{mljar,
  author    = {Aleksandra P\l{}o\'{n}ska and Piotr P\l{}o\'{n}ski},
  year      = {2021},
  publisher = {MLJAR Sp. z o.o.},
  address   = {\L{}apy, Poland},
  title     = {MLJAR: State-of-the-art Automated Machine Learning Framework for Tabular Data.  Version 0.10.3},
  url       = {https://github.com/mljar/mljar-supervised}
}

GitHub Events

Total

Create event: 3
Release event: 3
Issues event: 26
Watch event: 162
Issue comment event: 37
Push event: 14
Pull request event: 3
Fork event: 18

Last Year

Create event: 3
Release event: 3
Issues event: 26
Watch event: 162
Issue comment event: 37
Push event: 14
Pull request event: 3
Fork event: 18

Committers

Last synced: about 1 year ago

All Time

Total Commits: 1,131
Total Committers: 30
Avg Commits per committer: 37.7
Development Distribution Score (DDS): 0.168

Past Year

Commits: 42
Committers: 5
Avg Commits per committer: 8.4
Development Distribution Score (DDS): 0.405

Top Committers

Name	Email	Commits
Piotrek	p**6@g**m	941
Shahul E S	s**6@g**m	60
spamz23	d**0@h**m	46
a-szulc	a**7@g**m	10
abtheo	c**t@g**m	10
Aleksandra	6****a	8
MaciekEO	m**9@g**m	6
DanielAvdar	6****r	6
Neil Mehta	n**1@g**m	5
Udit Swaroopa	6****a	5
Surya	s**1@g**m	4
Maciek	m**1@g**m	4
DanielR59	d**9@h**m	3
Marchlak	b**1@w**l	3
Hk669	h**9@g**m	2
Taeyoon Kim	p****a	2
adrianblazeusz	a**z@g**m	2
makoeppel	k**a@g**m	2
Aakarsh	3****1	1
Aaron Schumacher	a**r@g**m	1
Ajay Muralidharan	1****1	1
JongMok Lee	l**8@g**m	1
Quentin Fortier	q**r@g**m	1
Rafael Sanabria	r**8@g**m	1
Zacchaeus	6****4	1
adrienpacifico	a**o@g**m	1
molspace	n**v@g**m	1
yairVanti	y**r@v**i	1
Harsh Poddar	h**r@q**m	1
谷粒	k**g@f**m	1

Committer Domains (Top 20 + Academic)

foxmail.com: 1
quantiphi.com: 1
vanti.ai: 1
wp.pl: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 242
Total pull requests: 50
Average time to close issues: 5 months
Average time to close pull requests: about 2 months
Total issue authors: 131
Total pull request authors: 25
Average comments per issue: 3.94
Average comments per pull request: 1.26
Merged pull requests: 24
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 22
Pull requests: 7
Average time to close issues: 28 days
Average time to close pull requests: about 11 hours
Issue authors: 15
Pull request authors: 4
Average comments per issue: 1.68
Average comments per pull request: 0.57
Merged pull requests: 5
Bot issues: 0
Bot pull requests: 0

View more stats

Top Authors

Issue Authors

pplonski (39)
a-szulc (14)
yairVanti (7)
Karlheinzniebuhr (7)
strukevych (7)
maciekmalachowski (5)
Tonywhitemin (4)
adrienpacifico (4)
williamty (4)
Selphie14100 (3)
AkshayNovacene (3)
aplonska (3)
off99555 (3)
dsimop (3)
KarthikDutt (3)

Pull Request Authors

a-szulc (20)
DanielAvdar (11)
maciekmalachowski (7)
Marchlak (6)
adrianblazeusz (3)
namelessperson0 (3)
Kshitij68 (2)
Hk669 (2)
ajaymur91 (2)
andyrosa2 (2)
wchaoyi (2)
makoeppel (2)
lijm1358 (2)
yairVanti (2)
JAroyan (2)

Top Labels

Issue Labels

help wanted (41)
bug (37)
enhancement (22)
good first issue (20)
docs (4)
dependencies (3)
tests (1)
installation (1)
performance (1)
future (1)

Pull Request Labels

Packages

Total packages: 3
Total downloads:
- pypi 5,665 last-month
Total docker downloads: 94

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 11
(may contain duplicates)
Total versions: 134
Total maintainers: 1

pypi.org: mljar-supervised

Automated Machine Learning for Humans

Homepage: https://github.com/mljar/mljar-supervised
Documentation: https://mljar-supervised.readthedocs.io/
License: MIT
Latest release: 1.1.18
published about 1 year ago

Versions: 101
Dependent Packages: 0
Dependent Repositories: 11
Downloads: 5,665 Last month
Docker Downloads: 94

Rankings

Stargazers count: 1.4%

Forks count: 2.8%

Downloads: 3.5%

Dependent repos count: 4.4%

Average: 4.4%

Docker downloads count: 4.6%

Dependent packages count: 10.1%

Maintainers (1)

pplonski

Last synced: 10 months ago

proxy.golang.org: github.com/mljar/mljar-supervised

Documentation: https://pkg.go.dev/github.com/mljar/mljar-supervised#section-documentation
License: mit
Latest release: v1.1.18
published about 1 year ago

Versions: 28
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.6%

Average: 5.8%

Dependent repos count: 6.0%

Last synced: 10 months ago

conda-forge.org: mljar-supervised

The mljar-supervised is an Automated Machine Learning Python package that works with tabular data. It is designed to save time for a data scientist. It abstracts the common way to preprocess the data, construct the machine learning models, and perform hyper-parameters tuning to find the best model. It is no black-box as you can see exactly how the ML pipeline is constructed (with a detailed Markdown report for each ML model).

Homepage: https://github.com/mljar/mljar-supervised
License: MIT
Latest release: 0.11.3
published almost 4 years ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Stargazers count: 7.6%

Forks count: 8.9%

Average: 25.4%

Dependent repos count: 34.0%

Dependent packages count: 51.2%

Last synced: 10 months ago

mljar-supervised

Science Score: 44.0%
