Recent Releases of mljar-supervised

mljar-supervised - v1.1.17

Fixes matplotlib backend initialization in notebooks after AutoML training #785

- Python
Published by pplonski about 1 year ago

mljar-supervised - v1.1.15

Fixes

  • fixed issues with new sklearn API #789, #788, #787
  • set up matplotlib backend for AutoML and switch it back to the original #785

- Python
Published by pplonski over 1 year ago

mljar-supervised - v1.1.12

Fixes

  • many warning fixes and bug fixes in #777, #771, #762, #761, #760, #759, #758, #757, #756, #755, #754, #753, #752, #751, #750, #749, #743, #742, #733

🍁 Autumn release created thanks to the amazing work of @maciekmalachowski, @a-szulc, @Marchlak 🚀 Thank you 🥇

- Python
Published by pplonski over 1 year ago

mljar-supervised - v1.1.10

Fix warnings caused by package updates.

- Python
Published by pplonski over 1 year ago

mljar-supervised - v1.1.9

Fixes

  • (#380) disable boost on errors step for custom strategy
  • (#728) fix accuracy metric for LightGBM

- Python
Published by pplonski almost 2 years ago

mljar-supervised - v1.1.7

Fixes

  • (#725) fix styling of AutoML report, apply styles in the mljar-automl-report class

- Python
Published by pplonski about 2 years ago

mljar-supervised - v1.1.6

Fixes

  • fixed problems with report() (#714)

- Python
Published by pplonski about 2 years ago

mljar-supervised - v1.1.5

Fixes

  • fix xgboost warning (#667)

- Python
Published by pplonski about 2 years ago

mljar-supervised - v1.1.4

Fixes

  • fix sklearn/scipy warnings (#709)
  • fix report display in JupyterLab (#710)

- Python
Published by pplonski about 2 years ago

mljar-supervised - v1.1.2

Thanks to @lijm1358 for PR #689, which fixes problems with LightGBM tuning (#645, #683).

- Python
Published by pplonski over 2 years ago

mljar-supervised - v1.1.1

I've added a custom JSON encoder that can handle numpy types. It fixes #496, #613, #622, #651.
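As a rough illustration of the idea (not the encoder shipped in the package), a JSON encoder can convert numpy values by calling their `tolist()` method, which numpy arrays and numpy scalars both provide. The names `NumpyFriendlyEncoder` and `to_json` are hypothetical and chosen only for this sketch.

```python
import json


class NumpyFriendlyEncoder(json.JSONEncoder):
    """Hypothetical sketch; not the encoder used inside mljar-supervised."""

    def default(self, obj):
        # json calls default() only for objects it cannot serialize itself.
        # numpy arrays and numpy scalars (e.g. float32, int64) expose tolist(),
        # which returns plain Python lists and numbers.
        tolist = getattr(obj, "tolist", None)
        if callable(tolist):
            return tolist()
        # Anything else still raises TypeError, as in the standard encoder.
        return super().default(obj)


def to_json(payload):
    """Serialize payload, converting numpy-like values along the way."""
    return json.dumps(payload, cls=NumpyFriendlyEncoder)
```

With numpy installed, a call such as `to_json({"score": np.float32(0.5)})` should serialize instead of failing with an "Object of type float32 is not JSON serializable" error.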

- Python
Published by pplonski over 2 years ago

mljar-supervised - v1.1.0

Hey there, MLJAR enthusiasts! 🌟 In this release, we're giving a high-five 🙌 to the latest and greatest versions of some rockstar ML packages:

  • 🐼 pandas > 2.0.0
  • 🚀 xgboost > 2.0.0 (#649)
  • 🌳 dtreeviz > 2.2.2 (#631)
  • 🌈 shap > 0.42.1

🐍 We're supporting Python with versions: 3.8, 3.9, 3.10, 3.11.

Fixes 🛠️

Alrighty, with great power (read: updates) comes great responsibility (read: fixes)! We've rolled up our sleeves to zap those pesky warnings caused by our major package glow-up:

  • 🎓 Added classes_ for those classy classifiers (#654)
  • 📊 Patched up a boo-boo in the calibration plot (#655)
  • 🔧 Tweaked a model type warning that was acting all sassy (#638)

Keep rocking and happy coding! 🎸🤖🚀

- Python
Published by pplonski over 2 years ago

mljar-supervised - v1.0.2

Fixes

  • #637 fix problem with font loading for report

- Python
Published by pplonski almost 3 years ago

mljar-supervised - v1.0.1

Fixes

  • #634 fix problem with categorical values in target and nan values for fairness metric
  • #635 add tests for fairness feature
  • #636 switch off shap exceptions printouts

- Python
Published by pplonski almost 3 years ago

mljar-supervised - v1.0.0

We added support for fairness-aware training in our AutoML.

  • For implementation details, please check issue #612
  • For example usage, please check the article https://mljar.com/blog/fairness-machine-learning/

- Python
Published by pplonski almost 3 years ago

mljar-supervised - 0.11.5

Bug fixes and updates

  • #595 replace boston example dataset with California housing dataset, replace mse metric with squared_error for tree based algorithms from sklearn
  • #596 change the import method for dtreeviz package

- Python
Published by pplonski over 3 years ago

mljar-supervised - 0.11.4

Fixes

  • #590 dynamically set font in a report, thanks @yairVanti!

- Python
Published by pplonski over 3 years ago

mljar-supervised - 0.11.3

Unpin shap version #551

- Python
Published by pplonski almost 4 years ago

mljar-supervised - 0.11.2

Enhancements

  • #523 Add type hints to AutoML class, thank you @DanielR59
  • #519 save train&validation index to file in train/test split, thanks @filipsPL @MaciekEO

Bug fixes

  • #496 fix exception in baseline mode, thanks @DanielR59 @moshe-rl
  • #522 fixed requirements issue, thanks @DanielR59 @MaciekEO
  • #514 remove warning, thanks @MaciekEO
  • #511 disable EDA, thanks @MaciekEO

- Python
Published by pplonski over 4 years ago

mljar-supervised - 0.11.0

Bug fixes

  • #463 change multiprocessing to Parallel with loky
  • #462 handle large data for tree visualization in regression
  • #419 remove/hide warnings
  • #411 loosen dependencies for numpy and scipy

- Python
Published by pplonski over 4 years ago

mljar-supervised - 0.10.4

Enhancements

  • #81 add scatter plot predicted vs target in regression
  • #158 add ROC curve for binary classification
  • #336 add visualization for Optuna results
  • #352 add support for Colab
  • #374 update seaborn
  • #378 set golden features number
  • #379 switch off Boost On Errors step in Optuna mode
  • #380 add custom cross validation strategy
  • #386 add correlation heatmap
  • #387 add residual plot
  • #389 add feature importance heatmap
  • #390 add custom eval metric
  • #393 update sklearn

Bug fixes

  • #308 fix error in kaggle kernel
  • #353, #355, #366, #368, #376, #382, #383, #384 fixes

Docs

  • #391 add info about hyperparameters optimization methods

A big thank you for the help to: @ecoskian, @xuzhang5788, @xiaobo, @RafaD5, @drorhilman, @strelzoff-erdc, @muxuezi, @tresoldi. THANK YOU!!!

- Python
Published by pplonski almost 5 years ago

mljar-supervised - 0.10.3

Enhancements

  • #343 set seed in Optuna
  • #344 set eval_metric directly in all algorithms
  • #350 add estimated train time in Optuna mode
  • #342 add optuna_verbose param in AutoML()
  • #354 add KNN in Optuna
  • #356 add Neural Network in Optuna
  • #357, #348 use mljar wrapper for Random Forest and Extra Trees
  • #358 add extra_tree param in LightGBM
  • #359 switch off feature engineering in Optuna mode - only highly tuned models are produced
  • #361 list all eval_metric in error message
  • #362 add accuracy eval_metric
  • #340 support for r2

Bug fixes

  • #347 don't include Optuna tuning time in total_time_limit
  • #360 missing auc scores for training in CatBoost

- Python
Published by pplonski about 5 years ago

mljar-supervised - 0.10.2

Add support for Python 3.9 (#339). Thanks to @rterbush!

- Python
Published by pplonski about 5 years ago

mljar-supervised - 0.10.1

Enhancements

  • #332 We added Optuna framework for hyperparameters tuning. It can be used by setting mode="Optuna" in AutoML. You can read more details at blog post: https://mljar.com/blog/automl-optuna/
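A minimal sketch of how this mode might be switched on. Only `mode="Optuna"`, `optuna_verbose`, and `total_time_limit` are names taken from these release notes; the values, the `automl_kwargs` dict, and the `build_automl` helper are assumptions made for illustration, and the import path `supervised.automl` is assumed to match the installed package.

```python
# Keyword arguments for AutoML. The parameter names appear in these release
# notes; the values are illustrative assumptions, not recommended settings.
automl_kwargs = {
    "mode": "Optuna",          # enable Optuna-based hyperparameter tuning
    "optuna_verbose": True,    # print Optuna progress (see #342)
    "total_time_limit": 3600,  # seconds; Optuna tuning time is excluded (see #347)
}


def build_automl(kwargs):
    """Create an AutoML object if mljar-supervised is installed, else None."""
    try:
        from supervised.automl import AutoML  # assumed import path
    except ImportError:
        return None
    return AutoML(**kwargs)


# Typical use, assuming X (features) and y (target) are already loaded:
#   automl = build_automl(automl_kwargs)
#   automl.fit(X, y)
```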

- Python
Published by pplonski about 5 years ago

mljar-supervised - 0.9.1

Enhancements

  • #179 add need_retrain() method to detect performance decrease
  • #226 extract rules from decision tree
  • #310 add support for MAPE
  • #312 optimize prediction time
  • #313 set stacking time threshold depending on best model train time
  • #320 search for model with prediction time constraint
  • #322 n_jobs as a parameter
  • #328 disable stacking for small (nrows < 500) datasets

Bug fixes

  • #214 move directory after training
  • #246 raise exception when small time limit and no models are trained
  • #247 proper display for optimize AUC and R2
  • #306 add mix_encoding argument in AutoML constructor
  • #308 fix dependencies error in kaggle notebook
  • #314 bug fix in hill climbing in Perform mode
  • #323 fix catboost bug with tree limit
  • #324 #325 bug for feature importance for small data

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.8.8

Many small improvements.

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.8.4

A lot of small tweaks and improvements :)

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.8.0

Enhancements

  • #300 Add step with k-means additional features
  • #299 Add Boost On Errors step
  • #154 Sample weight available
  • #229 Sort leaderboard (disabled for now for debug purposes)

Bug fixes

  • #301 Fix storing unique keys in mljar tuner only for trained models
  • #275 #248 small fixes

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.19

Bug fixes

  • #293 Typo in is_scale_needed
  • #277 Fix problem with unit data
  • #285 Restricted characters in LightGBM

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.18

Bug fixes

  • #292 Remove unused params from CatBoost

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.17

Enhancements

  • #291 Disable loo encoding
  • #290 improve ordering in hill climbing
  • #287 replace mix encoding with integer encoding

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.16

Bug fixes

  • #283 Don't use Random Feature model

Enhancements

  • #284 Check time for features selection
  • #286 Add R2 score
  • #288 Improve algorithms order in not-so-random step

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.15

Enhancements

  • #274 limit number of iteration in CatBoost

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.13

Enhancements

  • #92 add time checks
  • #270 disable stacking for validation type split and repeats > 1
  • #271 disable ldb model sort

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.12

Enhancements

  • #223 Support for repeated validation
  • #266 Adjust validation for small datasets

Bug fixes

  • #265 fix validation warning
  • #264 fix EDA tests
  • #261 better error message for missing golden features

Dependencies

  • #260 update fastparquet to 0.4.1

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.11

Bug fixes

  • #258 Fix: can't load AutoML when adjusted validation is used

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.10

Enhancements

  • #250 New strategies for categorical encoding
  • #257 Control algorithm order in not-so-random step

Bug fixes

  • #255 Fix overwrite in adjusted models

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.9

Enhancements

  • #249 Adjust validation type in Compete mode

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.8

Enhancements

  • #249 Adjust validation type based on data
  • #251 add more eval_metrics in regression
  • #252 add traceback to error reports

Bug fixes

  • #253 Fix error when text data has missing values in test fold

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.7

Enhancements

  • #73 Optimize AUC

Bug fixes

  • #136 RMSE in Extra Trees and Random Forest
  • #243 Switch off Xgboost and CatBoost for multiclass with many classes (in extreme cases, also switch off Extra Trees and Random Forest)
  • #245 Fix ordering of prediction columns

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.6

Enhancements

  • #240 Change algorithm execution order for default algorithms

Bug fixes:

  • #236 Wrong labels for target predictions in the case of -1, 1 target
  • #238 Object of type float32 is not JSON serializable
  • #239 Value Error: Input contains NaN in numpy training array

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.5

Bug fixes

  • (#216) Raise exception when all models with error
  • (#234) Fix target with first empty value

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.4

Enhancements

  • #184 Change Keras+TF Neural Networks to scikit-learn MLP
  • #233 Limit stacking by number of classes and models
  • #232 Remove Linear model from Compete mode
  • #208 Improve importance computation for large number of columns
  • #205 Remove small learning rates for Xgboost

Bug fixes:

  • #231 Restricted characters in feature_names in Xgboost
  • #227 Fix strings in golden_features.json - thank you @SuryaThiru!
  • #215 Assure at least 20 samples (or k_folds) for each class

Docs update:

  • #213 Update docs in AutoML - thank you @shahules786!

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.3

New features ✨

  • #176 extended EDA - thanks to @shahules786

Bug fixes 🐛

  • #201 error in golden features sampling
  • #199 bug for float multi-class labels
  • #196 add exception for empty data
  • #195 set threshold for accuracy metric instead of f1
  • #194 ensemble should be best model if it has more than 1 model
  • #193 fixed predict after model loading
  • #192 update pyarrow
  • #191 hide shap warnings
  • #190 fix in preprocessing
  • #188 fix type in feature selection - thanks to @uditswaroopa

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.2

Bug fixes 🐛

  • #187 fix wrong order in golden features step
  • #186 fix _get_results_path
  • #185 fix models loading
  • #184 exception when drop all features during selection
  • #182 catch exceptions from model and log to errors.md
  • #181 remove forbidden characters in EDA
  • #177 change docstrings to Google style
  • #175 remove tuning_mode parameter from AutoML

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.1

Bug fixes 🐛

  • #173 fix bug in shap sampling
  • #174 update dtreeviz package

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.7.0

Improvements

  • (#148) make AutoML scikit-learn compatible, thank you @spamz23! 👏 👏 👏
  • (#170, #171) improve printouts while training AutoML

- Python
Published by pplonski over 5 years ago

mljar-supervised - 0.6.1

Enhancements

  • #145 Add EDA for input data set #125
  • #135 Add ability to pause and restore the training
  • #19 Add tests for ensemble save and load

Refactor

  • #149 Add Time Controller
  • #80 add tests for one column input

Bug fixes

  • #144 AutoMlException
  • #142 Error when training NN on BNP Paribas kaggle dataset

- Python
Published by pplonski almost 6 years ago

mljar-supervised - 0.6.0

  • Add golden features transformer (#126)
  • Add feature selection (#133)
  • Add one-hot encoding (#76)
  • Fixes in Neural Networks (#131, #129)
  • Add max_depth in Extra Trees and Random Forest (#106)
  • Add support for date/time features (#122)

- Python
Published by pplonski almost 6 years ago

mljar-supervised - 0.5.5

  • Speed-up CatBoost by removing very small learning rates
  • Set larger tolerance in Logistic Regression, from 1e-4 to 5e-4

- Python
Published by pplonski almost 6 years ago

mljar-supervised - 0.5.4

  • #129 fix NN bug
  • #128 add support for text features
  • #122 add support for date/time features
  • #106 tune max_depth in Random Forest and Extra Trees

- Python
Published by pplonski almost 6 years ago

mljar-supervised - 0.5.3

(#122) Add support for date/time features

- Python
Published by pplonski almost 6 years ago

mljar-supervised - 0.5.2

Clip values in log-normal scale transform

- Python
Published by pplonski almost 6 years ago

mljar-supervised - 0.5.1

disable shap for NN and CatBoost (#112, #114)

- Python
Published by pplonski almost 6 years ago

mljar-supervised - 0.5.0

  • #100 Add modes: Explain, Perform, Compete
  • #108 Faster NN implementation for small datasets
  • #110 Fix warnings in permutation importance

- Python
Published by pplonski almost 6 years ago

mljar-supervised - 0.4.1

Minor fixes to 0.4.0

- Python
Published by pplonski almost 6 years ago

mljar-supervised - 0.4.0

Enhancements

  • #96 Use internal early stopping
  • #95 convergence warning in linear algorithm
  • #94 Add support for kNN
  • #90 Try to provide more meaningful names to models
  • #74 Add stacked models
  • #72 Add support for Neural Networks
  • #70 Select best model after each iteration
  • #64 Generate data information once
  • #50 Add validation with split

Bugs

  • #91 Don't run preprocessing for baseline algorithm
  • #85 Can't load CatBoost model for predictions

- Python
Published by pplonski almost 6 years ago

mljar-supervised - 0.3.5

Removed mae from sklearn decision-tree-based algorithms because it slows down training

- Python
Published by pplonski about 6 years ago

mljar-supervised - 0.3.0

Enhancements

  • #83 Compare all models visually
  • #79 Aggregate importances to one plot
  • #78 Sort linear model coefficients
  • #77 Shuffle generated models
  • #75 Add SHAP explanations to models
  • #69 Add skip_interpret argument
  • #61 Add ExtraTrees
  • #52 Add Linear and Logistic Regression support
  • #51 Add LightGBM support
  • #27 Feature importance
  • #68 Add minimum number of steps for algorithm

Bug fixes

  • #66 Doesn't work with bytes as class
  • #65 Don't generate parameters for Baseline
  • #63 Wrong time estimation for model training
  • #62 Hill climbing not iterating on all algorithms
  • #60 Show number of trees in learning curves

Refactor

  • #47 Refactor AutoML additional_metrics method

- Python
Published by pplonski about 6 years ago

mljar-supervised - v0.1.6

#20 Add preprocessing for new data in case of missing values not present in train data.

- Python
Published by pplonski about 7 years ago

mljar-supervised - v0.1.5

fix tqdm on jupyter

- Python
Published by pplonski about 7 years ago

mljar-supervised - v0.1.4

Add missing requirements in setup.py

- Python
Published by pplonski about 7 years ago

mljar-supervised - v0.1.3

  • set metric to be optimized (#17)
  • create table with model details (#8)
  • progress bar for training (#9)
  • add reproducibility tests (#5)
  • callback to control number of iterations (#11)
  • fixed: set path for catboost snapshot (#16)
  • learning curves (#14)

- Python
Published by pplonski about 7 years ago

mljar-supervised - Predict labels

AutoML now predicts categorical labels in addition to probabilities. An optimal threshold that maximizes the F1 score is computed for the best model.

The predicted data frame now looks like this:

    p_0, p_1, label
    0.1, 0.9, 1
    0.1, 0.9, 1
    0.9, 0.1, 0
    ...

Here p_0 is the probability for class 0, p_1 is the probability for class 1, and the 'label' column is the predicted label decided based on the threshold.

If the target column contains values other than 0 and 1, they are converted internally to 0 and 1, but the original values appear in the column names and labels of the predicted data frame. For example, if the target column contains the values A and B, the predicted data frame will look like:

    p_A, p_B, label
    0.1, 0.9, B
    0.1, 0.9, B
    0.9, 0.1, A
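The release note does not say how the threshold is searched, so the following is only a sketch under an assumption: every distinct predicted probability is tried as a threshold and the one with the highest F1 is kept. The function names and the toy data are hypothetical; the column naming (p_<class>, label) follows the description above.

```python
def f1_score(y_true, y_pred):
    """F1 score for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, p_1):
    """Brute-force search (an assumption): try each distinct probability."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(p_1)):
        y_pred = [1 if p >= t else 0 for p in p_1]
        score = f1_score(y_true, y_pred)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1


def predicted_frame(p_1, threshold, classes=(0, 1)):
    """Build rows with p_<class> columns and a thresholded label column."""
    neg, pos = classes
    return [
        {
            f"p_{neg}": 1.0 - p,
            f"p_{pos}": p,
            "label": pos if p >= threshold else neg,
        }
        for p in p_1
    ]


# Toy data: internal 0/1 targets and predicted probabilities for class 1.
y_true = [0, 0, 1, 1]
p_1 = [0.1, 0.4, 0.35, 0.8]
threshold, score = best_f1_threshold(y_true, p_1)
# Original class values A and B show up in column names and labels.
frame = predicted_frame(p_1, threshold, classes=("A", "B"))
```

Keeping the search separate from the frame construction means the same threshold can be reused when predicting on new data.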

- Python
Published by pplonski about 7 years ago

mljar-supervised - The first release

An AutoML solution for binary classification tasks, optimizing the LogLoss metric. The following algorithms are used:

  • Random Forest
  • CatBoost
  • LightGBM
  • Xgboost
  • Neural Networks

- Python
Published by pplonski about 7 years ago