skltest

Python library to compare the performance of the models available in Scikit-learn for solving regression and classification problems. It makes it possible to evaluate the influence of successive resampling and to optimize hyperparameters through a K-fold cross-validation holdout.

https://github.com/matheus-hoffmann/skltest

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary

Keywords

machine-learning python
Last synced: 6 months ago

Repository

Python library to compare the performance of the models available in Scikit-learn for solving regression and classification problems. It makes it possible to evaluate the influence of successive resampling and to optimize hyperparameters through a K-fold cross-validation holdout.

Basic Info
  • Host: GitHub
  • Owner: matheus-hoffmann
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 127 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 2
  • Releases: 1
Topics
machine-learning python
Created over 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme · Citation

README.md

Skl Test

Python library to compare more than 30 regression models available in Scikit-learn at once. It is possible to evaluate the influence of successive resampling and optimize the hyperparameters through K-fold cross-validation holdout.

Installation

This installation tutorial will guide you through running the application locally using a conda environment and PyCharm as the IDE. First, download the full repository as a ZIP file and extract it to a folder named sklregressortest.

After that, open a conda terminal and follow these steps:

  1. Create your own virtual environment with the correct Python version:

```bash
conda create -n skl_regressor_test python=3.8
```

  2. Activate your virtual environment so that you work inside this isolated environment:

```bash
conda activate skl_regressor_test
```

  3. Navigate to the folder containing setup.py in your terminal:

```bash
cd [PATH]/skl_regressor_test
```

  4. Install the library:

```bash
pip install -e .
```

Configuring PyCharm

In the same conda command window, find the path where the library was installed:

```bash
conda env list
```

Find the skl_regressor_test environment in the list and copy its full path.

Open the PyCharm IDE and follow these steps: File -> Settings -> Project -> Project Interpreter or Python Interpreter.

After that, select the virtual environment name (skl_regressor_test) and apply.

First steps

Read your CSV data file:

```python
df, input_data, output_data = read_data("80_19_1.csv")
```
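The README does not show where read_data is defined. A rough pandas equivalent (purely illustrative, and assuming the target variable sits in the last column of the CSV, which is not stated here) would be:

```python
import pandas as pd

def read_data_sketch(filename):
    """Hypothetical stand-in for read_data: the last CSV column is assumed to be the target."""
    df = pd.read_csv(filename)
    input_data = df.iloc[:, :-1].to_numpy()   # all columns except the last -> features
    output_data = df.iloc[:, -1].to_numpy()   # last column -> target
    return df, input_data, output_data
```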

Create a SklRegressorTest object:

```python
SklRegressors = SklRegressorTest(m_input=input_data, m_output=output_data, m_train_percentage=0.8)
```

Set the Scikit-Learn regressor models you want to test:

```python
SklRegressors.set_desired_models(models="all")
```

Get the best random_state for each model in a given interval:

```python
SklRegressors.test_random_states(n_random_states=10, desired_metric="root_mean_squared_error")
```
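For context, searching for the best random_state amounts to a train/test resampling loop. A minimal stand-alone sketch of the idea with plain scikit-learn (an illustration of the technique, not the library's internal code) could look like this:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def best_random_state(model, X, y, n_random_states=10, train_size=0.8):
    """Try several train/test splits and keep the random_state with the lowest RMSE."""
    best_state, best_rmse = None, np.inf
    for state in range(n_random_states):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_size, random_state=state)
        model.fit(X_tr, y_tr)
        rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
        if rmse < best_rmse:
            best_state, best_rmse = state, rmse
    return best_state, best_rmse
```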

Clear the results in order to run another method:

```python
SklRegressors.initialize_parameters()
```

Get the best combination of hyperparameters through a k-fold cross-validation holdout in a given interval:

```python
SklRegressors.test_spaces(n_random_states=10, rkf_cv_n_splits=5, rkf_cv_n_repeats=10, n_rand_iter=20, desired_metric="root_mean_squared_error")
```
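The rkf_cv_n_splits, rkf_cv_n_repeats, and n_rand_iter arguments suggest a repeated k-fold cross-validation combined with a randomized hyperparameter search. A rough equivalent built from standard scikit-learn pieces (a sketch of the concept under that assumption, with a made-up parameter space, not the library's own implementation) is:

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, RepeatedKFold

# 5 splits repeated 10 times, mirroring rkf_cv_n_splits=5 and rkf_cv_n_repeats=10
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)

# 20 randomly sampled candidates, mirroring n_rand_iter=20
search = RandomizedSearchCV(
    RandomForestRegressor(),
    param_distributions={"n_estimators": randint(10, 200), "max_depth": randint(2, 20)},
    n_iter=20,
    scoring="neg_root_mean_squared_error",
    cv=cv,
)
search.fit(input_data, output_data)
print(search.best_params_, -search.best_score_)
```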

Get the best model configuration after trying the two methods above:

```python
SklRegressors.test_all(n_random_states=10, rkf_cv_n_splits=5, rkf_cv_n_repeats=10, n_rand_iter=20, desired_metric="root_mean_squared_error")
```

Try to achieve a maximum absolute error (maxerror) lower than an estimated value within a given number of iterations (n_iter), only resampling the train/test data:

```python
SklRegressors.test_random_states_until(maxerror=1.0, n_iter=1000, desired_metric="root_mean_squared_error")
```
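In other words, the *_until variants keep drawing new train/test splits until the error target is met or the iteration budget runs out. A simplified sketch of that stopping logic (illustrative only, not the library's code) is:

```python
from sklearn.metrics import max_error
from sklearn.model_selection import train_test_split

def resample_until(model, X, y, maxerror=1.0, n_iter=1000, train_size=0.8):
    """Resample the train/test split until the maximum absolute error drops below maxerror."""
    err = None
    for state in range(n_iter):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_size, random_state=state)
        model.fit(X_tr, y_tr)
        err = max_error(y_te, model.predict(X_te))
        if err < maxerror:
            return state, err  # target reached
    return None, err  # iteration budget exhausted
```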

Try to achieve a maximum absolute error (maxerror) lower than an estimated value within a given number of iterations (n_iter), resampling the train/test data and performing the k-fold cross-validation holdout:

```python
SklRegressors.test_spaces_until(rkf_cv_n_splits=5, rkf_cv_n_repeats=10, n_rand_iter=20, maxerror=1.0, n_iter=100, desired_metric="root_mean_squared_error")
```

Get the best model configuration after trying the two methods above:

```python
SklRegressors.test_all_until(rkf_cv_n_splits=5, rkf_cv_n_repeats=10, n_rand_iter=20, maxerror=1.0, n_iter=100, desired_metric="root_mean_squared_error")
```

Write a summary file with the best configuration of hyperparameters and statistical data from this best model:

```python
summary_df = SklRegressors.write_log(path="", filename="skl_regressor_test_summary")
```
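Assuming the returned summary_df is a pandas DataFrame whose columns match the summary table further down (an assumption; pandas and openpyxl are listed as dependencies), it can be inspected or exported directly, for example:

```python
# Sort by RMSE (column name assumed from the summary table below) and keep the best models
print(summary_df.sort_values("RMSE").head(10))

# Export a copy of the summary to Excel (openpyxl is among the listed dependencies)
summary_df.to_excel("skl_regressor_test_summary.xlsx", index=False)
```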

Print the evaluated models and their respective R² and maximum absolute error:

```python
SklRegressors.summary()
```

Your output will be similar to this:

| Model | Max. Error | R2 | Adj. R2 | MAE | RMAE | MSE | RMSE | MAPE |
|-------|-----------|-----|---------|-----|------|-----|------|------|
| XGBRegressor | 1.552 | 1.00 | 1.00 | 0.85 | 0.92 | 0.88 | 0.94 | 0.40 |
| GradientBoostingRegressor | 1.783 | 1.00 | 1.00 | 0.80 | 0.90 | 0.89 | 0.94 | 0.15 |
| ExtraTreesRegressor | 2.343 | 1.00 | 1.00 | 0.95 | 0.98 | 1.39 | 1.18 | 0.17 |
| ExtraTreeRegressor | 2.422 | 1.00 | 1.00 | 1.13 | 1.06 | 2.04 | 1.43 | 0.24 |
| RandomForestRegressor | 2.799 | 1.00 | 1.00 | 1.40 | 1.18 | 2.55 | 1.60 | 0.10 |
| BaggingRegressor | 3.075 | 1.00 | 1.00 | 1.17 | 1.08 | 2.26 | 1.50 | 0.16 |
| DecisionTreeRegressor | 3.212 | 1.00 | 1.00 | 1.27 | 1.13 | 2.90 | 1.70 | 0.22 |
| AdaBoostRegressor | 3.754 | 1.00 | 1.00 | 1.96 | 1.40 | 5.07 | 2.25 | 0.10 |
| LassoCV | 7.041 | 0.99 | 0.99 | 4.57 | 2.14 | 25.00 | 5.00 | 2.07 |
| OrthogonalMatchingPursuitCV | 7.088 | 0.99 | 0.99 | 4.46 | 2.11 | 24.32 | 4.93 | 2.03 |
| Ridge | 7.151 | 0.99 | 0.98 | 5.09 | 2.26 | 28.91 | 5.38 | 1.30 |
| SGDRegressor | 7.336 | 0.99 | 0.99 | 4.64 | 2.16 | 25.80 | 5.08 | 2.19 |
| HuberRegressor | 7.369 | 0.99 | 0.99 | 4.65 | 2.16 | 26.16 | 5.11 | 2.20 |
| RidgeCV | 7.375 | 0.99 | 0.99 | 4.66 | 2.16 | 25.84 | 5.08 | 2.20 |
| BayesianRidge | 7.418 | 0.99 | 0.99 | 4.67 | 2.16 | 26.08 | 5.11 | 2.21 |
| Lars | 7.443 | 0.99 | 0.99 | 4.68 | 2.16 | 26.22 | 5.12 | 2.22 |
| LassoLarsCV | 7.443 | 0.99 | 0.99 | 4.68 | 2.16 | 26.22 | 5.12 | 2.22 |
| LarsCV | 7.443 | 0.99 | 0.99 | 4.68 | 2.16 | 26.22 | 5.12 | 2.22 |
| LinearRegression | 7.443 | 0.99 | 0.99 | 4.68 | 2.16 | 26.22 | 5.12 | 2.22 |
| RANSACRegressor | 7.443 | 0.99 | 0.99 | 4.68 | 2.16 | 26.22 | 5.12 | 2.22 |
| Lasso | 7.620 | 0.99 | 0.99 | 4.68 | 2.16 | 25.52 | 5.05 | 1.40 |
| ElasticNetCV | 8.007 | 0.99 | 0.98 | 5.03 | 2.24 | 28.13 | 5.30 | 1.20 |
| LassoLarsIC | 8.231 | 0.99 | 0.98 | 4.83 | 2.20 | 29.24 | 5.41 | 1.11 |
| PassiveAggressiveRegressor | 9.769 | 0.99 | 0.99 | 5.08 | 2.25 | 35.22 | 5.93 | 0.38 |
| LassoLars | 15.637 | 0.97 | 0.95 | 7.12 | 2.67 | 65.97 | 8.12 | 0.92 |
| OrthogonalMatchingPursuit | 15.926 | 0.96 | 0.95 | 10.22 | 3.20 | 119.10 | 10.91 | 1.85 |
| KNeighborsRegressor | 18.435 | 0.94 | 0.93 | 11.44 | 3.38 | 158.97 | 12.61 | 0.67 |
| LinearSVR | 25.860 | 0.89 | 0.85 | 12.34 | 3.51 | 217.04 | 14.73 | 0.90 |
| ElasticNet | 30.823 | 0.82 | 0.76 | 15.18 | 3.90 | 341.11 | 18.47 | 0.89 |
| SVR | 64.718 | 0.13 | -0.16 | 32.01 | 5.66 | 1660.40 | 40.75 | 1.09 |
| NuSVR | 70.428 | 0.10 | -0.20 | 33.68 | 5.80 | 1722.84 | 41.51 | 1.85 |
| DummyRegressor | 74.721 | -0.01 | -0.35 | 35.97 | 6.00 | 1925.02 | 43.88 | 2.21 |
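Putting the steps above together, a minimal end-to-end script looks roughly like the following (only calls documented in this README are used; the import path skl_regressor_test is assumed from the package name, and the file name and argument values are the ones shown above):

```python
from skl_regressor_test import SklRegressorTest, read_data  # import path assumed

# Load the data and set up the tester with an 80% train split
df, input_data, output_data = read_data("80_19_1.csv")
SklRegressors = SklRegressorTest(m_input=input_data, m_output=output_data, m_train_percentage=0.8)

# Evaluate every supported regressor: resampling plus k-fold CV hyperparameter search
SklRegressors.set_desired_models(models="all")
SklRegressors.test_all(n_random_states=10, rkf_cv_n_splits=5, rkf_cv_n_repeats=10,
                       n_rand_iter=20, desired_metric="root_mean_squared_error")

# Persist the best configurations and print the per-model metrics
summary_df = SklRegressors.write_log(path="", filename="skl_regressor_test_summary")
SklRegressors.summary()
```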

Owner

  • Name: Matheus Hoffmann
  • Login: matheus-hoffmann
  • Kind: user
  • Company: NTT DATA

Data & Analytics Team Leader with professional experience in Python, C, C++, Shell, and MATLAB.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Hoffmann Brito
    given-names: Matheus
    orcid: https://orcid.org/0000-0002-1937-1923
title: "Skl Regressor Test"
version: 0.0.2
# doi: 
date-released: 2021-07-18


Dependencies

setup.py pypi
  • matplotlib *
  • numpy *
  • openpyxl *
  • pandas *
  • scikit-learn *
  • xgboost *