Recent Releases of tpot
tpot - v1.1.0
What's Changed
- Update package for Python 3.12 by @john-sandall in https://github.com/EpistasisLab/tpot/pull/1374
- Add support for Python 3.13 by @john-sandall in https://github.com/EpistasisLab/tpot/pull/1377
- fixes max_eval_time_mins and max_eval_time_mins to allow None and inf - also allows steady state estimator to use template search spaces by @perib in https://github.com/EpistasisLab/tpot/pull/1375
- Update for sklearn 1.6 by @perib in https://github.com/EpistasisLab/tpot/pull/1371
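The time-limit fix in this list is about accepting None and inf as limit values. As a rough illustration of what "no limit" handling can look like, here is a minimal sketch. The helper names `normalize_time_limit` and `time_is_up` are hypothetical and are not part of TPOT's API; the sketch assumes limits are given in minutes.

```python
import math


def normalize_time_limit(minutes):
    """Convert a limit in minutes to seconds; None or inf means "no limit".

    Hypothetical helper for illustration only, not TPOT's implementation.
    """
    if minutes is None or math.isinf(minutes):
        return math.inf
    if minutes <= 0:
        raise ValueError("time limit must be positive")
    return minutes * 60.0


def time_is_up(elapsed_seconds, minutes):
    """True once the elapsed time reaches the (possibly infinite) limit."""
    return elapsed_seconds >= normalize_time_limit(minutes)
```

Mapping both None and inf onto `math.inf` keeps the comparison in `time_is_up` free of special cases.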
New Contributors
- @john-sandall made their first contribution in https://github.com/EpistasisLab/tpot/pull/1374
Full Changelog: https://github.com/EpistasisLab/tpot/compare/v1.0.0...v1.1.0
- Jupyter Notebook
Published by jay-m-dev 11 months ago
tpot - v1.0.0
What's Changed
- Codebase Migration: Consolidated tpot2 into tpot; removed deprecated/experimental features.
- Performance Enhancements: Optimized pipeline evaluation processes and genetic programming operators, leading to faster convergence and reduced computational overhead.
- Graph-Based Pipelines: Introduced a flexible graph-based representation of machine learning pipelines, enhancing the exploration of complex model architectures.
- Dependency Updates: Updated dependencies to ensure compatibility with the latest versions of scikit-learn and other essential libraries.
- Stability Improvements: Resolved various bugs and improved error handling to enhance overall stability and user experience.
- Genetic Feature Selection: Implemented genetic feature selection mechanisms, enabling automatic identification of relevant features during pipeline optimization.
- Expanded Search Spaces: Enhanced the flexibility in defining search spaces, allowing for more comprehensive exploration of potential pipeline configurations.
- Modular Framework: Refactored the codebase into a more modular structure, simplifying customization and extension of the evolutionary algorithm components.
- Documentation Overhaul: Revised and expanded documentation, including updated examples and comprehensive guides to reflect the new features and API changes.
Key Contributors
- Pedro Henrique Ribeiro (Lead developer - https://github.com/perib, https://www.linkedin.com/in/pedro-ribeiro/)
- Anil Saini (anil.saini@cshs.org)
- Jose Hernandez (jgh9094@gmail.com)
- Jay Moran (jay.moran@cshs.org)
- Nicholas Matsumoto (nicholas.matsumoto@cshs.org)
- Gabriel Ketron (Gabriel.Ketron@cshs.org)
- Hyunjun Choi (hyunjun.choi@cshs.org)
- Miguel E. Hernandez (miguel.e.hernandez@cshs.org)
- Jason Moore (moorejh28@gmail.com)
Full Changelog: https://github.com/EpistasisLab/tpot/commits/v1.0.0
Published by jay-m-dev over 1 year ago
tpot - v0.12.2
What's Changed
- estimator type by @perib in https://github.com/EpistasisLab/tpot/pull/1319
- Update requirements by @gatl in https://github.com/EpistasisLab/tpot/pull/1335
- Bump torch from 1.3.1 to 1.13.1 by @dependabot in https://github.com/EpistasisLab/tpot/pull/1336
- update sklearn version to 1.1.3 by @perib in https://github.com/EpistasisLab/tpot/pull/1337
- remove deprecated imp, fix docstring warning by @perib in https://github.com/EpistasisLab/tpot/pull/1331
- update compatibility with scikitlearn 1.4 by @perib in https://github.com/EpistasisLab/tpot/pull/1343
- Improve error message by @gatl in https://github.com/EpistasisLab/tpot/pull/1338
- Mate operator fix by @perib in https://github.com/EpistasisLab/tpot/pull/1268
New Contributors
- @gatl made their first contribution in https://github.com/EpistasisLab/tpot/pull/1335
- @dependabot made their first contribution in https://github.com/EpistasisLab/tpot/pull/1336
Full Changelog: https://github.com/EpistasisLab/tpot/compare/v0.12.1...v0.12.2
Published by jay-m-dev over 2 years ago
tpot - v0.12.0 release
- Fix numpy compatibility
- Dask optimizations
- Minor bug fixes
Published by nickotto about 3 years ago
tpot - v0.11.7 minor release
- Fix compatibility issue with scikit-learn 0.24 and xgboost 1.3.0
- Fix a bug that prevented TPOT from working when classifying more than 50 classes
- Add initial support for Resampler from imblearn
- Fix minor bugs
Published by weixuanfu over 5 years ago
tpot - 0.11.6.post3
- A patch to fix compatibility issues with the latest version of xgboost (v1.3.0)
Published by weixuanfu over 5 years ago
tpot - v0.11.6.post2
- Make XGBoost a required dependency
Published by weixuanfu over 5 years ago
tpot - v0.11.6.post1
- Refine the logic of checking the type of an operator.
Published by weixuanfu over 5 years ago
tpot - Version 0.11.6
- Fix a bug that caused the point mutation function to work incorrectly when the template option is used
- Add a new built-in configuration called "TPOT cuML", in which TPOT searches over a restricted configuration using the GPU-accelerated estimators in RAPIDS cuML and DMLC XGBoost. This configuration requires an NVIDIA Pascal architecture or better GPU with compute capability 6.0+, and requires the cuML library to be installed.
- Add string path support for log/log_file parameter
- Fix a bug in version 0.11.5 causing no update in stdout after each generation
- Fix minor bugs
Published by weixuanfu over 5 years ago
tpot - Covariate adjustments branch
- Development branch based on TPOT 0.11.1 for adjusting covariate without data leakage.
Published by weixuanfu over 5 years ago
tpot - TPOT v0.11.4 minor release
- Add a new built-in configuration, "TPOT NN", which includes all operators in "Default TPOT" plus additional neural network estimators written in PyTorch (currently tpot.builtins.PytorchLRClassifier and tpot.builtins.PytorchMLPClassifier, for classification tasks only)
- Refine the behavior of the log_file parameter
Published by weixuanfu almost 6 years ago
tpot - TPOT v0.11.3 minor release
- Fix a bug in TPOTRegressor in v0.11.2
- Add a -log option to the command-line interface to save the process log to a file.
Published by weixuanfu about 6 years ago
tpot - TPOT v0.11.2 Minor Release
- Fix the early_stop parameter, which did not work properly
- TPOT's built-in OneHotEncoder can now refit to different datasets
- Fix the issue that the evaluated_individuals_ attribute could not record correct generation info
- Add a new parameter, log_file, to output logs to a file instead of sys.stdout
- Fix some code quality issues and mistakes in the documentation
- Fix minor bugs
Published by weixuanfu about 6 years ago
tpot - TPOT v0.11.1 Minor Release
- Fix compatibility issue with scikit-learn v0.22
- warm_start now saves both Primitive Sets and evaluated pipelines from previous runs
- Fix the error that TPOT assigned wrong fitness scores to non-evaluated pipelines (interrupted by max_time_mins or KeyboardInterrupt)
- Fix the bug that the mutation operator could not generate a new pipeline when template is not the default value and warm_start is True
- Fix the bug that max_time_mins could not stop the optimization process when the search space is limited
- Fix a bug in exported code when the exported pipeline is only 1 estimator
- Fix spelling mistakes in the documentation
- Fix some code quality issues
Published by weixuanfu over 6 years ago
tpot - Version 0.11.0
- Support for Python 3.4 and below has been officially dropped. Also support for scikit-learn 0.20 or below has been dropped.
- Support for a metric function with the signature score_func(y_true, y_pred) for the scoring parameter has been dropped.
- Refine StackingEstimator so that it does not stack NaN/Infinity prediction probabilities.
- Fix a bug where the population did not persist, even with warm_start=True, when max_time_mins is not the default value.
- The random_state parameter in TPOT is now used for pipeline evaluation instead of a fixed random seed of 42. The set_param_recursive function has been moved to export_utils.py and can be used in exported code to set random_state recursively in a scikit-learn Pipeline. It is used to set random_state in the fitted_pipeline_ attribute and in exported pipelines.
- TPOT can independently use generations and max_time_mins to limit the optimization process, using either one of the parameters or both.
- The .export() function now returns the exported pipeline as a string if no output filename is specified.
- Add SGDClassifier and SGDRegressor to the TPOT default configs.
- Documentation has been updated.
- Fix minor bugs.
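The idea behind setting random_state recursively in a pipeline can be sketched without scikit-learn. The code below is an illustrative re-creation under stated assumptions, not the set_param_recursive function shipped in export_utils.py: `Step` and `NoSeed` are hypothetical stand-ins for estimators, and only nested `steps` lists and a wrapped `estimator` attribute are followed.

```python
class Step:
    """Hypothetical stand-in for an estimator that exposes random_state."""

    def __init__(self, random_state=None, estimator=None, steps=None):
        self.random_state = random_state
        if estimator is not None:
            self.estimator = estimator  # wrapped estimator, e.g. inside a selector
        if steps is not None:
            self.steps = steps  # nested pipeline: list of (name, object) pairs


class NoSeed:
    """Hypothetical stand-in for a deterministic transformer with no random_state."""


def set_param_recursive_sketch(steps, parameter, value):
    """Set `parameter` on every object in `steps` that already has it."""
    for _name, obj in steps:
        # Descend into nested pipelines first.
        if hasattr(obj, "steps"):
            set_param_recursive_sketch(obj.steps, parameter, value)
        # Handle a wrapped estimator that carries the parameter.
        nested = getattr(obj, "estimator", None)
        if nested is not None and hasattr(nested, parameter):
            setattr(nested, parameter, value)
        # Finally set it on the object itself, if it has the parameter.
        if hasattr(obj, parameter):
            setattr(obj, parameter, value)


inner = Step()
wrapper = Step(estimator=inner)
scaler = NoSeed()
pipeline_steps = [("scale", scaler), ("select", wrapper), ("clf", Step())]
set_param_recursive_sketch(pipeline_steps, "random_state", 42)
```

Objects without the parameter, such as `scaler` here, are left untouched rather than given a new attribute.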
Published by weixuanfu over 6 years ago
tpot - TPOT v0.10.2 minor release
- TPOT v0.10.2 is the last version to support Python 2.7 and Python 3.4.
- Minor updates for fixing compatibility issues with the latest version of scikit-learn (version > 0.21) and xgboost (v0.90)
- The default value of the template parameter is changed to None.
- Fix errors in documentation
Published by weixuanfu almost 7 years ago
tpot - TPOT v0.10.1 minor release
- Add a data_file_path option to the export function for replacing 'PATH/TO/DATA/FILE' with a customized dataset path in exported scripts. (Related issue #838)
- Change the Python version in CI tests to 3.7
- Add CI tests for macOS.
Published by weixuanfu about 7 years ago
tpot - TPOT 0.10.0 Release
- Add a new template option to specify a desired structure for the machine learning pipeline in TPOT. Check the TPOT API (it will be updated once it is merged into the master branch).
- Add the FeatureSetSelector operator to TPOT for feature selection based on prior expert knowledge. Please check our preprint paper for more details (Note: it was named DatasetSelector in the first version of the paper, but we will rename it to FeatureSetSelector in the next version of the paper).
- Refine the n_jobs parameter to accept values below -1. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used. This is related to issue #846.
- The memory parameter can now create the memory cache directory if it does not exist. This is related to issue #837.
- Fix minor bugs.
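The n_jobs rule in this release, (n_cpus + 1 + n_jobs) for negative values, can be checked with a few lines of arithmetic. This is a minimal sketch: `effective_workers` is a hypothetical helper name, and clamping the result to at least one worker is an assumption not stated in the release note.

```python
def effective_workers(n_jobs, n_cpus):
    """Number of worker processes implied by an n_jobs value (illustrative helper)."""
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all CPUs but one, and so on.
        # Assumption: never go below one worker.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs
```

On an 8-CPU machine, `effective_workers(-2, 8)` evaluates to 8 + 1 - 2 = 7, matching the "all CPUs but one" wording.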
Published by weixuanfu about 7 years ago
tpot - TPOT 0.9.6 Minor Release
- Fix a bug that caused the max_time_mins parameter not to work when use_dask=True in TPOT 0.9.5
- TPOT now saves the best Pareto values and the best Pareto pipelines in the checkpoint folder
- TPOT raises ImportError if operators in the TPOT configuration are not available when verbosity>2
- Thanks to @PGijsbers for the suggestions. TPOT can now save the scores of individuals already evaluated in any generation, even if the evaluation process of that generation is interrupted/stopped. Note that in this case TPOT raises the warning message "WARNING: TPOT may not provide a good pipeline if TPOT is stopped/interrupted in a early generation.", because the pipelines in an early generation, e.g. the 1st generation, have been evolved/modified only a limited number of times by the evolutionary algorithm.
- Fix bugs in the configuration of TPOTRegressor
- Fix errors in the documentation
Published by weixuanfu about 7 years ago
tpot - TPOT now supports integration with Dask for parallelization
TPOT now supports integration with Dask for parallelization + smart caching. Big thanks to the Dask dev team for making this happen!
TPOT now supports imputation/sparse matrices in the predict and predict_proba functions.
TPOTClassifier and TPOTRegressor now follow the scikit-learn estimator API.
We refined the scoring parameter in the TPOT API to accept a Scorer object.
We refined the parameters in VarianceThreshold and FeatureAgglomeration.
TPOT now supports memory caching within a Pipeline via an optional memory parameter.
We improved the documentation of TPOT.
Published by weixuanfu over 7 years ago
tpot - Sparse matrix support, early stopping, and checkpointing
TPOT now supports sparse matrices with a new built-in TPOT configuration, "TPOT sparse". We are using a custom OneHotEncoder implementation that supports missing values and continuous features.
We have added an "early stopping" option for stopping the optimization process if no improvement is made within a set number of generations. Look up the early_stop parameter to access this functionality.
TPOT now reduces the number of duplicated pipelines between generations, which saves you time during the optimization process.
TPOT now supports custom scoring functions via the command-line mode.
We have added a new optional argument, periodic_checkpoint_folder, that allows TPOT to periodically save the best pipeline so far to a local folder during the optimization process.
TPOT no longer uses sklearn.externals.joblib when n_jobs=1, to avoid the potential freezing issue that scikit-learn suffers from.
We have added pandas as a dependency to read input datasets instead of numpy.recfromcsv. NumPy's recfromcsv function is unable to parse datasets with complex data types.
Fixed a bug where DEFAULT in the parameter(s) of a nested estimator raised KeyError when exporting pipelines.
Fixed a bug related to setting random_state in nested estimators. The issue would happen with a pipeline with SelectFromModel (ExtraTreesClassifier as nested estimator) or StackingEstimator if the nested estimator has a random_state parameter.
Fixed a bug in the missing value imputation function in TPOT to impute along columns instead of rows.
Refined input checking for sparse matrices in TPOT.
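The "early stopping" option described in this release stops the search when no improvement is made within a set number of generations. A minimal sketch of that rule follows, assuming higher scores are better and that the best score of each generation is already known. `run_with_early_stop` is a hypothetical name, not a TPOT function.

```python
def run_with_early_stop(best_scores_per_generation, early_stop):
    """Return the 0-based generation index at which the search stops.

    Stops once `early_stop` consecutive generations bring no improvement.
    """
    best = float("-inf")
    stale = 0  # generations since the last improvement
    for gen, score in enumerate(best_scores_per_generation):
        if score > best:
            best = score
            stale = 0
        else:
            stale += 1
        if stale >= early_stop:
            return gen
    return len(best_scores_per_generation) - 1


scores = [0.80, 0.85, 0.85, 0.85, 0.85, 0.90]
stopped_at = run_with_early_stop(scores, early_stop=3)
```

With these scores the search stops at generation index 4 and never reaches the later improvement to 0.90, which is the usual trade-off of a small early-stop window.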
Published by rhiever over 8 years ago
tpot - More built-in configurations, missing data support, and detailed API documentation
TPOT now detects whether there are missing values in your dataset and replaces them with the median value of the column.
TPOT now allows you to set a group parameter in the fit function so you can use the GroupKFold cross-validation strategy.
TPOT now allows you to set a subsample ratio of the training instances with the subsample parameter. For example, setting subsample=0.5 tells TPOT to create a fixed subsample of half of the training data for the pipeline optimization process. This parameter can be useful for speeding up the pipeline optimization process, but may give less accurate performance estimates from cross-validation.
TPOT now has more built-in configurations, including TPOT MDR and TPOT light, for both classification and regression problems.
TPOTClassifier and TPOTRegressor now expose three useful internal attributes, fitted_pipeline_, pareto_front_fitted_pipelines_, and evaluated_individuals_. These attributes are described in the API documentation.
Oh, TPOT now has thorough API documentation. Check it out!
Fixed a reproducibility issue where setting random_seed didn't necessarily result in the same results every time. This bug was present since TPOT v0.7.
Refined input checking in TPOT.
Removed Python 2 uncompliant code.
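Column-wise median imputation, as described for missing values in this release, can be sketched in plain Python. This is an illustration rather than TPOT's implementation: `impute_median` is a hypothetical helper, missing values are assumed to be encoded as NaN, and every column is assumed to have at least one observed value.

```python
import math
from statistics import median


def impute_median(rows):
    """Replace NaN entries with the median of the observed values in the same column."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        medians.append(median(observed))
    return [
        [medians[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


nan = float("nan")
filled = impute_median([[1.0, nan], [3.0, 4.0], [nan, 8.0]])
```

Here the first column's gap is filled with the median of 1.0 and 3.0, and the second column's gap with the median of 4.0 and 8.0.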
Published by rhiever almost 9 years ago
tpot - Multiprocessing support and custom operator configurations
TPOT 0.7 is now out, featuring multiprocessing support for Linux and macOS, customizable operator configurations, and more.
TPOT now has multiprocessing support (Linux and macOS only). TPOT allows you to use multiple processes for accelerating pipeline optimization with the n_jobs parameter in both TPOTClassifier and TPOTRegressor.
TPOT now allows you to customize the operators and parameters explored during the optimization process with the config_dict parameter. The format of this customized dictionary can be found in the online documentation.
TPOT now allows you to specify a time limit for evaluating a single pipeline (default limit is 5 minutes) in the optimization process with the max_eval_time_mins parameter, so TPOT won't spend hours evaluating overly-complex pipelines.
We tweaked TPOT's underlying evolutionary optimization algorithm to work even better, including using the mu+lambda algorithm. This algorithm gives you more control over how many pipelines are generated every iteration with the offspring_size parameter.
Fixed a reproducibility issue where setting random_seed didn't necessarily result in the same results every time. This bug was present since version 0.6.
Refined the default operators and parameters in TPOT, so TPOT 0.7 should work even better than 0.6.
TPOT now supports sample weights in the fitness function if some of your samples are more important to classify correctly than others. The sample weights option works the same as in scikit-learn, e.g., tpot.fit(x_train, y_train, sample_weights=sample_weights).
The default scoring metric in TPOT has been changed from balanced accuracy to accuracy, the same default metric for classification algorithms in scikit-learn. Balanced accuracy can still be used by setting scoring='balanced_accuracy' when creating a TPOT instance.
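The mu+lambda scheme mentioned in this release keeps a parent population of size mu, generates lambda offspring (the role played by offspring_size), and selects the best mu individuals from parents and offspring together. The sketch below shows that selection step on a toy problem. It uses mutation only and plain floats as individuals, which is a simplification and not how TPOT represents pipelines; all names are hypothetical.

```python
import random


def mu_plus_lambda_step(population, fitness, mutate, offspring_size, rng):
    """One generation: create offspring, then keep the best mu of parents + offspring."""
    mu = len(population)
    offspring = [mutate(rng.choice(population), rng) for _ in range(offspring_size)]
    combined = population + offspring
    combined.sort(key=fitness, reverse=True)  # higher fitness is better
    return combined[:mu]


def fitness(x):
    """Toy objective with its maximum at x = 3."""
    return -(x - 3.0) ** 2


def mutate(x, rng):
    """Gaussian perturbation of a single float individual."""
    return x + rng.gauss(0.0, 0.5)


rng = random.Random(0)
population = [0.0, 1.0, 5.0, 8.0]
history = [max(fitness(x) for x in population)]
for _ in range(20):
    population = mu_plus_lambda_step(population, fitness, mutate, offspring_size=8, rng=rng)
    history.append(max(fitness(x) for x in population))
```

Because parents compete with their offspring, the best fitness in `history` never decreases from one generation to the next.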
Published by rhiever about 9 years ago
tpot - Support for regression problems
- TPOT now supports regression problems! We have created two separate TPOTClassifier and TPOTRegressor classes to support classification and regression problems, respectively. The command-line interface also supports this feature through the -mode parameter.
- TPOT now allows you to specify a time limit for the optimization process with the max_time_mins parameter, so you don't need to guess how long TPOT will take any more to recommend a pipeline to you.
- Added a new operator that performs feature selection using ExtraTrees feature importance scores.
- XGBoost has been added as an optional dependency to TPOT. If you have XGBoost installed, TPOT will automatically detect your installation and use the XGBoostClassifier and XGBoostRegressor in its pipelines.
- TPOT now offers a verbosity level of 3 ("science mode"), which outputs the entire Pareto front instead of only the current best score. This feature may be useful for users looking to make a trade-off between pipeline complexity and score.
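The Pareto front referred to in this list is the set of pipelines for which no other pipeline is both at least as simple and at least as accurate. A minimal sketch, assuming each candidate is summarized as a hypothetical (number of operators, score) pair where fewer operators and a higher score are better:

```python
def pareto_front(candidates):
    """Return the non-dominated (n_operators, score) pairs, sorted by complexity."""
    front = []
    for c in candidates:
        # c is dominated if a different candidate is no more complex and scores no worse.
        dominated = any(
            o[0] <= c[0] and o[1] >= c[1] and o != c
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(set(front))


candidates = [(1, 0.80), (2, 0.85), (2, 0.82), (3, 0.85), (4, 0.91)]
front = pareto_front(candidates)
```

In this example (2, 0.82) and (3, 0.85) drop out because (2, 0.85) is at least as simple and at least as accurate as both.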
Published by rhiever over 9 years ago
tpot - Full support for scikit-learn Pipelines
After a couple months hiatus in refactor land, we're excited to release the latest and greatest version of TPOT v0.5. For the past couple months, we worked on heavily refactoring TPOT's code base from a hacky research demo into a more elegant code base that will be easier to maintain in the long run. As an added bonus, TPOT now directly optimizes over and exports to scikit-learn Pipeline objects, so your auto-generated code should be much more readable.
Major changes in v0.5:
- Major refactor: Each operator is defined in a separate class file. Hooray for easier-to-maintain code!
- TPOT now exports directly to scikit-learn Pipelines instead of hacky code.
- Internal representation of individuals now uses scikit-learn pipelines.
- Parameters for each operator have been optimized so TPOT spends less time exploring useless parameters.
- We have removed pandas as a dependency and instead use numpy matrices to store the data.
- TPOT now uses k-fold cross-validation when evaluating pipelines, with a default k = 3. This k parameter can be tuned when creating a new TPOT instance.
- Improved scoring function support: Even though TPOT uses balanced accuracy by default, you can now have TPOT use any of the scoring functions that cross_val_score supports.
- Added the scikit-learn Normalizer preprocessor.
- Minor text fixes.
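Since v0.5 uses balanced accuracy by default, it can help to see how it differs from plain accuracy on imbalanced data. The sketch below defines balanced accuracy as the mean of per-class recall. That definition is an assumption made for illustration, and TPOT v0.5's own balanced accuracy may have been computed differently.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (assumed definition, for illustration only)."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Nine samples of class 0, one of class 1; the model always predicts class 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
```

A classifier that ignores the minority class looks strong on accuracy but only reaches chance level on balanced accuracy, which is why the choice of default metric matters for pipeline search.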
Published by rhiever almost 10 years ago
tpot - Major upgrade
In TPOT 0.4, we've made some major changes to the internals of TPOT and added some convenience functions. We've summarized the changes below.
- Added new sklearn models and preprocessors
  - AdaBoostClassifier
  - BernoulliNB
  - ExtraTreesClassifier
  - GaussianNB
  - MultinomialNB
  - LinearSVC
  - PassiveAggressiveClassifier
  - GradientBoostingClassifier
  - RBFSampler
  - FastICA
  - FeatureAgglomeration
  - Nystroem
- Added operator that inserts virtual features for the count of features with values of zero
- Reworked parameterization of TPOT operators
  - Reduced parameter search space with information from a scikit-learn benchmark
  - TPOT no longer generates arbitrary parameter values, but uses a fixed parameter set instead
- Removed XGBoost as a dependency
  - Too many users were having install issues with XGBoost
  - Replaced with scikit-learn's GradientBoostingClassifier
- Improved descriptiveness of TPOT command line parameter documentation
- Removed min/max/avg details during fit() when verbosity > 1
  - Replaced with tqdm progress bar
  - Added tqdm as a dependency
- Added fit_predict() convenience function
- Added get_params() function so TPOT can operate in scikit-learn's cross_val_score & related functions
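The operator that adds a virtual feature for the count of zero-valued features can be illustrated in a few lines. This is a sketch of the idea only: `add_zero_count_feature` is a hypothetical name, and rows are assumed to be plain lists of numbers rather than the data structures TPOT 0.4 used.

```python
def add_zero_count_feature(rows):
    """Append to each row the number of its features that are equal to zero."""
    return [list(row) + [sum(1 for v in row if v == 0)] for row in rows]


augmented = add_zero_count_feature([[0, 1, 0], [2, 3, 4]])
```

The first row gains a count of 2 and the second a count of 0, so the new column summarizes row sparsity for downstream operators.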
Published by rhiever almost 10 years ago
tpot - Zenodo release
Zenodo requires me to make a new release to assign a DOI, so here's that release. This is not a full release.
Published by rhiever about 10 years ago
tpot - GECCO 2016 paper release
This is the version of TPOT that was used in the GECCO 2016 paper, "Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science."
Published by rhiever over 10 years ago
tpot - Export functionality and more ML models
New in v0.2.0:
- TPOT now has the ability to export the optimized pipelines to sklearn code. See the documentation for more information.
- Logistic regression, SVM, and k-nearest neighbors classifiers were added as pipeline operators. Previously, TPOT only included decision tree and random forest classifiers.
- TPOT can now use arbitrary scoring functions for the optimization process. See the scoring function documentation for more information.
Published by rhiever over 10 years ago