Recent Releases of tpcp
tpcp - v2.1.0 - Improvements to DatasetSplitter
[2.1.0] - 2025-07-24
Changed
- (Potential Breaking): We now explicitly check whether classes follow the "rules of tpcp" and simply forward or set parameters without modifying them. Before, we just "hoped" that this was the case. If you had classes that did not follow these rules, you will now get an error.
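The rule itself can be pictured with a small standalone check. This is only a sketch and not tpcp's implementation: `init_only_stores_params`, `GoodAlgo`, and `BadAlgo` are hypothetical names. The helper passes unique sentinel objects to `__init__` and verifies that each one ends up, unchanged, under the attribute of the same name.

```python
import inspect


def init_only_stores_params(cls) -> bool:
    """Return True if ``cls.__init__`` stores every argument unmodified.

    Hypothetical helper for illustration, not part of the tpcp API.
    """
    names = [n for n in inspect.signature(cls.__init__).parameters if n != "self"]
    # Unique sentinel objects make any copy or transformation detectable.
    sentinels = {name: object() for name in names}
    instance = cls(**sentinels)
    return all(getattr(instance, name, None) is sentinels[name] for name in names)


class GoodAlgo:
    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window


class BadAlgo:
    def __init__(self, threshold):
        # Wrapping the parameter in a list modifies it, which breaks the rule.
        self.threshold = [threshold]


print(init_only_stores_params(GoodAlgo))
print(init_only_stores_params(BadAlgo))
```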
Added
- The `DatasetSplitter` now auto-selects a proper splitter based on its inputs and attempts to validate whether the passed splitter supports the features needed.
Scientific Software - Peer-reviewed
- Python
Published by AKuederle 6 months ago
tpcp - v2.0.0 - Major scorer rework
[2.0.0] - 2024-10-24
Added
- The global cache helper now supports algorithms with multiple action methods by specifying the name of the action method you want to cache. (https://github.com/mad-lab-fau/tpcp/pull/118)
- Global disk cache helper should now be able to cache the action methods of algorithm classes defined in the main script. (https://github.com/mad-lab-fau/tpcp/pull/118)
- There are new builtin `FloatAggregator` and `MacroFloatAggregator` that should cover many of the use cases that previously required custom aggregators. (https://github.com/mad-lab-fau/tpcp/pull/118)
- Scorers now support passing a `final_aggregator`. It is called after all scoring and aggregation has happened and allows implementing complicated "meta" aggregation that depends on the results of all scores of all datapoints. Note that we are not sure yet whether this should be treated as an escape hatch, with overuse considered an anti-pattern, or whether it is exactly the other way around. We need to experiment in a couple of real-life applications to figure this out. (https://github.com/mad-lab-fau/tpcp/pull/120)
- Dataset classes now have a proper `__equals__` implementation. (https://github.com/mad-lab-fau/tpcp/pull/120)
Changed
- Relatively major overhaul of how aggregators in scoring functions work. Before, aggregators were classes that were initialized with the value of a score. Now they are instances of a class that is called with the value of a score. This change makes it possible to create "configurable" aggregators that get their configuration at initialization time. (https://github.com/mad-lab-fau/tpcp/pull/118)
  This comes with a couple of breaking changes:
  - The most "user-facing" one is that the `NoAgg` aggregator is now called `no_agg`, indicating that it is an instance of a class and not a class itself.
  - All custom aggregators need to be rewritten, but you will likely find that they are much simpler now (see the reworked examples for custom aggregators).
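The "instance that is called with the score values" idea from the aggregator rework can be sketched in plain Python. `TrimmedMean` is a hypothetical stand-in: a real tpcp aggregator would derive from the library's aggregator base class and follow its call signature, which is not reproduced here.

```python
from statistics import mean


class TrimmedMean:
    """Configurable aggregator: configuration at init, values at call time.

    Hypothetical stand-in, not a tpcp class.
    """

    def __init__(self, n_trim: int = 0):
        self.n_trim = n_trim

    def __call__(self, values):
        ordered = sorted(values)
        # Drop the n_trim smallest and largest values before averaging.
        kept = ordered[self.n_trim : len(ordered) - self.n_trim]
        return mean(kept)


plain_mean = TrimmedMean()  # instance, configured once
robust_mean = TrimmedMean(n_trim=1)  # same class, different configuration

scores = [0.0, 0.5, 0.7, 0.9, 5.0]
print(plain_mean(scores), robust_mean(scores))
```

Because the configuration lives on the instance, the same class can be reused with different settings for different scores.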
Fixed
- Fixed a massive performance regression in version 0.34.1 affecting people who had tensorflow or torch installed but did not use it in their code. The reason was that we imported the two modules in the global scope, which made importing tpcp very slow. This was particularly noticeable with multiprocessing, as the module was imported in every worker process. We now only import the module within the clone function, and only if you had imported it before. (https://github.com/mad-lab-fau/tpcp/pull/118)
- The custom hash function now has a different way of hashing functions and classes defined in local scopes. This should prevent strange pickling errors from just using "tpcp" normally. (https://github.com/mad-lab-fau/tpcp/pull/118)
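The "only if you had imported it before" part of the performance fix can be illustrated with a lookup in `sys.modules`. This is a sketch of the general technique, not tpcp's actual code; `get_module_if_loaded` and `describe` are hypothetical helpers.

```python
import sys


def get_module_if_loaded(name: str):
    """Return an already-imported module, or None without importing it."""
    # sys.modules only contains modules that some code has imported before,
    # so this lookup never triggers a (possibly slow) import.
    return sys.modules.get(name)


def describe(obj) -> str:
    torch = get_module_if_loaded("torch")
    if torch is not None and isinstance(obj, torch.nn.Module):
        return "torch module"
    return "plain object"


print(describe([1, 2, 3]))
```

If user code never imported torch, no object can be a torch module, so skipping the import loses nothing.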
Removed
- `score` functions implemented directly as a method on the pipeline class are no longer supported. Score functions now need to be independent functions that take a pipeline instance as their first argument. For this reason, it is also no longer supported to pass `None` as argument to `scoring` in any validate or optimize method. (https://github.com/mad-lab-fau/tpcp/pull/120)
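A standalone score function in this style might look like the following sketch. `ThresholdPipeline` and the dict-based datapoint are hypothetical stand-ins for a real tpcp pipeline and dataset, and the second `datapoint` argument is an assumption; the text above only states that the pipeline instance comes first.

```python
class ThresholdPipeline:
    """Hypothetical pipeline with one parameter and a ``run`` action."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def run(self, datapoint):
        self.prediction_ = [x > self.threshold for x in datapoint["signal"]]
        return self


def score(pipeline, datapoint) -> float:
    """Standalone score function: the pipeline is the first argument."""
    result = pipeline.run(datapoint)
    hits = sum(p == r for p, r in zip(result.prediction_, datapoint["reference"]))
    return hits / len(datapoint["reference"])


datapoint = {"signal": [0.1, 0.8, 0.6, 0.3], "reference": [False, True, False, False]}
print(score(ThresholdPipeline(), datapoint))
```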
Published by AKuederle about 1 year ago
tpcp - v1.0.1 - Resolved install issues with UV
[1.0.1] - 2024-10-18
Fixes names of optional dependency groups. That should resolve install issues when using uv as package manager.
Published by AKuederle about 1 year ago
tpcp - v1.0.0 - Cross-Validation improved!
[1.0.0] - 2024-07-03
Note: This is a major version bump, because we have quite substantial breaking changes. The 1.0 should not signal that we are now feature complete. Though the core APIs have been mostly stable for quite some time now.
BREAKING CHANGE
- Instead of the (annoying) `mock_label` and `group_label` arguments, all functions that take a cv-splitter as input can now take an instance of the new `DatasetSplitter` class, which elegantly handles grouping and stratification and also removes the need to forward the `mock_label` and `group_label` arguments to the underlying optimizer. The `mock_label` and `group_label` arguments have been removed without deprecation. (https://github.com/mad-lab-fau/tpcp/pull/114)
- All classes and methods that produce "grid-search"- or "cross-validate"-like output (`GridSearch`, `GridSearchCV`, `cross_validate`, `validate`) have updated names for all their output attributes. In most cases the output naming has switched from a single underscore to a double underscore to separate the different parts of the output name, which makes it easier to access the output programmatically. (https://github.com/mad-lab-fau/tpcp/pull/117)
Published by AKuederle over 1 year ago
tpcp - v0.34.1 - Fix Torch and Tensorflow support
Fixed
- The torch hasher was not working at all. This is hopefully fixed now.
- The tensorflow clone method did not work. Switched to specialized implementation that hopefully works.
Published by AKuederle over 1 year ago
tpcp - v0.34.0 - Some smaller improvements
[0.34.0] - 2024-06-28
Added
- Dataset classes are now generic and allow you to provide the group-label tuple as generic. This allows for better type checking and IDE support. (https://github.com/mad-lab-fau/tpcp/pull/113)
Changed/Fixed
- The snapshot utilities are much more robust now and raise appropriate errors when the stored dataframes have unsupported properties. (https://github.com/mad-lab-fau/tpcp/pull/112)
Published by AKuederle over 1 year ago
tpcp - v0.33.1 - Less caching warnings
Published by AKuederle over 1 year ago
tpcp - v0.33.0 - Some more TypedIterator stuff and some QoL improvements
[0.33.0] - 2024-05-23
Added
- `custom_hash`, the internally used hashing method based on pickle, is now part of the public API via `tpcp.misc`.
- `DummyOptimize` now allows you to ignore the warning that it usually throws.
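The general idea of a pickle-based hash can be sketched with the standard library. `pickle_hash` is a hypothetical helper for illustration only; tpcp's `custom_hash` additionally handles special cases (for example torch and tensorflow objects) that are not reproduced here.

```python
import hashlib
import pickle


def pickle_hash(obj) -> str:
    """Hash an object via its pickle representation (illustrative only)."""
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


params = {"window": 10, "weights": [0.2, 0.8]}
same = {"window": 10, "weights": [0.2, 0.8]}
changed = {"window": 10, "weights": [0.2, 0.9]}

print(pickle_hash(params) == pickle_hash(same))
print(pickle_hash(params) == pickle_hash(changed))
```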
Changed
- Relatively large rework of the TypedIterator. We recommend rereading the example.
Published by AKuederle over 1 year ago
tpcp - v0.32.0 - Better snapshots
[0.32.0] - 2024-04-17
- The snapshot plugin now supports a new command line argument `--snapshot-only-check` that will fail the test if no snapshot file is found. This is useful for CI/CD pipelines, where you want to ensure that all snapshots are up to date.
- The snapshot plugin is now installed automatically when you install tpcp. There is no need to set it up in the conftest file anymore.
Published by AKuederle over 1 year ago
tpcp - v0.31.2 - More Typed Iterator fixes
[0.31.2] - 2024-02-01
Fixed
- TypedIterator does not run into a RecursionError anymore, when attributes with the wrong name are accessed.
Published by AKuederle almost 2 years ago
tpcp - v0.31.1 - Fix agg in typed iterator
[0.31.1] - 2024-02-01
Fixed
- TypedIterator now skips aggregation when no values are provided
Published by AKuederle almost 2 years ago
tpcp - v0.31.0 - Slightly better TypedIterator
[0.31.0] - 2024-01-31
Changed
- The TypedIterator now has a new `results_` attribute and has improved typing to allow for better IDE integration.
Published by AKuederle almost 2 years ago
tpcp - v0.30.3 - Sklearn downgrade
[0.30.3] - 2024-01-23
Fixed
- Downgraded minimum version of sklearn to 1.2.0 to avoid version conflicts with other packages.
Published by AKuederle almost 2 years ago
tpcp - v0.30.2 - Better Docs and Typing for Class utils
[0.30.2] - 2024-01-23
Changed
- Better typing and Docstrings for the new functions introduced in 0.30.0
NOTE: Previous version 0.30.1 was yanked
Published by AKuederle almost 2 years ago
tpcp - v0.30.0 - Class utils
[0.30.0] - 2024-01-23
Added
- Added a new `classproperty` that allows defining class-level properties equivalent to `@property` for instances.
- Added a new `set_defaults` decorator that allows modifying the default values of a function or class init.
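To show what these two utilities do conceptually, here is a minimal standard-library re-implementation. It is a sketch under assumptions, not tpcp's code: the real `classproperty` and `set_defaults` may have different signatures and handle more cases.

```python
import functools


class classproperty:
    """Minimal read-only class-level property (illustrative sketch)."""

    def __init__(self, getter):
        self.getter = getter

    def __get__(self, instance, owner):
        # ``owner`` is the class, so the value is available without an instance.
        return self.getter(owner)


def set_defaults(**new_defaults):
    """Return a decorator that overrides keyword defaults of a function."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Explicitly passed keyword arguments win over the new defaults.
            return func(*args, **{**new_defaults, **kwargs})

        return wrapper

    return decorator


class Sensor:
    _rate = 100

    @classproperty
    def sampling_rate(cls):
        return cls._rate


def scale(value, factor=1):
    return value * factor


double = set_defaults(factor=2)(scale)

print(Sensor.sampling_rate, double(5), double(5, factor=3))
```

This simplified `set_defaults` only supports overriding the changed defaults by keyword; passing them positionally would raise a `TypeError`.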
Published by AKuederle almost 2 years ago
tpcp - v0.29.0 - Hybrid Caching
[0.29.0] - 2023-12-19
Added
- Added a new `hybrid_cache` that allows caching in RAM and on disk at the same time.
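The two-level lookup behind such a cache can be sketched as follows. `hybrid_cached` is a hypothetical decorator written only for illustration; it does not mirror the signature or the internals of tpcp's `hybrid_cache`.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def hybrid_cached(cache_dir):
    """Cache results in a dict (RAM) and as pickle files (disk)."""
    cache_dir = Path(cache_dir)
    memory = {}

    def decorator(func):
        def wrapper(*args):
            key = hashlib.sha256(pickle.dumps((func.__name__, args))).hexdigest()
            if key in memory:  # fastest: already in RAM
                return memory[key]
            file = cache_dir / f"{key}.pkl"
            if file.exists():  # slower: load from disk, then keep in RAM
                result = pickle.loads(file.read_bytes())
            else:  # slowest: compute and persist
                result = func(*args)
                file.write_bytes(pickle.dumps(result))
            memory[key] = result
            return result

        return wrapper

    return decorator


calls = []

with tempfile.TemporaryDirectory() as tmp:

    @hybrid_cached(tmp)
    def square(x):
        calls.append(x)
        return x * x

    first, second = square(4), square(4)

print(first, second, calls)
```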
Published by AKuederle about 2 years ago
tpcp - v0.28.0 - Cache all the things
[0.28.0] - 2023-12-19
Changed
- The minimal version of pandas was reduced to 1.3. It still seems to work with that minimal version and this avoids version conflicts with other packages.
Added
- Helper to perform global caching of algorithm actions. This can be helpful to cache results of algorithms that are deeply nested within other methods or algorithms that are called multiple times within the same pipeline. (https://github.com/mad-lab-fau/tpcp/pull/103)
- Clone now supports recursive cloning of dicts. This allows the theoretical use of dictionaries as parameters.
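Recursive cloning of containers can be sketched like this. `Param` and this `clone` function are hypothetical and only illustrate the recursion; tpcp's own `clone` has additional rules that are not shown.

```python
import copy


class Param:
    """Hypothetical parameter object that knows how to clone itself."""

    def __init__(self, value):
        self.value = value

    def clone(self):
        return Param(copy.deepcopy(self.value))


def clone(obj):
    """Recursively clone dicts, lists and tuples; delegate to ``clone`` methods."""
    if hasattr(obj, "clone"):
        return obj.clone()
    if isinstance(obj, dict):
        return {key: clone(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(clone(item) for item in obj)
    return copy.deepcopy(obj)


original = {"filter": Param([1, 2]), "nested": {"order": 3}}
cloned = clone(original)
cloned["filter"].value.append(3)

print(original["filter"].value, cloned["filter"].value)
```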
Removed
- The test that checks if all mutable defaults are wrapped in `CloneFactory` is now removed. This check is performed at runtime anyway.
Published by AKuederle about 2 years ago
tpcp - v0.27.0 - New Iterator Baseclass
[0.27.0] - 2023-11-09
Added
- The TypedIterator (introduced in 0.26.0) now has a base class (`BaseTypedIterator`) that can be used to implement custom iterators that can get custom inputs to the `iterate` method, which are then further processed before the actual iteration.
Published by AKuederle about 2 years ago
tpcp - v0.26.2 - Better assertions (now for real)
[0.26.2] - 2023-11-05
Fixed
- Now actually fixed the pytest registration of the testing modules.
Published by AKuederle about 2 years ago
tpcp - v0.26.1 - Better Assertions
[0.26.1] - 2023-11-05
Fixed
- The testing modules are now registered as pytest files, which should result in verbose assert statements, making debugging easier.
Published by AKuederle about 2 years ago
tpcp - v0.26.0 - TypedIterator
[0.26.0] - 2023-11-03
Added
- TypedIterator (https://github.com/mad-lab-fau/tpcp/pull/100): A new helper that makes iterating over things and accumulating results much easier.
Changed
- Improved typing of "safe" decorators (https://github.com/mad-lab-fau/tpcp/pull/100). This should fix wrong IDE typehints.
- Now using py39-style type hints
Published by AKuederle about 2 years ago
tpcp - v0.25.1 - Fixed Documentation-tests-mixin
[0.25.1] - 2023-10-25
Fixed
- Ignored names in the testing mixin are now correctly ignored both-ways. I.e. it allows to document additional parameters as well, not just leave out parameters.
Published by AKuederle about 2 years ago
tpcp - v0.25 - End of Py3.8 and new validate method
[0.25.0] - 2023-10-24
Added
- The Scorer class now has the ability to score datapoints in parallel. This can be enabled by setting the `n_jobs` parameter of the `Scorer` class to something larger than 1. (https://github.com/mad-lab-fau/tpcp/pull/95)
- The `PyTestSnapshotTest` class now supports comparing dataframes with datetime columns. (https://github.com/mad-lab-fau/tpcp/pull/97)
- The `validate` function was introduced to enable validation of an algorithm on arbitrary data without parameter optimization. (https://github.com/mad-lab-fau/tpcp/pull/99)
- Fixed the bug that the functions `optimize` and `cross_validate` were crashing when `progress_bar` was deactivated.
- New example about caching. (https://github.com/mad-lab-fau/tpcp/pull/98)
Changed
- In line with numpy and some other packages, we drop Python 3.8 support
Published by AKuederle about 2 years ago
tpcp - v0.24.0 - Dataset Improvements
[0.24.0] - 2023-09-08
For all changes in this release see: https://github.com/mad-lab-fau/tpcp/pull/85
Deprecated
- The properties `group` and `groups` of the `Dataset` class are deprecated and will be removed in a future release. They are replaced by the `group_label` and `group_labels` properties of the `Dataset` class. This renaming was done to make it clearer that these properties return the labels of the groups and not the groups themselves.
- The `create_group_labels` method of the `Dataset` class is deprecated and will be removed in a future release. It is replaced by the `create_string_group_labels` method of the `Dataset` class. This renaming was done to avoid confusion with the new names for `groups` and `group`.
Added
- Added `index_as_tuples` method to the `Dataset` class. It returns the full index of the dataset as a list of named tuples regardless of the current grouping. This might be helpful to extract the label information of a datapoint when `group` would require handling multiple cases because your code expects the dataset in different grouped versions.
Changed
- BREAKING CHANGE (with Deprecation): The `group` property of the `Dataset` class is now called `group_label`.
- BREAKING CHANGE: The `group_label` property now always returns named tuples of strings (even for single groups where it used to return strings!).
- BREAKING CHANGE (with Deprecation): The `groups` property of the `Dataset` class is now called `group_labels`.
- BREAKING CHANGE: The `group_labels` property always returns a list of named tuples of strings (even for single groups where it used to return a list of strings!).
- BREAKING CHANGE: The parameter `groups` of the `get_subset` method of the `Dataset` class is now called `group_labels` and always expects a list of named tuples of strings.
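The practical effect of "always named tuples, even for a single column" can be shown with plain `namedtuple` objects. The column names and the `GroupLabel` / `ParticipantLabel` types below are hypothetical; the sketch does not use the `Dataset` class itself.

```python
from collections import namedtuple

# Hypothetical index columns of a dataset.
GroupLabel = namedtuple("GroupLabel", ["participant", "recording"])

index = [("p1", "rec_1"), ("p1", "rec_2"), ("p2", "rec_1")]
group_labels = [GroupLabel(*row) for row in index]

# With a single groupby column the label is still a (one-element) named tuple.
ParticipantLabel = namedtuple("ParticipantLabel", ["participant"])
single_column_labels = [ParticipantLabel(p) for p in sorted({row[0] for row in index})]

print(group_labels[0].participant, single_column_labels)
```

Code that consumes the labels can therefore always use attribute or tuple access and no longer needs a special case for plain strings.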
Published by AKuederle over 2 years ago
tpcp - v0.23.0 - Testing Utils
[0.23.0] - 2023-08-30
Added
- We migrated some testing utilities from other libraries to tpcp and exposed some algorithm test helpers that previously only existed in the tests folder via the actual tpcp API. This should make testing algorithms and pipelines developed with tpcp easier. These new features are now available in the `tpcp.testing` module. (https://github.com/mad-lab-fau/tpcp/pull/89)
Published by AKuederle over 2 years ago
tpcp - v0.22.1 - Fixed `safe_optimize` for GridSearchCV
[0.22.1] - 2023-08-30
Fixed
- The `safe_optimize` parameter of `GridSearchCV` is now correctly used during reoptimization. Before, it was only forwarded to the `Optimize` wrapper during the actual Grid-Search, but not during the final reoptimization.
Published by AKuederle over 2 years ago
tpcp - v0.22.0 - Tensorflow support
[0.22.0] - 2023-08-25
Added
- Official support for tensorflow/keras. The custom hash function now manages tensorflow models explicitly. This makes it possible again to use the `make_action_safe` and `make_optimize_safe` decorators with algorithms and pipelines that have tensorflow/keras models as parameters. (https://github.com/mad-lab-fau/tpcp/pull/87)
- Added a new example for tensorflow/keras models. (https://github.com/mad-lab-fau/tpcp/pull/87)
Published by AKuederle over 2 years ago
tpcp - v0.20.1: Fix cross-validation regression
[0.20.1] - 2023-07-25
Fixed
- Fixed a regression introduced in 0.19.0, which resulted in optimizers not being correctly cloned per fold. As a result, each CV fold would overwrite the optimizer object of the previous fold. This did not affect the reported results, but the returned optimizer object was not the one that was used to calculate the results.
Published by AKuederle over 2 years ago
tpcp - v0.20.0 - BREAKING CHANGE: Fix optuna multiprocessing
[0.20.0] - 2023-07-24
Changed
- BREAKING CHANGE: The way all Optuna-based optimizers work has been changed. Instead of passing a function that returns a study, you now need to pass a function that returns the parameters of a study. Creating the study is now handled by tpcp internally to avoid issues with multiprocessing. This results in two changes. The parameter name for all optuna pipelines has changed from `create_study` to `get_study_params`. Further, the expected call signature changed, as `get_study_params` now gets a seed as argument. This seed should be used to initialize the random number generator of the sampler and pruner of a study to ensure that each process gets a different seed and sampling process. (https://github.com/mad-lab-fau/tpcp/pull/80)
To migrate your code, you need to change the following:
OLD:
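```python
import optuna
from optuna.samplers import RandomSampler


def create_study():
    # Old style: the function builds and returns the study itself.
    return optuna.create_study(sampler=RandomSampler(seed=42))


OptunaSearch(..., create_study=create_study, ...)
```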
NEW:
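```python
from optuna.samplers import RandomSampler


def get_study_params(seed: int):
    # New style: only return the study parameters; tpcp creates the study.
    return dict(sampler=RandomSampler(seed=seed))


OptunaSearch(..., get_study_params=get_study_params, random_seed=42, ...)
```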
Published by AKuederle over 2 years ago
tpcp - v0.19.0 - Joblib Fixes and better errors
[0.19.0] - 2023-07-06
Added
- All optimization methods that do complicated loops (over parameters or CV-Folds) now raise new custom error messages (OptimizationError and TestError) if they encounter an error. These new errors have further information in which iteration of the loop the error occurred and should make it easier to debug issues.
- When a scorer fails, we now print the name (i.e. the group) of the datapoint that caused the error. This should make it easier to debug issues with the scorer.
Changed
- We dropped support for joblib<0.13.0 due to some changes in the API. We only support the new API now, which allowed us to simplify some of the multiprocessing code.
Published by AKuederle over 2 years ago
tpcp - v0.18.0 - Some more validation
[0.18.0] - 2023-04-13
Fixed
- When `super().__init__()` is called before all parameters of the child class are initialized, we don't get an error anymore. Now all classes remember their parameters when they are defined and don't try to access parameters that are not defined in their own init. (https://github.com/mad-lab-fau/tpcp/pull/69)
Changed
- Validation is now performed recursively on all subclasses. Note that, as before, validation is still only performed once per class. But with this change, we can also validate base classes that are not used directly. (https://github.com/mad-lab-fau/tpcp/pull/70)
Added
- We now validate whether a child class implements all the parameters of its parent class. While not strictly necessary, not doing so is a sign of bad design. It could also lead to issues with tpcp's validation logic. (https://github.com/mad-lab-fau/tpcp/pull/70)
- It is now possible to hook into the validation and perform custom validation of classes. (https://github.com/mad-lab-fau/tpcp/pull/70)
- The dataset class now actively triggers validation and checks if the dataset subclass implements `groupby_cols` and `subset_index`.
Published by AKuederle over 2 years ago
tpcp - 0.17.0 - Parallel Fixes
[0.17.0] - 2023-03-24
Added
- We now have a workaround for global configuration that should be passed to worker processes when using multiprocessing. This is a workaround to a joblib issue and is quite hacky. If you want to use this feature with your own configs, you can use `tpcp.parallel.register_global_parallel_callback`. If you need to write your own parallel loop using joblib, you need to use `tpcp.parallel.delayed` instead of `joblib.delayed`. (https://github.com/mad-lab-fau/tpcp/pull/65)
Published by AKuederle almost 3 years ago
tpcp - v0.16.0 - OptunaSearch `eval_str_paras` feature
[0.16.0] - 2023-03-21
Changed
- We are now raising an explicit ValidationError, if any of the parameters of a class have a trailing underscore, as this syntax is reserved for result objects. (https://github.com/mad-lab-fau/tpcp/pull/63)
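The kind of check described here can be sketched with `inspect`. The `ValidationError` class and `check_param_names` below are local stand-ins written for illustration; they are not imported from tpcp and may differ from its actual validation logic.

```python
import inspect


class ValidationError(Exception):
    """Local stand-in for the error type named in the changelog."""


def check_param_names(cls) -> None:
    for name in inspect.signature(cls.__init__).parameters:
        if name != "self" and name.endswith("_"):
            raise ValidationError(
                f"Parameter '{name}' of {cls.__name__} ends with an underscore, "
                "which is reserved for result attributes."
            )


class Valid:
    def __init__(self, window):
        self.window = window


class Invalid:
    def __init__(self, window_):
        self.window_ = window_


check_param_names(Valid)  # passes silently
try:
    check_param_names(Invalid)
except ValidationError as error:
    print(error)
```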
Added
- The Optuna search methods have a new parameter called `eval_str_paras` that allows you to automatically turn categorical string parameters into python objects. This can be useful if you want to select between complex objects and not just strings in your parameter search. To use this in your subclasses, you need to wrap the use of `trial.params` with `self.sanitize_params(trial.params)`. (https://github.com/mad-lab-fau/tpcp/pull/64)
Published by AKuederle almost 3 years ago
tpcp - v0.15.0
[0.15.0] - 2023-02-07
Added
- GridSearch and GridSearchCV now have the option to pick the parameters with the lowest score if desired. This is useful if your metric represents an error and you want to pick the parameters that minimize the error. To do that, you can set the `return_optimized` parameter of these classes to the name of the metric prefixed with a `-` (e.g. `return_optimized="-rmse"`). (https://github.com/mad-lab-fau/tpcp/pull/61)
- A new Optimization Algorithm called `OptunaSearch`. This is a (nearly) drop-in replacement for `GridSearch` using Optuna under the hood. It can be used to quickly implement parameter searches with different samplers for non-optimizable algorithms. (https://github.com/mad-lab-fau/tpcp/pull/57)
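The `-` prefix convention for `return_optimized` amounts to switching between maximizing and minimizing a metric. The sketch below shows that selection logic on a hypothetical list of result rows; `pick_best` is not a tpcp function and the result layout is an assumption.

```python
def pick_best(results, return_optimized: str) -> dict:
    """Pick the best parameter set; a leading '-' means lower is better."""
    minimize = return_optimized.startswith("-")
    metric = return_optimized[1:] if minimize else return_optimized
    chooser = min if minimize else max
    return chooser(results, key=lambda row: row[metric])


# Hypothetical grid-search results: one row per parameter combination.
results = [
    {"params": {"window": 5}, "rmse": 0.42, "f1": 0.80},
    {"params": {"window": 10}, "rmse": 0.31, "f1": 0.78},
]

print(pick_best(results, "-rmse")["params"], pick_best(results, "f1")["params"])
```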
Changed
- In this release we added multiple safe guards against edge cases related to non-deterministic dataset indices.
Most of these changes are internal and should not require any changes to your code.
Still, they don't solve all edge cases. Make sure your index is deterministic ;)
(https://github.com/mad-lab-fau/tpcp/pull/62)
  - The index of dataset objects is now cached. The first time `create_index` is called, the index is stored in `subset_index` and used for subsequent calls. This should avoid the overhead of creating the index every time (in particular if the index creation requires IO). It should also help to avoid edge cases where `create_index` is called multiple times and returns different results.
  - When `create_index` of a dataset is called, we actually call it twice now to check if the index is deterministic. Having a non-deterministic index can lead to hard-to-debug issues, so we want to make sure that this is not the case. It could still be that the index changes when using a different machine/OS (which is not ideal for reproducibility), but this should prevent most cases leading to strange issues.
  - Internally, the `_optimize_and_score` method now directly gets the subset of the dataset, instead of the indices of the train and test set. This should again help to avoid issues where the index of the dataset changes between calculating the splits and actually retrieving the data.
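The "call it twice and compare" safeguard can be sketched independently of the `Dataset` class. `checked_index`, `stable_index`, and `unstable_index` are hypothetical names, and lists of strings stand in for the real index dataframes.

```python
import itertools


def checked_index(create_index):
    """Call ``create_index`` twice and fail if the results differ."""
    first, second = create_index(), create_index()
    if first != second:
        raise RuntimeError("create_index is not deterministic.")
    return first


def stable_index():
    # Sorting makes the result independent of set or file-system order.
    return sorted({"p2", "p1", "p3"})


_calls = itertools.count()


def unstable_index():
    # Simulates a source whose order changes between calls.
    order = ["p1", "p2", "p3"]
    return order if next(_calls) % 2 == 0 else order[::-1]


print(checked_index(stable_index))
```

Passing `unstable_index` to `checked_index` raises the error, which is the situation the double call is meant to catch early.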
Published by AKuederle almost 3 years ago
tpcp - v0.14.0
[0.14.0] - 2023-02-01
Added
- Custom Aggregators can now use the `RETURN_RAW_SCORES` class variable to specify whether their raw input scores should be returned. (https://github.com/mad-lab-fau/tpcp/pull/58)
Fixed
- GridSearch and GridSearchCV now correctly handle custom aggregators that return scores with new names. (https://github.com/mad-lab-fau/tpcp/pull/58)
- When using the `create_group_labels` method on a dataset with multiple groupby columns, the method returned a list of tuples. This caused issues with `GroupKFold`, as the method internally flattens the list of tuples. To avoid this, the method now returns a list of strings. The respective string is simply the string representation of the tuple that was returned before. (https://github.com/mad-lab-fau/tpcp/pull/59)
- The fix provided in 0.12.1 to fix hashing of objects defined in the `__main__` module was only partially working. When the object in question was nested in another object, the hashing would still fail. This is hopefully now fixed for good. (https://github.com/mad-lab-fau/tpcp/pull/60)
Published by AKuederle almost 3 years ago
tpcp - v0.13.0 - JOSS Paper
[0.13.0] - 2023-01-11
Changed
- Some improvements to the documentation
Added
- Added an option to the optuna search to use multiprocessing using the suggestions made in https://github.com/optuna/optuna/issues/2862 . This has not been extensively tested in real projects. Therefore, use with care and please report any issues you encounter.
Deprecated
- Fully deprecated the `_skip_validation` parameter for base classes, which was briefly used for some old versions.
Published by AKuederle almost 3 years ago
tpcp - v0.12.1
Changed
- The `safe_run` method unintentionally double-wrapped the run method if it already had a `make_action_safe` decorator. This is now fixed.
Fixed
- Under certain conditions, hashing of an object defined in the `__main__` module failed. This release implements a workaround for this issue that should hopefully resolve most cases.
Published by AKuederle about 3 years ago
tpcp - v0.12.0 - Some minor quality of life improvements
Added
- Added the concept of the `self_optimize_with_info` method that can be implemented instead of or in addition to the `self_optimize` method. This method should be used when an optimize method needs to return/output additional information besides the main result and is supported by the `Optimize` wrapper. (https://github.com/mad-lab-fau/tpcp/pull/49)
- Added a new method called `__clone_param__` that gives a class control over how params are cloned. This can be helpful if for some reason objects don't behave well with deepcopy.
- Added a new method called `__repr_parameters__` that gives a class control over how params are represented. This can be used to customize the representation of individual parameters in the `__repr__` method.
- Add proper repr for `CloneFactory`
Published by AKuederle about 3 years ago
tpcp - v0.11.0
[0.11.0] - 2022-10-17
Added
- Support for Optuna >3.0
- Example on how to use `attrs` and `dataclass` with tpcp
- Added versions for `Dataset` and `CustomOptunaOptimize` that work with dataclasses and attrs.
- Added first class support for composite objects (e.g. objects that need a list of other objects as parameters). This is basically sklearn pipelines with fewer restrictions (https://github.com/mad-lab-fau/tpcp/pull/48).
Changed
- `CustomOptunaOptimize` now expects a callable to define the study, instead of taking a study object itself. This ensures that the study objects can be independent when the class is called as part of `cross_validate`.
- Parameters are only validated when `get_params` is called. This reduces the reliance on `__init_subclass__` and on us correctly wrapping the init. This makes it easier to support `attrs` and `dataclass`.
Published by AKuederle about 3 years ago
tpcp - v0.10.0
[0.10.0] - 2022-09-09
Changed
- Reworked once again when and how annotations for tpcp classes are processed. Processing is now delayed until you are actually using the annotations (i.e. as part of the "safe wrappers"). The only user-facing change is that the chance of running into edge cases is lower and that `__field_annotations__` is now only available on class instances and not the class itself anymore.
Published by AKuederle over 3 years ago
tpcp - v0.9.1
[0.9.1] - 2022-09-08
Fixed
- Classes without init can now pass the tpcp checks
Added
- You can nest parameter annotations into `ClassVar` and they will still be processed. This is helpful when using dataclasses and annotating nested parameter values.
Published by AKuederle over 3 years ago
tpcp - v0.9.0
This release drops Python 3.7 support!
Added
- A bunch of new high-level documentation
- Added submission version of JOSS paper
Changed
- The `aggregate` methods of custom aggregators now get the list of datapoints in addition to the scores. Both parameters are now passed as keyword-only arguments.
Published by AKuederle over 3 years ago
tpcp - v0.8.0
[0.8.0] - 2022-08-09
Added
- An example on how to use the `dataclass` decorator with tpcp classes. (https://github.com/mad-lab-fau/tpcp/pull/41)
- In case you need complex aggregations of scores across data points, you can now wrap the return values of score functions in custom `Aggregator`s. The best place to learn about this feature is the new "Custom Scorer" example. (https://github.com/mad-lab-fau/tpcp/pull/42)
- All cross-validation based methods now have a new parameter called `mock_labels`. This can be used to provide a "y" value to the split method of a sklearn-cv splitter. This is required e.g. for Stratified KFold splitters. (https://github.com/mad-lab-fau/tpcp/pull/43)
Changed
- Most of the class processing and sanity checks now happen in the init (or rather a post init hook) instead of during class initialisation. This increases the chance for some edge cases, but makes it possible to post-process classes before tpcp checks are run. Most importantly, it allows the use of the `dataclass` decorator in combination with tpcp classes. For the "enduser", this change will have minimal impact. Only if you relied on accessing special tpcp class parameters (e.g. `__field_annotations__`) before the class was initialised will you get an error now. Other than that, you will only notice a very slight overhead on class initialisation, as we now need to run some basic checks when you call the init or `get_params`. (https://github.com/mad-lab-fau/tpcp/pull/41)
- The API of the Scorer class was modified. In case you used custom Scorers before, they will likely not work anymore. Further, we removed the `error_score` parameter from the Scorer and all related methods that forwarded this parameter (e.g. `GridSearch`). Errors that occur in the score function will now always be raised! If you need special handling of error cases, handle them in your score function yourself (i.e. using try-except). This gives more granular control and makes the implementation of the expected score function returns much easier on the `tpcp` side. (https://github.com/mad-lab-fau/tpcp/pull/42)
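Handling errors inside the score function, as recommended above, could look like this sketch. `safe_score` and `fragile_pipeline` are hypothetical, the pipeline is modelled as a plain callable, and returning NaN for a failed datapoint is just one possible policy.

```python
import math


def safe_score(pipeline, datapoint) -> float:
    """Score function that handles its own failures (illustrative)."""
    try:
        prediction = pipeline(datapoint)
    except ValueError:
        # Decide per error type what a failed datapoint should count as.
        return math.nan
    return 1.0 if prediction == datapoint["reference"] else 0.0


def fragile_pipeline(datapoint):
    """Hypothetical pipeline that cannot handle empty signals."""
    if not datapoint["signal"]:
        raise ValueError("empty signal")
    return max(datapoint["signal"])


print(safe_score(fragile_pipeline, {"signal": [1, 3, 2], "reference": 3}))
print(safe_score(fragile_pipeline, {"signal": [], "reference": 3}))
```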
Published by AKuederle over 3 years ago
tpcp - v0.7.0
[0.7.0] - 2022-06-23
Added
- The `Dataset` class now has a new parameter `group`, which will return the group/row information if there is only a single group/row left in the dataset. This parameter returns either a string or a namedtuple to make it easy to access the group/row information.
- The `Dataset.groups` parameter now returns a list of namedtuples when it previously returned a list of normal tuples.
- New `is_single_group` and `assert_is_single_group` methods for the `Dataset` class are added. They are shortcuts for calling `self.is_single(groupby_cols=self.groupby_cols)` and `self.assert_is_single(groupby_cols=self.groupby_cols)`.
Removed
- We removed the `OptimizableAlgorithm` base class, as it is not really useful. We recommend implementing your own base class or mixin if you are implementing a set of algorithms that need a normal and an optimizable version.
Published by AKuederle over 3 years ago
tpcp - v0.6.1 - Some bug fixes
Changed
- Fixed bug with tensor hashing (https://github.com/mad-lab-fau/tpcp/pull/37)
- Fixed an issue with memoization during hashing (https://github.com/mad-lab-fau/tpcp/pull/37)
- Fixed an issue that the `safe_optimize_wrapper` could not correctly detect changes to mutable objects. This is now fixed by pre-calculating all the hashes. (https://github.com/mad-lab-fau/tpcp/pull/38)
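Why pre-calculated hashes are needed for mutable objects can be seen in a small sketch. `snapshot` and `changed_params` are hypothetical helpers built on `pickle` and `hashlib`; they only illustrate the idea and are not the wrapper's actual code.

```python
import hashlib
import pickle


def snapshot(params: dict) -> dict:
    """Pre-calculate one hash per parameter before the optimization runs."""
    return {k: hashlib.sha256(pickle.dumps(v)).hexdigest() for k, v in params.items()}


def changed_params(before: dict, params: dict) -> list:
    after = snapshot(params)
    return sorted(name for name in params if before[name] != after[name])


params = {"weights": [0.1, 0.2], "window": 10}
before = snapshot(params)

# In-place mutation: the list object is the same, only its content changed,
# so an identity or reference comparison would miss it.
params["weights"].append(0.3)

print(changed_params(before, params))
```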
Published by AKuederle almost 4 years ago
tpcp - v0.6.0 - Optuna optimizer and pytorch fixes
Added
- A new class to wrap the optimization framework Optuna. `CustomOptunaOptimize` can be used to create custom wrapper classes for various Optuna optimizations that play nicely with `tpcp` and can be nested within tpcp operations. (https://github.com/mad-lab-fau/tpcp/pull/27)
- A new example for the `CustomOptunaOptimize` wrapper that explains how to create complex custom optimizers using `Optuna` and the new Scorer callbacks (see below) (https://github.com/mad-lab-fau/tpcp/pull/27)
- `Scorer` now supports an optional callback function, which will be called after each datapoint is scored. (https://github.com/mad-lab-fau/tpcp/pull/29)
- Pipelines, Optimize objects, and `Scorer` are now `Generic`. This improves typing (in particular with VsCode), but means a little bit more typing (pun intended) when creating new Pipelines and Optimizers (https://github.com/mad-lab-fau/tpcp/pull/29)
- Added option for scoring function to return arbitrary additional information using the `NoAgg` wrapper (https://github.com/mad-lab-fau/tpcp/pull/31)
- (experimental) Torch compatibility for hash based comparisons (e.g. in the `safe_run` wrapper). Before, the wrapper would fail with torch module subclasses, as their pickle based hashes were not consistent. We implemented a custom hash function that should solve this. For now, we will consider this feature experimental, as we are not sure if it breaks in certain use-cases. (https://github.com/mad-lab-fau/tpcp/pull/33)
- `tpcp.types` now exposes a bunch of internal types that might be helpful to type custom Pipelines and Optimizers. (https://github.com/mad-lab-fau/tpcp/pull/34)
Changed
- The return type for the individual values in the `Scorer` class is now `List[float]` instead of `np.ndarray`. This also affects the output of `cross_validate`, `GridSearch.gs_results_` and `GridSearchCV.cv_results_` (https://github.com/mad-lab-fau/tpcp/pull/29)
- `cf` now has a "faked" return type, so that type checkers in the user code do not complain anymore. (https://github.com/mad-lab-fau/tpcp/pull/29)
- All TypeVar Variables are now called `SomethingT` instead of `Something_` (https://github.com/mad-lab-fau/tpcp/pull/34)
Published by AKuederle almost 4 years ago
tpcp - v0.5.0 - Some features and many docs
[0.5.0] - 2022-03-15
Added
- The `make_optimize_safe` decorator (and hence, the `Optimize` method) makes use of the parameter annotations to check that only parameters marked as `OptimizableParameter` are changed by the `self_optimize` method. This check also supports nested parameters, in case the optimization involves optimizing nested objects. (https://github.com/mad-lab-fau/tpcp/pull/9)
- All tpcp objects now have a basic representation that is automatically generated based on their parameters (https://github.com/mad-lab-fau/tpcp/pull/13)
- Added algo optimization and evaluation guide and improved docs overall (https://github.com/mad-lab-fau/tpcp/pull/26)
- Added examples for all fundamental concepts (https://github.com/mad-lab-fau/tpcp/pull/23)
Published by AKuederle almost 4 years ago
tpcp - v0.4.0 Core Rework (Again!)
Published by AKuederle about 4 years ago
tpcp - v0.2.0-alpha.3
Published by AKuederle about 4 years ago
tpcp - v0.2.0-alpha.1
Testing release
Published by AKuederle about 4 years ago