Recent Releases of yggdrasil-decision-forests
yggdrasil-decision-forests - Python API 0.13.0
0.13.0 - 2025-07-15
API Changes
- For Random Forest models, .out_of_bag_evaluations() now returns a TrainingLogs object. The content is identical to the object previously returned, but the number_of_trees property has been renamed to iteration for consistency with Gradient Boosted Trees Training Logs.
- mode="tf" is now the default on model.to_tensorflow_saved_model(). The previous default is still available by setting mode="keras".
- model.label() returns None for models trained without a label.
- Remove deprecated evaluation_task argument for model.evaluate(). Use task instead.
Feature
- Add standalone C++ export with model.to_standalone_cc(). Standalone models are super flexible, fast and memory-efficient. They only depend on the C++ standard library.
- Add model.training_logs() method to return the training logs of the model.
- Expose Mean Average Precision for Ranking tasks.
- Add hyperparameters numerical_vector_sequence_enable_closer_than_conditions and numerical_vector_sequence_enable_projected_more_than_conditions.
- Clear error messages when attempting to evaluate models without label.
- Faster training with sparse oblique splits for datasets with many numerical features.
- Many documentation improvements.
- Increase default number of threads to 256 or number of CPU cores.
- Enable cross-validation for hyperparameter tuning.
- Add thresholds to classification plots.
- Explicitly disable custom losses for hyperparameter tuning.
- Disable parallel evaluation for cross-validation custom losses.
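For readers unfamiliar with the Mean Average Precision metric listed above, the following sketch shows one common definition with binary relevance. The function names and the example data are illustrative assumptions. The exact definition used by YDF (for instance truncation or relevance thresholds) is not specified in these notes.

```python
from typing import Sequence


def average_precision(relevance_in_ranked_order: Sequence[int]) -> float:
    """Average precision of one ranked group with binary relevance (1 = relevant)."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(relevance_in_ranked_order, start=1):
        if relevant:
            hits += 1
            # Precision among the top `rank` items, counted at each relevant item.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(groups: Sequence[Sequence[int]]) -> float:
    """Mean of the per-group average precision."""
    return sum(average_precision(group) for group in groups) / len(groups)


# Two hypothetical query groups, items already sorted by decreasing model score.
example_groups = [[1, 0, 1], [0, 1]]
map_value = mean_average_precision(example_groups)
```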
Fix
- Distributed Training: Add "recvmsg: Connection reset" to isTransientError.
- Enable SHAP values when training with BEST_FIRST_GLOBAL.
- Predictions with cross-entropy LambdaMART no longer need the slow engine.
- Disable the generic engine for oblique splits without global imputation. This may fix a very rare bug in the way predictions are computed.
Release music
Sinfonie Nr. 4 in A-Dur, op. 90. Felix Mendelssohn
- C++
Published by rstz 7 months ago
yggdrasil-decision-forests - Python API 0.12.0
0.12.0 - 2025-05-20
Feature
- Enable support for Python 3.13.
- Add custom fields to model metadata.
- Add SHAP value variable importances with model.analyze().
- Add SHAP values for a dataset with model.predict_shap().
- Speed-up (up to 20x) training of models with CATEGORICAL_SET features.
- Add hyper-parameter to limit the mask size for CATEGORICAL_SET features.
- Add hyper-parameter total_max_num_nodes to limit the total number of nodes in a model.
- Add support for na_replacements in Python tree editor API.
- Add support for include_all_columns in FeatureSelector.
- Add the ydf.utils.LogBook to manage and track experiments.
- Speed-up training of NDCG ranking model when a single example per group is non-zero.
- Speed-up training on datasets with few columns on a computer with a large number of cores.
- Speed-up loss computation multi-threading code.
- Improve distributed training error messages.
- Remove need for label columns for deep learning models.
Fix
- Log message if early stopping is not used.
- Fix force_numerical_discretization errors and documentation.
- Fix handling of empty list columns in the dataset.
Release music
Te Deum in D major, H.146. Marc-Antoine Charpentier
Published by rstz 9 months ago
yggdrasil-decision-forests - Python API 0.11.0
0.11.0 - 2025-03-12
Feature
- Expose losses for distributed training.
- Add class_weights parameter to the learners.
- Support for Google Cloud paths for datasets and model IO.
- Add utility to facilitate distributed training on VertexAI.
- Improved support for non-unicode data in categorical features.
- Add support for saving and analyzing deep models.
Fix
- Fix incorrectly transposed confusion table in HTML.
- Various documentation fixes.
- Better requirements management.
Documentation
- Add tutorial for Categorical Set features.
- Add tutorial for training on VertexAI.
Release music
3. Sinfonie in d-Moll. Gustav Mahler
Published by rstz 12 months ago
yggdrasil-decision-forests - v1.11.0
1.11.0 - 2025-03-12
Features
- Speed-up training of GBT models by ~10%.
- Support for categorical and boolean features in Isolation Forests.
- Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but can still be used.
- Allow configuring the truncation of NDCG losses.
- Add support for distributed training for ranking gradient boosted tree models.
- Add support for NUMERICAL_VECTOR_SEQUENCE features.
- Add support for AVRO data file using the "avro:" prefix.
- Additional hyperparameters restricting weights of sparse oblique splits to integers or powers of 2.
- Facilitate training on VertexAI.
- Deprecated SparseObliqueSplit.binary_weights hyperparameter in favor of SparseObliqueSplit.weights.
- Add Gzip-compressed BLOB_SEQUENCE serialization.
- Enable Poisson loss for model analysis and fast inference.
- Add config for compatibility with protobuf lite.
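As background for the configurable truncation of NDCG losses mentioned above, here is a minimal sketch of an NDCG metric truncated at k. It assumes the common exponential gain 2^relevance - 1 and a log2 position discount. These choices and the function names are assumptions for illustration, not a description of YDF's internal implementation.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k items."""
    return sum(
        (2.0 ** rel - 1.0) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances_in_predicted_order: Sequence[float], k: int) -> float:
    """NDCG truncated at k: DCG of the predicted order over DCG of the ideal order."""
    ideal_order = sorted(relevances_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal_order, k)
    if ideal_dcg == 0.0:
        return 0.0
    return dcg_at_k(relevances_in_predicted_order, k) / ideal_dcg
```

A smaller truncation only rewards the very top of the ranking: in this sketch, a relevant item placed second contributes nothing when k is 1.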
Fix
- Fix structural variable importances for oblique splits.
- Deflake tests.
- Remove CHECK/FATAL from training code.
- Fix crash in YDF distributed training.
Misc
- Loss options are now defined in model/gradient_boosted_trees/gradient_boosted_trees.proto (previously learner/gradient_boosted_trees/gradient_boosted_trees.proto).
- Remove C++14 support.
- Various documentation improvements.
Published by rstz 12 months ago
yggdrasil-decision-forests - Python API 0.10.0
0.10.0 - 2025-02-11
Feature
- Expose model.save(..., pure_serving=True) for saving a model without debug information.
- Allow users to provide a training proto configuration to the learner.
- Add vector sequence feature support.
- Add Variable importances for Isolation Forest Models.
- Add ydf.help.loading_data() to print information about the type of supported dataset formats.
- Add experimental Tabular Transformer implementation.
- Add gzipped blob sequence as new model format (still optional).
- Enabled Poisson Loss for model analysis and fast inference.
Fix
- Fix recognition of multidimensional features for Numpy arrays of type object.
- Fix subsample count for small number of training examples for Isolation Forests.
- Fix NUM_NODES variable importance for oblique splits.
Other
- Updated OSS dependencies of protobuf, grpc and abseil.
Release music
- Sinfonie in Es-Dur "Sinfonia Eroica", op. 55. Ludwig van Beethoven
Published by rstz about 1 year ago
yggdrasil-decision-forests - Python API 0.9.0
0.9.0 - 2024-12-02
Breaking
- Classification Label classes are now consistently ordered lexicographically (for string labels) or increasingly (for integer labels).
- Fix typo: rename partial_depepence_plot to partial_dependence_plot on model.analyze().
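The label ordering rule above can be illustrated with plain Python sorting. This is only a sketch of the ordering behavior; the helper name is hypothetical and this is not the code YDF uses to build its label dictionary.

```python
from typing import List, Sequence, Union

Label = Union[str, int]


def ordered_label_classes(labels: Sequence[Label]) -> List[Label]:
    """Returns the unique label values in sorted order.

    Python's sorted() orders strings lexicographically and integers increasingly.
    """
    return sorted(set(labels))


string_classes = ordered_label_classes(["dog", "cat", "10", "2", "cat"])
integer_classes = ordered_label_classes([10, 2, 2, 1])
```

Note that lexicographic order applies to strings character by character, so the string "10" sorts before "2", while the integer 10 sorts after 2.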
Feature
- Add support for Avro file for path / distributed training with the "avro:" prefix.
- Add support for discretized numerical features for in-memory datasets.
- Expose MRR for ranking models.
- Add model.predict_class to generate the most likely predicted class of classification models.
- Add support for automatic feature selection with the feature_selector learner constructor argument. See the feature selection tutorial for more details.
- Add standalone prediction evaluation ydf.evaluate_predictions().
- Add new hyperparameter sparse_oblique_max_num_projections.
- Add options "POWER_OF_TWO" and "INTEGER" for sparse oblique weights.
- Emit proper errors when using lists for multi-dimensional features.
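For context on the MRR (Mean Reciprocal Rank) metric listed above, the sketch below shows the usual textbook definition: the reciprocal of the rank of the first relevant item, averaged over query groups. The function names are illustrative assumptions, and details such as truncation in YDF's own computation are not covered by these notes.

```python
from typing import Sequence


def reciprocal_rank(relevance_in_ranked_order: Sequence[float]) -> float:
    """1 / rank of the first relevant item, or 0.0 if no item is relevant."""
    for rank, relevance in enumerate(relevance_in_ranked_order, start=1):
        if relevance > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(groups: Sequence[Sequence[float]]) -> float:
    """Mean of the per-group reciprocal rank."""
    return sum(reciprocal_rank(group) for group in groups) / len(groups)


# Three hypothetical query groups, items sorted by decreasing model score.
mrr_value = mean_reciprocal_rank([[0, 1, 0], [1, 0], [0, 0]])
```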
Fix
- Regression and Ranking CEPs scaling corrected.
Release music
The John B. Sails. Traditional
Published by rstz about 1 year ago
yggdrasil-decision-forests - Python API 0.8.0
0.8.0 - 2024-09-23
Breaking
- Disallow positional parameters for the learners, except for label and task.
- Remove the unsupported / invalid hyperparameters from the Isolation Forest learner.
- Remove parameters for distributed training and resuming training from learners that do not support these capabilities.
- By default, model.analyze runs for a maximum of 20 seconds (i.e. maximum_duration=20 by default).
- Convert boolean values in categorical sets to lowercase, matching the treatment of categorical features.
Feature
- Warn if training on a VerticalDataset and fail if attempting to modify the columns in a VerticalDataset during training.
- User can override the model's task, label or group during evaluation.
- Add num_examples_per_tree() method to Isolation Forest models.
- Expose the slow engine for debugging predictions and evaluations with use_slow_engine=True.
- Speed-up training of GBT models by ~10%.
- Support for categorical and boolean features in Isolation Forests.
- Add ydf.util.read_tf_record and ydf.util.write_tf_record to facilitate TF Record datasets usage.
- Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but can still be used.
- Allow configuring the truncation of NDCG losses.
- Enable multi-threading when using model.predict and model.evaluate.
- Default number of threads of model.analyze is equal to the number of cores.
- Add multi-threaded results in model.benchmark.
- Add argument to control the maximum duration of model.analyze.
- Add support for Unicode strings, normalize categorical set values in the same way as categorical values, and validate their types.
- Add support for distributed training for ranking gradient boosted tree models.
Fix
- Fix labels of regression evaluation plots
- Improved errors if Isolation Forest training fails.
Release music
Perpetuum Mobile "Ein musikalischer Scherz", Op. 257. Johann Strauss (Sohn)
Published by rstz over 1 year ago
yggdrasil-decision-forests - v1.10.0
1.10.0 - 2024-08-21
Features
- Add support for Isolation Forests model.
- The default value of num_candidate_attributes in the CART learner is changed from 0 (Random Forest style sampling) to -1 (no sampling). This is the generally accepted logic of CART.
- Added support for GCS for file I/O.
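The difference between the two num_candidate_attributes values above can be sketched as follows. The sketch assumes that "Random Forest style sampling" means the classical defaults (square root of the number of features for classification, one third for regression); the rounding and the helper name are assumptions, not YDF's code.

```python
import math


def resolve_num_candidate_attributes(
    value: int, num_features: int, is_classification: bool
) -> int:
    """Number of features considered at each split for a given setting."""
    if value == -1:
        # No sampling: every feature is a candidate at every node.
        return num_features
    if value == 0:
        # Assumed Random Forest style defaults; the rounding is a guess.
        if is_classification:
            return max(1, math.ceil(math.sqrt(num_features)))
        return max(1, math.ceil(num_features / 3))
    # Any positive value is used as is, capped by the number of features.
    return min(value, num_features)
```

Under these assumptions, a classification dataset with 16 features goes from 4 candidate features per node (value 0) to all 16 (value -1).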
Published by rstz over 1 year ago
yggdrasil-decision-forests - Python API 0.7.0
Python API 0.7.0 - 2024-08-21
Feature
- Expose validate_hyperparameters() on the learner.
- Clarify which parameters in the learner are optional.
- Add support in JAX FeatureEncoder for non-string categorical feature values.
- Improve performance of Isolation Forests.
- Models can be serialized/deserialized to/from bytes with model.serialize() and ydf.deserialize_model.
- Models can be pickled safely.
- Native support for Xarray as a dataset format for all operations (e.g., training, evaluation, predictions).
- The output of model.to_jax_function can be converted to a TensorFlow Lite model.
- Change the default number of examples to scan when training on files to determine the semantic and dictionaries of columns from 10k to 100k.
- Various improvements of error messages.
- Evaluation for Anomaly Detection models.
- Oblique splits for Anomaly Detection models.
Fix
- Fix parsing of multidimensional ragged inputs.
- Fix isolation forest hyperparameter defaults.
- Fix bug causing distributed training to fail on a sharded dataset containing an empty shard.
- Handle unordered categorical sets in training.
- Fix dataspec ignoring definitions of unrolled columns, such as multidimensional categorical integers.
- Fix error when defining categorical sets for non-ragged multidimensional inputs.
- MacOS: Fix compatibility with other protobuf-using libraries such as Tensorflow.
Release music
Rondo Alla ingharese quasi un capriccio "Die Wut über den verlorenen Groschen", Op. 129. Ludwig van Beethoven
Published by rstz over 1 year ago
yggdrasil-decision-forests - Python API 0.6.0
Feature
- model.to_jax_function now always outputs a FeatureEncoder to help feeding data to the JAX model.
- The default value of num_candidate_attributes in the CART learner is changed from 0 (Random Forest style sampling) to -1 (no sampling). This is the generally accepted logic of CART.
- model.to_tensorflow_saved_model supports preprocessing functions which have a different signature than the YDF model.
- Improve error messages when feeding wrong size Numpy arrays.
- Add option for weighted evaluation in model.evaluate.
Fix
- Fix display of confusion matrix with floating point weights.
Known issues
- MacOS build is broken.
Published by rstz over 1 year ago
yggdrasil-decision-forests - Python API 0.5.0
Feature
- Add support for Isolation Forests model.
- Add max_depth argument to model.print_tree.
- Add verbose argument to train method which is equivalent but sometimes more convenient than ydf.verbose.
- Add SKLearn to YDF model converter: ydf.from_sklearn.
- Improve error messages when calling the model with non supported data.
- Add support for numpy 2.0.
Tutorials
- Add anomaly detection tutorial.
- Add YDF and JAX model composition tutorial.
Fix
- Fix error when plotting oblique trees (model.plot_tree) in colab.
Published by achoum over 1 year ago
yggdrasil-decision-forests - Python API 0.4.3
Python API - Changelog
Feature
- Add model.to_jax_function() function to convert a YDF model into a JAX function that can be combined with other JAX operations.
- Print warnings when categorical features look like numbers.
- Add support for Python 3.12.
Fix
- Fix cross-validation for non-classification learners.
- Fix missing ydf/model/tree/plotter.js
- Solve dependency collision of YDF Proto between PYDF and TF-DF.
Published by achoum almost 2 years ago
yggdrasil-decision-forests - Python API 0.4.1
Python API - Changelog
Fix
- Solve dependency collision to YDF between PYDF and TF-DF. If TF-DF is installed after PYDF, importing YDF fails with a "has no attribute 'DType'" error.
- Allow for training on cached TensorFlow dataset.
Published by achoum almost 2 years ago
yggdrasil-decision-forests - Python API 0.4.0
Python API - 0.4.0 - 2024-04-10
Feature
- Multi-dimensional features can be selected / configured with the features= training argument.
- Programmatic access to partial dependence plots and variable importances.
- Add model.to_tensorflow_function() function to convert a YDF model into a TensorFlow function that can be combined with other TensorFlow operations. This function is compatible with Keras 2 and Keras 3.
- Add arguments servo_api=False and feed_example_proto=False for model.to_tensorflow_function(mode="tf") to export TensorFlow SavedModel following respectively the Servo API and consuming serialized TensorFlow Example protos.
- Add pre_processing and post_processing arguments to the model.to_tensorflow_function function to pack pre/post processing operations in a TensorFlow SavedModel.
Tutorials
- Add tutorial Vertex AI with TF Serving
- Add tutorial Deep-learning with YDF and TensorFlow
Published by achoum almost 2 years ago
yggdrasil-decision-forests - Python API 0.3.0
Python API 0.3.0 - 2024-03-15
Breaking
- Custom losses are now required to provide the gradient, instead of the negative of the gradient.
- Clarified that YDF may modify numpy arrays returned by a custom loss function.
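To make the sign convention above concrete, here is a minimal sketch for a squared error loss. The function name and its signature are hypothetical and do not reproduce the YDF custom loss interface; plain Python lists are used instead of numpy arrays to keep the sketch dependency-free.

```python
from typing import List, Sequence, Tuple


def squared_error_gradient_and_hessian(
    labels: Sequence[float], predictions: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Gradient and hessian of 0.5 * (prediction - label)^2 w.r.t. the prediction."""
    # The gradient is (prediction - label). The negative gradient,
    # (label - prediction), is the same quantity with the opposite sign.
    gradient = [p - y for y, p in zip(labels, predictions)]
    hessian = [1.0 for _ in labels]
    return gradient, hessian


gradient, hessian = squared_error_gradient_and_hessian([1.0, 2.0], [1.5, 1.0])
```

A loss written for the earlier convention would need the sign of its first output flipped to match the behavior described in this release.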
Features
- Allow using Jax for custom loss definitions.
- Allow setting may_trigger_gc on custom losses.
- Add support for MHLD oblique decision trees.
- Expose hyperparameter sparse_oblique_max_num_projections.
- HTML plots for trees with model.plot_tree().
- Fix protobuf version to 4.24.3 to fix some incompatibilities when using conda.
- Allow to list compatible engines with model.list_compatible_engines().
- Allow to choose a fast engine with model.force_engine(...).
Fix
- Fix slow engine creation for some combination of oblique splits.
- Improve error message when feeding multi-dimensional labels.
Documentation
- Clarified documentation of hyperparameters for oblique splits.
- Fix plots, typos.
Release music
Doctor Gradus ad Parnassum from "Children's Corner" (L. 113). Claude Debussy
Published by rstz almost 2 years ago
yggdrasil-decision-forests - v1.9.0
1.9.0 - 2024-03-12
Feature
- Add "parallel_trials" parameter in the hyper-parameter tuner to control the number of trials to run in parallel.
- Add support for custom losses.
Published by rstz almost 2 years ago
yggdrasil-decision-forests - v1.9.0rc0
1.9.0rc0 - 2024-02-26
Feature
- Add "parallel_trials" parameter in the hyper-parameter tuner to control the number of trials to run in parallel.
- Add support for custom losses.
Published by rstz almost 2 years ago
yggdrasil-decision-forests - PYDF 0.1.0
0.1.0 - 2024-01-25
Features
- Added model validation evaluation (for GBTs) and OOB evaluation (for RFs).
- Expose winner-takes-all for Random Forests.
- Added model self evaluation.
- Added ydf.from_tensorflow_decision_forests() for importing TF-DF models.
- Allow feeding datasets as sequence of strings.
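The winner-takes-all option mentioned above changes how the trees of a Random Forest are combined. The sketch below contrasts the two usual aggregation schemes on made-up per-tree class distributions. The function name and data are illustrative assumptions rather than YDF internals.

```python
from typing import List, Sequence


def aggregate_tree_votes(
    tree_distributions: Sequence[Sequence[float]], winner_take_all: bool
) -> List[float]:
    """Combines per-tree class distributions into one forest-level distribution."""
    num_classes = len(tree_distributions[0])
    totals = [0.0] * num_classes
    for distribution in tree_distributions:
        if winner_take_all:
            # The tree casts a single vote for its most likely class.
            winner = max(range(num_classes), key=lambda c: distribution[c])
            totals[winner] += 1.0
        else:
            # The tree contributes its full class distribution.
            for class_index, probability in enumerate(distribution):
                totals[class_index] += probability
    return [total / len(tree_distributions) for total in totals]


# Three hypothetical trees predicting a distribution over two classes.
trees = [[0.6, 0.4], [0.6, 0.4], [0.1, 0.9]]
voted = aggregate_tree_votes(trees, winner_take_all=True)
averaged = aggregate_tree_votes(trees, winner_take_all=False)
```

On this example the two schemes disagree: voting favors the first class (two trees out of three), while averaging favors the second because the third tree is very confident.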
Fixes
- Fixes a plotting issue for GBTs without validation loss
Release music
Flötenuhren von 1772 und 1793 - Vivace (Hob XIX:13). Joseph Haydn
Published by rstz about 2 years ago
yggdrasil-decision-forests - v1.8.0
1.8.0 - 2023-11-17
Feature
- Support for GBT distances.
- Remove old snapshots automatically for GBT training.
Fix
- Regression with Mean Squared Error loss and Mean Average error loss incorrectly clamped the gradients, leading to incorrect predictions.
- Change dependency from boost to boost_math for faster builds.
Note
The commit associated with this release has a typo in its description.
1.7.0 - 2023-10-20
Feature
- Add support for Mean average error (MAE) loss for GBT.
- Add pairwise distance between examples.
- By default, only keep the last three snapshots when training with a working cache to be resilient to training interruptions.
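As a reminder of what the Mean average error (MAE) loss above optimizes, the sketch below computes the loss and its gradient with respect to the predictions. It follows the textbook definition; the function names are hypothetical and the way YDF fits leaf values for this loss is not described in these notes.

```python
from typing import List, Sequence


def mean_absolute_error(labels: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean of |prediction - label| over all examples."""
    return sum(abs(p - y) for y, p in zip(labels, predictions)) / len(labels)


def mae_gradient(labels: Sequence[float], predictions: Sequence[float]) -> List[float]:
    """Per-example derivative of |prediction - label|: the sign of the residual."""
    # (p > y) - (p < y) evaluates to 1, 0 or -1.
    return [float((p > y) - (p < y)) for y, p in zip(labels, predictions)]


loss = mean_absolute_error([1.0, 2.0, 3.0], [2.0, 2.0, 1.0])
gradient = mae_gradient([1.0, 2.0, 3.0], [2.0, 2.0, 1.0])
```

Unlike the squared error, the gradient magnitude does not grow with the size of the residual, which is why this loss is less sensitive to outliers.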
New interface
- Check out the new Python interface in port/python! It's still experimental but you can already install it from PyPI with pip install ydf.
Published by rstz about 2 years ago
yggdrasil-decision-forests - v1.6.0
Breaking changes
- The dependency to the distributed gradient boosted trees learner is renamed from //third_party/yggdrasil_decision_forests/learner/distributed_gradient_boosted_trees to //third_party/yggdrasil_decision_forests/learner/distributed_gradient_boosted_trees:dgbt. Note that in most cases, importing the learners with //third_party/yggdrasil_decision_forests/learner:all_learners is recommended.
- The training configuration must contain a label. A missing label is no longer interpreted as the label being the input feature "".
Feature
- Add support for monotonic constraints for gradient boosted trees.
- Improve speed of dataset reading and writing.
Fix
- Proper error message when using distributed training on more than 2^31 (i.e., ~2B) examples while compiling YDF with 32-bits example index.
- Fix Windows compilation with Visual Studio 2019.
- Improved error messages for invalid training configuration
- Replaced outdated dependencies
Published by rstz over 2 years ago
yggdrasil-decision-forests - 1.5.0
Feature
- Rename experimental_analyze_model_and_dataset to analyze_model_and_dataset.
- Add new GBT loss function POISSON for Poisson log likelihood.
- Go API: Categorical string values available for inspection.
- Improved training speed for unit-weight datasets.
- Support for MHLD oblique decision trees.
- Multi-threaded RMSE computation.
- Added Uint8 inference engine.
- Added Multi-task learning where the output of models trained as "secondary" are used as input for the models trained as "primary"
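For the POISSON loss listed above, the following sketch shows the usual negative Poisson log likelihood with a log link, where the model outputs a raw score and the predicted mean is its exponential. Constant terms and any scaling factor are assumptions here, and the function names are hypothetical. YDF's exact normalization is not stated in these notes.

```python
import math
from typing import List, Sequence


def poisson_loss(labels: Sequence[float], raw_predictions: Sequence[float]) -> float:
    """Mean negative Poisson log likelihood, dropping the constant log(label!) term."""
    return sum(
        math.exp(f) - y * f for y, f in zip(labels, raw_predictions)
    ) / len(labels)


def poisson_gradient(labels: Sequence[float], raw_predictions: Sequence[float]) -> List[float]:
    """Per-example derivative of the loss w.r.t. the raw prediction: exp(f) - y."""
    return [math.exp(f) - y for y, f in zip(labels, raw_predictions)]
```

The gradient vanishes when the predicted mean exp(f) equals the label, which is the behavior expected from a count regression loss.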
Fix
- Go API: fixed typo on OutOfVocabulary constant.
- Error messages for Uplift models.
- Remove owner leakage in the model compiler.
- Fix buggy restriction for SelGB sampling
- Improve documentation.
Published by rstz over 2 years ago
yggdrasil-decision-forests - 1.4.0
Features
- Speed-up the computation of PDP and CEP in the model analysis tool.
- Add compilation of model into .h file.
- [JS port] Add "prefix" argument to model loading method.
- Rename logging function from LOG to YDF_LOG to limit risk of collision with TF or Absl.
Fix
- [JS port] Fix memory leak. Release emscripten objects.
Published by achoum almost 3 years ago
yggdrasil-decision-forests - 1.3.0
1.3.0
Features
- Setting the generic hyper-parameter "subsample" is enough to enable random subsampling (no need to also set "sampling_method=RANDOM").
- Improve the display of decision tree structures.
- The Hyper-parameter optimizer field "predefined_search_space" automatically configures the set of hyper-parameters to explore during automatic hyper-parameter tuning.
- Replaces the MEAN_MIN_DEPTH variable importance with INV_MEAN_MIN_DEPTH.
Published by achoum about 3 years ago
yggdrasil-decision-forests - 1.2.0
1.2.0 - 2022-11-18
Features
- YDF can load TF-DF models directly (i.e. a TF model with a YDF model in the "assets" sub directory).
- Expose confusion tables in a GBT model's analysis.
- Add the "compute_variable_importances" tool to compute variable importances on an already trained model.
- Add the "experimental_analyze_model_and_dataset" tool to understand/analyze models.
Published by achoum over 3 years ago
yggdrasil-decision-forests - Yggdrasil Decision Forests 1.1.0
Features
- Early stopping is no longer triggered during the first iterations. The initial iteration for early stopping can be controlled with the new parameter early_stopping_initial_iteration in gradient_boosted_trees.proto.
- Benchmark inference tool does not require the dataset to contain the label column.
- The user can specify the location of the wasm file in the JavaScript port.
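The effect of delaying early stopping, as described for early_stopping_initial_iteration above, can be sketched with a simple patience rule. The stopping criterion, the look_ahead parameter and the function name are assumptions made for illustration; the actual logic in the Gradient Boosted Trees learner may differ.

```python
from typing import Optional, Sequence


def early_stopping_iteration(
    validation_losses: Sequence[float],
    look_ahead: int,
    initial_iteration: int,
) -> Optional[int]:
    """Returns the iteration at which training would stop, or None if it never stops.

    Training stops once the validation loss has not improved for `look_ahead`
    iterations, but never before `initial_iteration`.
    """
    best_loss = float("inf")
    best_iteration = -1
    for iteration, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_iteration = iteration
        stalled = iteration - best_iteration >= look_ahead
        if stalled and iteration >= initial_iteration:
            return iteration
    return None


# Hypothetical validation curve that is noisy during the first iterations.
losses = [1.0, 1.1, 1.2, 0.5, 0.4, 0.45, 0.46, 0.47]
stop_without_guard = early_stopping_iteration(losses, look_ahead=2, initial_iteration=0)
stop_with_guard = early_stopping_iteration(losses, look_ahead=2, initial_iteration=5)
```

With this made-up curve, the unguarded run stops at iteration 2, before the loss starts to improve, while the guarded run continues to iteration 6.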
Published by achoum over 3 years ago
yggdrasil-decision-forests - 1.0.0
Yggdrasil Decision Forests 1.0.0
With this release, Yggdrasil Decision Forests finally reaches its first major release 1.0.0 🥳
With this milestone we want to communicate more broadly that Yggdrasil Decision Forests has become a more stable and mature library. In particular, we established more comprehensive testing to make sure that YDF is ready for professional environments.
Features
- Go (GoLang) inference API (Beta): simple engine written in Go to do inference on YDF and TF-DF models.
- Creation of html evaluation report with plots (e.g., ROC, PR-ROC).
- Add support for Random Forest, CART, regressive GBT and Ranking GBT models in the Go API.
- Add customization of the number of IO threads in the deployment proto.
Published by rstz over 3 years ago
yggdrasil-decision-forests - 1.0.0rc0
Fix
- Improved documentation.
Published by rstz over 3 years ago
yggdrasil-decision-forests - 0.2.5
Features
- Multi-threading of the oblique splitter for gradient boosted tree models.
- Support for Javascript + WebAssembly inference of model.
- Support for pure serving model i.e. model containing only serving data.
- Add "edit_model" cli tool.
Fix
- Remove bias toward low outcome in uplift modeling.
Published by achoum over 3 years ago
yggdrasil-decision-forests - Javascript + WebAssembly RC1 for YDF v0.2.5
Pre-compiled binary for the Javascript + WebAssembly inference library.
Compiled with:
bazel build -c opt --config=lto --config=size --config=wasm //yggdrasil_decision_forests/port/javascript:create_release
Published by achoum over 3 years ago
yggdrasil-decision-forests - 0.2.4
Features
- Discard hessian splits with score lower than the parents. This change has little effect on the model quality, but it can reduce its size.
- Add internal flag hessian_split_score_subtract_parent to subtract the parent score in the computation of a hessian split score.
- Add the hyper-parameter optimizer as one of the meta-learners.
- The Random Forest and CART learners support the NUMERICAL_UPLIFT task.
Published by achoum almost 4 years ago
yggdrasil-decision-forests - 0.2.3
Features
- Honest Random Forests (also work with Gradient Boosted Tree and CART).
- Can train Random Forests with example sampling without replacement.
- Add support for Focal Loss in Gradient Boosted Tree learner.
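For readers new to the Focal Loss mentioned above, this is a sketch of the standard binary formulation, which down-weights well-classified examples through a focusing parameter gamma and balances classes with alpha. The default values and the function name are assumptions, not the defaults of the Gradient Boosted Tree learner.

```python
import math


def binary_focal_loss(
    label: int, probability: float, gamma: float = 2.0, alpha: float = 0.5
) -> float:
    """Focal loss of one example, given the predicted probability of class 1."""
    # p_t is the probability assigned to the true class.
    p_t = probability if label == 1 else 1.0 - probability
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # With gamma = 0 this reduces to an alpha-weighted cross-entropy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Increasing gamma shrinks the loss of confident, correct predictions much more than the loss of hard examples, which shifts training effort toward the latter.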
Fixes
- Incorrect default evaluation of categorical split with uplift tasks. This was making uplift models with missing categorical values perform worse, and made the inference of uplift models possibly slower.
Published by achoum about 4 years ago
yggdrasil-decision-forests - 0.2.2
Features
- The CART learner exports the number of pruned nodes in the output model meta-data. Note: The CART learner outputs a Random Forest model with a single tree.
- The Random Forest and CART learners support the CATEGORICAL_UPLIFT task.
- Add SetLoggingLevel to control the amount of logging.
Fixes
- Fix tree pruning in the CART learner for regressive tasks.
Published by achoum about 4 years ago
yggdrasil-decision-forests - 0.2.0
Features
- Distributed training of Gradient Boosted Decision Trees.
- Add maximum_model_size_in_memory_in_bytes hyper-parameter to limit the size of the model in memory.
Fixes
- Fix invalid splitting of pre-sorted numerical features (make sure to use the midpoint).
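The midpoint rule referred to in the fix above is commonly implemented by placing each candidate threshold halfway between two consecutive distinct feature values. The sketch below illustrates that idea on a pre-sorted list. It is a simplified assumption of how such a splitter behaves, with a hypothetical function name, not the code that was fixed.

```python
from typing import List, Sequence


def candidate_thresholds(sorted_values: Sequence[float]) -> List[float]:
    """Midpoints between consecutive distinct values of a pre-sorted feature."""
    thresholds = []
    for left, right in zip(sorted_values, sorted_values[1:]):
        if left < right:
            # Written as left + half the gap to avoid overflow on large values.
            thresholds.append(left + (right - left) / 2)
    return thresholds


thresholds = candidate_thresholds([1.0, 1.0, 2.0, 4.0])
```

Using the midpoint instead of one of the observed values keeps the split equally far from both neighbors, so unseen values between them are not systematically routed to one side.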
Published by achoum over 4 years ago
yggdrasil-decision-forests - 0.1.3
Features
- Register new inference engines.
Published by achoum almost 5 years ago
yggdrasil-decision-forests - 0.1.2
Features
- Inference engines: QuickScorer Extended and Pred
Published by achoum almost 5 years ago
yggdrasil-decision-forests - 0.1.1
Features
- Migration to TensorFlow 2.5.0.
Published by achoum almost 5 years ago
yggdrasil-decision-forests - 0.1.0
Release 0.1.0 (2021-05-11)
Initial release of Yggdrasil Decision Forests.
Features
- CLI: train, show_model, show_dataspec, predict, infer_dataspec, evaluate, convert_dataset, benchmark_inference, utils/synthetic_dataset.
- Learners: Gradient Boosted Trees (and derivatives), Random Forest (and derivatives), Cart.
Published by achoum almost 5 years ago