Recent Releases of yggdrasil-decision-forests

yggdrasil-decision-forests - Python API 0.13.0

0.13.0 - 2025-07-15

API Changes

  • For Random Forest models, .out_of_bag_evaluations() now returns a TrainingLogs object. The content is identical to the object previously returned, but the number_of_trees property has been renamed to iteration for consistency with Gradient Boosted Trees Training Logs.
  • mode="tf" is now the default on model.to_tensorflow_saved_model(). The previous default is still available by setting mode="keras".
  • model.label() returns None for models trained without a label.
  • Remove deprecated evaluation_task argument for model.evaluate(). Use task instead.

Feature

  • Add standalone C++ export with model.to_standalone_cc(). Standalone models are flexible, fast, and memory-efficient, and depend only on the C++ standard library.
  • Add model.training_logs() method to return the training logs of the model.
  • Expose Mean Average Precision for Ranking tasks.
  • Add hyperparameters numerical_vector_sequence_enable_closer_than_conditions and numerical_vector_sequence_enable_projected_more_than_conditions.
  • Clear error messages when attempting to evaluate models without label.
  • Faster training with sparse oblique splits for datasets with many numerical features.
  • Many documentation improvements.
  • Increase default number of threads to 256 or number of CPU cores.
  • Enable cross-validation for hyperparameter tuning.
  • Add thresholds to classification plots.
  • Explicitly disable custom losses for hyperparameter tuning.
  • Disable parallel evaluation for cross-validation custom losses.

Fix

  • Distributed Training: Add "recvmsg: Connection reset" to isTransientError.
  • Enable SHAP values when training with BEST_FIRST_GLOBAL.
  • Predictions with cross-entropy LambdaMART no longer need the slow engine.
  • Disable the generic engine for oblique splits without global imputation. This may fix a very rare bug in the way predictions are computed.

Release music

Sinfonie Nr. 4 in A-Dur, op. 90. Felix Mendelssohn

- C++
Published by rstz 7 months ago

yggdrasil-decision-forests - Python API 0.12.0

0.12.0 - 2025-05-20

Feature

  • Enable support for Python 3.13.
  • Add custom fields to model metadata.
  • Add SHAP value variable importances with model.analyze().
  • Add SHAP values for a dataset with model.predict_shap().
  • Speed-up (up to 20x) training of models with CATEGORICAL_SET features.
  • Add hyper-parameter to limit the mask size for CATEGORICAL_SET features.
  • Add hyper-parameter total_max_num_nodes to limit the total number of nodes in a model.
  • Add support for na_replacements in python tree editor API.
  • Add support for include_all_columns in FeatureSelector.
  • Add the ydf.utils.LogBook to manage and track experiments.
  • Speed-up training of NDCG ranking model when a single example per group is non-zero.
  • Speed-up training on datasets with few columns on a computer with a large amount of cores.
  • Speed-up loss computation multi-threading code.
  • Improve distributed training error messages.
  • Remove need for label columns for deep learning models.

Fix

  • Log message if early stopping is not used.
  • Fix force_numerical_discretization errors and documentation.
  • Fix handling of empty list columns in the dataset.

Release music

Te Deum in D major, H.146. Marc-Antoine Charpentier

Published by rstz 9 months ago

yggdrasil-decision-forests - Python API 0.11.0

0.11.0 - 2025-03-12

Feature

  • Expose losses for distributed training.
  • Add class_weights parameter to the learners.
  • Support for Google Cloud paths for datasets and model IO.
  • Add utility to facilitate distributed training on VertexAI.
  • Improved support for non-unicode data in categorical features.
  • Add support for saving and analyzing deep models.

Fix

  • Fix incorrectly transposed confusion table in HTML.
  • Various documentation fixes.
  • Better requirements management.

Documentation

  • Add tutorial for Categorical Set features.
  • Add tutorial for training on VertexAI.

Release music

3. Sinfonie in d-Moll. Gustav Mahler

Published by rstz 12 months ago

yggdrasil-decision-forests - v1.11.0

1.11.0 - 2025-03-12

Features

  • Speed-up training of GBT models by ~10%.
  • Support for categorical and boolean features in Isolation Forests.
  • Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but can still be used.
  • Allow configuring the truncation of NDCG losses.
  • Add support for distributed training for ranking gradient boosted tree models.
  • Add support for NUMERICAL_VECTOR_SEQUENCE features.
  • Add support for AVRO data file using the "avro:" prefix.
  • Additional hyperparameters restricting weights of sparse oblique splits to integers or powers of 2.
  • Facilitate training on VertexAI.
  • Deprecated SparseObliqueSplit.binary_weights hyperparameter in favor of SparseObliqueSplit.weights.
  • Add Gzip-compressed BLOB_SEQUENCE serialization.
  • Enable Poisson loss for model analysis and fast inference.
  • Add config for compatibility with protobuf lite.
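
The NDCG truncation mentioned in the list above controls how many top-ranked items contribute to the metric. The following is a minimal sketch of truncated NDCG using the common formulation (gain 2^relevance - 1, discount log2(rank + 1)). The function names are hypothetical and this is not the YDF loss implementation, only an illustration of what the truncation changes.

```python
import math


def dcg(relevances, truncation):
    """Discounted cumulative gain over the first `truncation` items.

    `relevances` is ordered by decreasing predicted score.
    """
    return sum(
        (2.0 ** rel - 1.0) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:truncation], start=1)
    )


def ndcg(relevances, truncation):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), truncation)
    return dcg(relevances, truncation) / ideal if ideal > 0 else 0.0


# The only relevant item is ranked third: it is ignored with a truncation
# of 2 and counted with a truncation of 3.
print(ndcg([0, 0, 3], truncation=2))
print(ndcg([0, 0, 3], truncation=3))
```

A small truncation focuses the metric on the head of the ranking, while a larger one also rewards relevant items placed further down.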

Fix

  • Fix structural variable importances for oblique splits.
  • Deflake tests.
  • Remove CHECK/FATAL from training code.
  • Fix crash in YDF distributed training.

Misc

  • Loss options are now defined in model/gradient_boosted_trees/gradient_boosted_trees.proto (previously learner/gradient_boosted_trees/gradient_boosted_trees.proto).
  • Remove C++14 support.
  • Various documentation improvements.

Published by rstz 12 months ago

yggdrasil-decision-forests - Python API 0.10.0

0.10.0 - 2025-02-11

Feature

  • Expose model.save(..., pure_serving=True) for saving a model without debug information.
  • Allow users to provide a training proto configuration to the learner.
  • Add vector sequence feature support.
  • Add Variable importances for Isolation Forest Models.
  • Add ydf.help.loading_data() to print information about the type of supported dataset formats.
  • Add experimental Tabular Transformer implementation.
  • Add gzipped blob sequence as new model format (still optional).
  • Enabled Poisson Loss for model analysis and fast inference.

Fix

  • Fix recognition of multidimensional features for Numpy arrays of type object.
  • Fix subsample count for small number of training examples for Isolation Forests.
  • Fix NUM_NODES variable importance for oblique splits.

Other

  • Updated OSS dependencies of protobuf, grpc and abseil.

Release music

3. Sinfonie in Es-Dur "Sinfonia Eroica", op. 55. Ludwig van Beethoven

Published by rstz about 1 year ago

yggdrasil-decision-forests - Python API 0.9.0

0.9.0 - 2024-12-02

Breaking

  • Classification Label classes are now consistently ordered lexicographically (for string labels) or increasingly (for integer labels).
  • Fix typo: rename partial_depepence_plot to partial_dependence_plot on model.analyze().
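
The label ordering rule above can be illustrated with a short sketch. The helper name is hypothetical and not part of the YDF API; it only shows what "lexicographic for strings, increasing for integers" means in practice, assuming plain Python string comparison.

```python
def order_label_classes(labels):
    """Return the distinct label values in a deterministic order.

    Illustrative helper (not a YDF function): strings sort
    lexicographically, integers sort in increasing order.
    """
    return sorted(set(labels))


# String labels: lexicographic order, so "10" comes before "9".
print(order_label_classes(["9", "b", "10", "a", "b"]))  # ['10', '9', 'a', 'b']
# Integer labels: increasing numerical order.
print(order_label_classes([9, 2, 10, 2]))  # [2, 9, 10]
```

Code that indexes into per-class prediction arrays should look up the class order from the model rather than assume the order of appearance in the training data.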

Feature

  • Add support for Avro file for path / distributed training with the "avro:" prefix.
  • Add support for discretized numerical features for in-memory datasets.
  • Expose MRR for ranking models.
  • Add model.predict_class to generate the most likely predicted class of classification models.
  • Add support for automatic feature selection with the feature_selector learner constructor argument. See the feature selection tutorial for more details.
  • Add standalone prediction evaluation ydf.evaluate_predictions().
  • Add new hyperparameter sparse_oblique_max_num_projections.
  • Add options "POWER_OF_TWO" and "INTEGER" for sparse oblique weights.
  • Emit proper errors when using lists for multi-dimensional features.
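
For readers unfamiliar with MRR (mean reciprocal rank), the sketch below shows the usual definition: for each query group, take the reciprocal of the rank of the first relevant item, then average over groups. The function name and input layout are assumptions made for illustration and do not reflect how YDF computes or exposes the metric.

```python
def mean_reciprocal_rank(groups):
    """Mean reciprocal rank over query groups.

    `groups` is a list of lists of 0/1 relevance flags, each ordered by
    decreasing predicted score. A group without any relevant item
    contributes 0.
    """
    total = 0.0
    for relevances in groups:
        for rank, relevant in enumerate(relevances, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(groups)


# First group: relevant item at rank 1. Second group: relevant item at rank 2.
print(mean_reciprocal_rank([[1, 0], [0, 1]]))  # 0.75
```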

Fix

  • Regression and Ranking CEPs scaling corrected.

Release music

The John B. Sails. Traditional

Published by rstz about 1 year ago

yggdrasil-decision-forests - Python API 0.8.0

0.8.0 - 2024-09-23

Breaking

  • Disallow positional parameters for the learners, except for label and task.
  • Remove the unsupported / invalid hyperparameters from the Isolation Forest learner.
  • Remove parameters for distributed training and resuming training from learners that do not support these capabilities.
  • By default, model.analyze runs for a maximum of 20 seconds (i.e. maximum_duration=20 by default).
  • Convert boolean values in categorical sets to lowercase, matching the treatment of categorical features.

Feature

  • Warn if training on a VerticalDataset and fail if attempting to modify the columns in a VerticalDataset during training.
  • User can override the model's task, label or group during evaluation.
  • Add num_examples_per_tree() method to Isolation Forest models.
  • Expose the slow engine for debugging predictions and evaluations with use_slow_engine=True.
  • Speed-up training of GBT models by ~10%.
  • Support for categorical and boolean features in Isolation Forests.
  • Add ydf.util.read_tf_record and ydf.util.write_tf_record to facilitate TF Record datasets usage.
  • Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but can still be used.
  • Allow configuring the truncation of NDCG losses.
  • Enable multi-threading when using model.predict and model.evaluate.
  • Default number of threads of model.analyze is equal to the number of cores.
  • Add multi-threaded results in model.benchmark.
  • Add argument to control the maximum duration of model.analyze.
  • Add support for Unicode strings, normalize categorical set values in the same way as categorical values, and validate their types.
  • Add support for distributed training for ranking gradient boosted tree models.

Fix

  • Fix labels of regression evaluation plots.
  • Improved errors if Isolation Forest training fails.

Release music

Perpetuum Mobile "Ein musikalischer Scherz", Op. 257. Johann Strauss (Sohn)

Published by rstz over 1 year ago

yggdrasil-decision-forests - v1.10.0

1.10.0 - 2024-08-21

Features

  • Add support for Isolation Forests model.
  • The default value of num_candidate_attributes in the CART learner is changed from 0 (Random Forest style sampling) to -1 (no sampling). This is the generally accepted logic of CART.
  • Added support for GCS for file I/O.
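
To make the num_candidate_attributes change concrete, here is a small sketch of how the special values 0 and -1 could resolve to an actual number of attributes tested per node. It assumes the classical Random Forest defaults for 0 (square root of the number of features for classification, one third for regression); the function name, the rounding, and the clamping are assumptions for illustration and not the library's code.

```python
import math


def resolve_num_candidate_attributes(value, num_features, task):
    """Number of attributes tested at each node (illustrative only)."""
    if value == -1:
        # No sampling: every attribute is tested, the usual CART behavior.
        return num_features
    if value == 0:
        # Random Forest style sampling (assumed classical defaults).
        if task == "classification":
            return math.ceil(math.sqrt(num_features))
        return math.ceil(num_features / 3)
    # Explicit value, clamped to the number of available features.
    return min(value, num_features)


print(resolve_num_candidate_attributes(-1, 16, "classification"))  # 16
print(resolve_num_candidate_attributes(0, 16, "classification"))   # 4
```

With the new default of -1, a CART model considers all features at every split, which generally matters most on datasets with many features.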

Published by rstz over 1 year ago

yggdrasil-decision-forests - Python API 0.7.0

Python API 0.7.0 - 2024-08-21

Feature

  • Expose validate_hyperparameters() on the learner.
  • Clarify which parameters in the learner are optional.
  • Add support in JAX FeatureEncoder for non-string categorical feature values.
  • Improve performance of Isolation Forests.
  • Models can be serialized/deserialized to/from bytes with model.serialize() and ydf.deserialize_model.
  • Models can be pickled safely.
  • Native support for Xarray as a dataset format for all operations (e.g., training, evaluation, predictions).
  • The output of model.to_jax_function can be converted to a TensorFlow Lite model.
  • When training on files, change the default number of examples scanned to determine the semantics and dictionaries of columns from 10k to 100k.
  • Various improvements of error messages.
  • Evaluation for Anomaly Detection models.
  • Oblique splits for Anomaly Detection models.

Fix

  • Fix parsing of multidimensional ragged inputs.
  • Fix isolation forest hyperparameter defaults.
  • Fix bug causing distributed training to fail on a sharded dataset containing an empty shard.
  • Handle unordered categorical sets in training.
  • Fix dataspec ignoring definitions of unrolled columns, such as multidimensional categorical integers.
  • Fix error when defining categorical sets for non-ragged multidimensional inputs.
  • MacOS: Fix compatibility with other protobuf-using libraries such as Tensorflow.

Release music

Rondo Alla ingharese quasi un capriccio "Die Wut über den verlorenen Groschen", Op. 129. Ludwig van Beethoven

Published by rstz over 1 year ago

yggdrasil-decision-forests - Python API 0.6.0

Feature

  • model.to_jax_function now always outputs a FeatureEncoder to help feeding data to the JAX model.
  • The default value of num_candidate_attributes in the CART learner is changed from 0 (Random Forest style sampling) to -1 (no sampling). This is the generally accepted logic of CART.
  • model.to_tensorflow_saved_model supports preprocessing functions that have a different signature than the YDF model.
  • Improve error messages when feeding wrong size Numpy arrays.
  • Add option for weighted evaluation in model.evaluate.

Fix

  • Fix display of confusion matrix with floating point weights.

Known issues

  • MacOS build is broken.

Published by rstz over 1 year ago

yggdrasil-decision-forests - Python API 0.5.0

Feature

  • Add support for Isolation Forests model.
  • Add max_depth argument to model.print_tree.
  • Add verbose argument to the train method, which is equivalent to but sometimes more convenient than ydf.verbose.
  • Add SKLearn to YDF model converter: ydf.from_sklearn.
  • Improve error messages when calling the model with unsupported data.
  • Add support for numpy 2.0.

Tutorials

  • Add anomaly detection tutorial.
  • Add YDF and JAX model composition tutorial.

Fix

  • Fix error when plotting oblique trees (model.plot_tree) in colab.

Published by achoum over 1 year ago

yggdrasil-decision-forests - Python API 0.4.3

Python API - Changelog

Feature

  • Add model.to_jax_function() function to convert a YDF model into a JAX function that can be combined with other JAX operations.
  • Print warnings when categorical features look like numbers.
  • Add support for Python 3.12.

Fix

  • Fix cross-validation for non-classification learners.
  • Fix missing ydf/model/tree/plotter.js.
  • Solve dependency collision of YDF Proto between PYDF and TF-DF.

Published by achoum almost 2 years ago

yggdrasil-decision-forests - Python API 0.4.1

Python API - Changelog

Fix

  • Solve dependency collision on YDF between PYDF and TF-DF. If TF-DF was installed after PYDF, importing YDF failed with a "has no attribute 'DType'" error.
  • Allow for training on cached TensorFlow dataset.

Published by achoum almost 2 years ago

yggdrasil-decision-forests - Python API 0.4.0

Python API - 0.4.0 - 2024-04-10

Feature

  • Multi-dimensional features can be selected / configured with the features= training argument.
  • Programmatic access to partial dependence plots and variable importances.
  • Add model.to_tensorflow_function() function to convert a YDF model into a TensorFlow function that can be combined with other TensorFlow operations. This function is compatible with Keras 2 and Keras 3.
  • Add arguments servo_api=False and feed_example_proto=False for model.to_tensorflow_function(mode="tf") to export TensorFlow SavedModel following respectively the Servo API and consuming serialized TensorFlow Example protos.
  • Add pre_processing and post_processing arguments to the model.to_tensorflow_function function to pack pre/post processing operations in a TensorFlow SavedModel.

Tutorials

Published by achoum almost 2 years ago

yggdrasil-decision-forests - Python API 0.3.0

Python API 0.3.0 - 2024-03-15

Breaking

  • Custom losses must now provide the gradient instead of the negative of the gradient.
  • Clarified that YDF may modify numpy arrays returned by a custom loss function.
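
The sign convention change above is easy to get wrong when porting an existing custom loss. The sketch below contrasts the two conventions for the squared error loss 0.5 * (prediction - label)^2. It uses plain Python lists and hypothetical function names; it does not reproduce the signature YDF expects for custom loss callbacks.

```python
def squared_error_gradient(labels, predictions):
    """Gradient of 0.5 * (prediction - label)^2 w.r.t. the prediction.

    This is the convention required after the change.
    """
    return [p - y for y, p in zip(labels, predictions)]


def squared_error_negative_gradient(labels, predictions):
    """Negative gradient (the pseudo-residual), the earlier convention."""
    return [y - p for y, p in zip(labels, predictions)]


labels = [1.0, 4.0]
predictions = [3.0, 2.0]
print(squared_error_gradient(labels, predictions))           # [2.0, -2.0]
print(squared_error_negative_gradient(labels, predictions))  # [-2.0, 2.0]
```

A loss written for the earlier convention needs its gradient negated; the hessian is unaffected by the sign flip.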

Features

  • Allow using Jax for custom loss definitions.
  • Allow setting may_trigger_gc on custom losses.
  • Add support for MHLD oblique decision trees.
  • Expose hyperparameter sparse_oblique_max_num_projections.
  • HTML plots for trees with model.plot_tree().
  • Fix protobuf version to 4.24.3 to fix some incompatibilities when using conda.
  • Allow listing compatible engines with model.list_compatible_engines().
  • Allow choosing a fast engine with model.force_engine(...).

Fix

  • Fix slow engine creation for some combination of oblique splits.
  • Improve error message when feeding multi-dimensional labels.

Documentation

  • Clarified documentation of hyperparameters for oblique splits.
  • Fix plots, typos.

Release music

Doctor Gradus ad Parnassum from "Children's Corner" (L. 113). Claude Debussy

Published by rstz almost 2 years ago

yggdrasil-decision-forests - v1.9.0

1.9.0 - 2024-03-12

Feature

  • Add "parallel_trials" parameter in the hyper-parameter tuner to control the number of trials to run in parallel.
  • Add support for custom losses.

Published by rstz almost 2 years ago

yggdrasil-decision-forests - v1.9.0rc0

1.9.0rc0 - 2024-02-26

Feature

  • Add "parallel_trials" parameter in the hyper-parameter tuner to control the number of trials to run in parallel.
  • Add support for custom losses.

Published by rstz almost 2 years ago

yggdrasil-decision-forests - PYDF 0.1.0

0.1.0 - 2024-01-25

Features

  • Added model validation evaluation (for GBTs) and OOB evaluation (for RFs).
  • Expose winner-takes-all for Random Forests.
  • Added model self evaluation.
  • Added ydf.from_tensorflow_decision_forests() for importing TF-DF models.
  • Allow feeding datasets as sequence of strings.

Fixes

  • Fixes a plotting issue for GBTs without validation loss.

Release music

Flötenuhren von 1772 und 1793 - Vivace (Hob XIX:13). Joseph Haydn

Published by rstz about 2 years ago

yggdrasil-decision-forests - v1.8.0

1.8.0 - 2023-11-17

Feature

  • Support for GBT distances.
  • Remove old snapshots automatically for GBT training.

Fix

  • Regression with Mean Squared Error loss and Mean Average error loss incorrectly clamped the gradients, leading to incorrect predictions.
  • Change dependency from boost to boost_math for faster builds.

Note

The commit associated with this release has a typo in its description.

1.7.0 - 2023-10-20

Feature

  • Add support for Mean average error (MAE) loss for GBT.
  • Add pairwise distance between examples.
  • By default, only keep the last three snapshots when training with a working cache to be resilient to training interruptions.

New interface

  • Check out the new Python interface in port/python! It's still experimental but you can already install it from PyPi with pip install ydf.

Published by rstz about 2 years ago

yggdrasil-decision-forests - v1.6.0

Breaking changes

  • The dependency on the distributed gradient boosted trees learner is renamed from //third_party/yggdrasil_decision_forests/learner/distributed_gradient_boosted_trees to //third_party/yggdrasil_decision_forests/learner/distributed_gradient_boosted_trees:dgbt. Note that in most cases, importing the learners with //third_party/yggdrasil_decision_forests/learner:all_learners is recommended.
  • The training configuration must contain a label. A missing label is no longer interpreted as the label being the input feature "".

Feature

  • Add support for monotonic constraints for gradient boosted trees.
  • Improve speed of dataset reading and writing.

Fix

  • Proper error message when using distributed training on more than 2^31 (i.e., ~2B) examples while compiling YDF with 32-bits example index.
  • Fix Windows compilation with Visual Studio 2019.
  • Improved error messages for invalid training configuration.
  • Replaced outdated dependencies.

Published by rstz over 2 years ago

yggdrasil-decision-forests - 1.5.0

Feature

  • Rename experimental_analyze_model_and_dataset to analyze_model_and_dataset.
  • Add new GBT loss function POISSON for Poisson log likelihood.
  • Go API: Categorical string values available for inspection.
  • Improved training speed for unit-weight datasets.
  • Support for MHLD oblique decision trees.
  • Multi-threaded RMSE computation.
  • Added Uint8 inference engine.
  • Added multi-task learning, where the outputs of models trained as "secondary" are used as inputs for the models trained as "primary".

Fix

  • Go API: fixed typo on OutOfVocabulary constant.
  • Error messages for Uplift models.
  • Remove owner leakage in the model compiler.
  • Fix buggy restriction for SelGB sampling.
  • Improve documentation.

Published by rstz over 2 years ago

yggdrasil-decision-forests - 1.4.0

Features

  • Speed-up the computation of PDP and CEP in the model analysis tool.
  • Add compilation of model into .h file.
  • [JS port] Add "prefix" argument to model loading method.
  • Rename logging function from LOG to YDF_LOG to limit risk of collision with TF or Absl.

Fix

  • [JS port] Fix memory leak. Release emscripten objects.

Published by achoum almost 3 years ago

yggdrasil-decision-forests - 1.3.0

1.3.0

Features

  • Setting the generic hyper-parameter "subsample" is enough to enable random subsampling (no need to also set "sampling_method=RANDOM").
  • Improve the display of decision tree structures.
  • The hyper-parameter optimizer field "predefined_search_space" automatically configures the set of hyper-parameters to explore during automatic hyper-parameter tuning.
  • Replaces the MEAN_MIN_DEPTH variable importance with INV_MEAN_MIN_DEPTH.

Published by achoum about 3 years ago

yggdrasil-decision-forests - 1.2.0

1.2.0 - 2022-11-18

Features

  • YDF can load TF-DF models directly (i.e. a TF model with a YDF model in the "assets" sub directory).
  • Expose confusion tables in a GBT model's analysis.
  • Add the "compute_variable_importances" tool to compute variable importances on an already trained model.
  • Add the "experimental_analyze_model_and_dataset" tool to understand/analyze models.

Published by achoum over 3 years ago

yggdrasil-decision-forests - Yggdrasil Decision Forests 1.1.0

Features

  • Early stopping is no longer triggered during first iterations. The initial iteration for early stopping can be controlled with the new parameter early_stopping_initial_iteration in gradient_boosted_trees.proto.
  • The benchmark inference tool no longer requires the dataset to contain the label column.
  • The user can specify the location of the wasm file in the JavaScript port.

Published by achoum over 3 years ago

yggdrasil-decision-forests - 1.0.0

Yggdrasil Decision Forests 1.0.0

With this release, Yggdrasil Decision Forests finally reaches its first major release 1.0.0 🥳

With this milestone we want to communicate more broadly that Yggdrasil Decision Forests has become a more stable and mature library. In particular, we established more comprehensive testing to make sure that YDF is ready for professional environments.

Features

  • Go (GoLang) inference API (Beta): simple engine written in Go to do inference on YDF and TF-DF models.
  • Creation of html evaluation report with plots (e.g., ROC, PR-ROC).
  • Add support for Random Forest, CART, regressive GBT and Ranking GBT models in the Go API.
  • Add customization of the number of IO threads in the deployment proto.

Published by rstz over 3 years ago

yggdrasil-decision-forests - 1.0.0rc0

Fix

  • Improved documentation.

Published by rstz over 3 years ago

yggdrasil-decision-forests - 0.2.5

Features

  • Multi-threading of the oblique splitter for gradient boosted tree models.
  • Support for Javascript + WebAssembly inference of model.
  • Support for pure serving model i.e. model containing only serving data.
  • Add "edit_model" cli tool.

Fix

  • Remove bias toward low outcome in uplift modeling.

Published by achoum over 3 years ago

yggdrasil-decision-forests - Javascript + WebAssembly RC1 for YDF v0.2.5

Pre-compiled binary for the Javascript + WebAssembly inference library.

Compiled with:

    bazel build -c opt --config=lto --config=size --config=wasm //yggdrasil_decision_forests/port/javascript:create_release

Published by achoum over 3 years ago

yggdrasil-decision-forests - 0.2.4

Features

  • Discard hessian splits with score lower than the parents. This change has little effect on the model quality, but it can reduce its size.
  • Add internal flag hessian_split_score_subtract_parent to subtract the parent score in the computation of a hessian split score.
  • Add the hyper-parameter optimizer as one of the meta-learners.
  • The Random Forest and CART learners support the NUMERICAL_UPLIFT task.
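
The hessian split rule in the list above can be sketched as follows. The score used here, (sum of gradients)^2 / (sum of hessians + l2), is the common second-order formulation and is an assumption; the function names and the exact formula are illustrative and not taken from the YDF source.

```python
def hessian_score(sum_gradient, sum_hessian, l2=0.0):
    """Second-order score of a node (assumed formulation)."""
    return sum_gradient ** 2 / (sum_hessian + l2)


def should_discard_split(parent, left, right, l2=0.0):
    """True if the children's combined score is lower than the parent's.

    Each node is a (sum_gradient, sum_hessian) pair.
    """
    children = hessian_score(*left, l2) + hessian_score(*right, l2)
    return children < hessian_score(*parent, l2)


# A split that separates positive and negative gradients is kept.
print(should_discard_split((0.0, 4.0), (4.0, 2.0), (-4.0, 2.0)))  # False
# With L2 regularization, a split of similar examples can score lower
# than its parent and is discarded.
print(should_discard_split((4.0, 4.0), (2.0, 2.0), (2.0, 2.0), l2=4.0))  # True
```

Under this formulation and without regularization, the children never score lower than the parent, so the rule only removes splits when regularization is active, which is consistent with the small effect on quality noted above.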

Published by achoum almost 4 years ago

yggdrasil-decision-forests - 0.2.3

Features

  • Honest Random Forests (also work with Gradient Boosted Tree and CART).
  • Can train Random Forests with example sampling without replacement.
  • Add support for Focal Loss in Gradient Boosted Tree learner.
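
As background for the Focal Loss entry above, the sketch below computes the standard binary focal loss, -(1 - p_t)^gamma * log(p_t), where p_t is the probability assigned to the true class. The function name and the gamma default are assumptions, and the class-balancing weight of the original formulation is omitted for brevity; this is not the learner's implementation.

```python
import math


def binary_focal_loss(label, probability, gamma=2.0):
    """Focal loss for one example.

    `label` is 0 or 1; `probability` is the predicted probability of
    class 1 and must lie strictly between 0 and 1.
    """
    p_t = probability if label == 1 else 1.0 - probability
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# A confident correct prediction is down-weighted far more than an
# uncertain one.
print(binary_focal_loss(1, 0.9))
print(binary_focal_loss(1, 0.5))
```

With gamma set to 0 the expression reduces to the usual cross-entropy; larger values shift the training signal toward hard examples.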

Fixes

  • Incorrect default evaluation of categorical splits with uplift tasks. This made uplift models with missing categorical values perform worse, and possibly made the inference of uplift models slower.

Published by achoum about 4 years ago

yggdrasil-decision-forests - 0.2.2

Features

  • The CART learner exports the number of pruned nodes in the output model meta-data. Note: The CART learner outputs a Random Forest model with a single tree.
  • The Random Forest and CART learners support the CATEGORICAL_UPLIFT task.
  • Add SetLoggingLevel to control the amount of logging.

Fixes

  • Fix tree pruning in the CART learner for regressive tasks.

Published by achoum about 4 years ago

yggdrasil-decision-forests - 0.2.0

Features

  • Distributed training of Gradient Boosted Decision Trees.
  • Add maximum_model_size_in_memory_in_bytes hyper-parameter to limit the size of the model in memory.

Fixes

  • Fix invalid splitting of pre-sorted numerical features (make sure to use the midpoint).
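
The midpoint rule referenced in the fix above is the usual way to place a numerical split threshold: halfway between two consecutive distinct feature values. The helper below is a hypothetical illustration of that rule on a small list and is not the library's splitter.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct values of a numerical feature."""
    distinct = sorted(set(values))
    return [(low + high) / 2.0 for low, high in zip(distinct, distinct[1:])]


print(candidate_thresholds([3.0, 1.0, 3.0, 7.0]))  # [2.0, 5.0]
```

Using the midpoint rather than one of the observed values leaves an equal margin on both sides of the threshold for unseen examples.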

Published by achoum over 4 years ago

yggdrasil-decision-forests - 0.1.3

Features

  • Register new inference engines.

Published by achoum almost 5 years ago

yggdrasil-decision-forests - 0.1.2

Features

  • Inference engines: QuickScorer Extended and Pred

Published by achoum almost 5 years ago

yggdrasil-decision-forests - 0.1.1

Features

  • Migration to TensorFlow 2.5.0.

Published by achoum almost 5 years ago

yggdrasil-decision-forests - 0.1.0

Release 0.1.0 (2021-05-11)

Initial release of Yggdrasil Decision Forests.

Features

  • CLI: train, show_model, show_dataspec, predict, infer_dataspec, evaluate, convert_dataset, benchmark_inference, utils/synthetic_dataset.
  • Learners: Gradient Boosted Trees (and derivatives), Random Forest (and derivatives), CART.

Published by achoum almost 5 years ago