Changelog¶
0.16.1 - 2026-03-24¶
Fix¶
- Fix Python protobuf dependency
Release music¶
Präludium und Fuge Es-Dur BWV 552, für Orchester gesetzt. Johann Sebastian Bach (arr. Arnold Schönberg)
0.16.0 - 2026-03-16¶
Features¶
- Training on numerical features, esp. when using oblique splits, is now much faster, >60% for some datasets.
- Import Monotonic in the top-level namespace.
- Update Protobuf / Abseil dependencies.
Fix¶
- Fix weighted SHAP for models with hessian splits.
- Fix training logs crash.
Release music¶
Präludium und Fuge Es-Dur BWV 552. Johann Sebastian Bach
0.15.0 - 2026-02-04¶
API Changes¶
- Export to Tensorflow now uses the ydf-tf package instead of TF-DF. Tensorflow Decision Forests is no longer required for exporting to TensorFlow SavedModel.
model.to_tensorflow_saved_model(mode="keras")is now strongly discouraged and will be removed in a future version. This mode still requires Tensorflow Decision Forests (TF-DF) to be installed.
Fix¶
- Fixed compatibility with Pandas 3.0
Release music¶
Allegretto, op. 1. Louis Vierne
0.14.0 - 2026-01-08¶
API Changes¶
- The
Uplifttask is not currently compatible with thehonest=Truehyperparameter. Support for honest uplift trees is not yet implemented. Training an uplift model withhonest=Truewill now raise an error. Please reach out to the YDF team if you have a use case for this functionality. - Removed support for Python 3.8
- Moved Python package and Docker to manylinux_2_28.
Feature¶
- Added new versions of the hyperparameter templates for Random Forests with
winner_take_all=False. The new version generally has better quality - Add
learner.in_bag_example_indices(num_examples, tree_idx)to Random Forest Learner to retrieve the indices of the in-bag samples used for training each tree. - Add export to standalone Java.
- The output_logits option is now explicitly exposed in the Python API for GBT models.
- Added SHAP value computation support for weighted models.
- Added support for integerized numerical features in the Python port.
- Added a new inference engine for QuickScorer based on the Highway SIMD library. This engine provides up to a 70% speedup on AVX3 machines and maintains high performance on modern ARM and AVX2 architectures.
Fixes¶
- Removed Pandas as a hard dependency (it is now optional for dataset ingestion).
- Model descriptions now group multi-dimensional features together for better readability.
- Support for missing features for standalone models.
- Fixed a rare undefined behavior bug in oblique splits caused by incorrect initialization of the binomial distribution.
Release music¶
Sortie en mi bémol majeur, L’Organiste Moderne, vol. 11. Louis James Alfred Lefébure-Wély
0.13.0 - 2025-07-15¶
API Changes¶
- For Random Forest models,
.out_of_bag_evaluations()now returns a TrainingLogs object. The content is identical to the object previously returned, but thenumber_of_treesproperty has been renamed toiterationfor consistency with Gradient Boosted Trees Training Logs. mode="tf"is now the default onmodel.to_tensorflow_saved_model(). The previous default is still available by settingmode="keras".model.label()returns None for models trained without label.- Remove deprecated
evaluation_taskargument formodel.evaluate(). Usetaskinstead.
Feature¶
- Add standalone C++ export with
model.to_standalone_cc(). Standalone models are super flexible, fast and memory-efficient. They only depend on the C++ standard library. - Add
model.training_logs()method to return the training logs of the model. - Expose Mean Average Precision for Ranking tasks.
- Add hyperparameters
numerical_vector_sequence_enable_closer_than_conditionsandnumerical_vector_sequence_enable_projected_more_than_conditions. - Clear error messages when attempting to evaluate models without label.
- Faster training with sparse oblique splits for datasets with many numerical features
- Many documentation improvements.
- Increase default number of threads to 256 or number of CPU cores.
- Enable cross-validation for hyperparameter tuning.
- Add thresholds to classification plots.
- Explicitly disable custom losses for hyperparameter tuning.
- Disable parallel evaluation for cross-validation custom losses.
Fix¶
- Distributed Training: Add
recvmsg: Connection reset to isTransientErrorerror. - Enable SHAP values when training with BEST_FIRST_GLOBAL.
- Predictions with cross-entropy LambdaMART no longer need the slow engine.
- Disable the generic engine for oblique splits without global imputation. This may fix a very rare bug in the way predictions are computed.
Release music¶
Sinfonie Nr. 4 in A-Dur, op. 90. Felix Mendelssohn
0.12.0 - 2025-05-20¶
Feature¶
- Enable support for Python 3.13.
- Add custom fields to model metadata.
- Add SHAP value variable importances with
model.analyze(). - Add SHAP values for a dataset with
model.predict_shap(). - Speed-up (up to 20x) training of models with CATEGORICAL_SET features.
- Add hyper-parameter to limit the mask size for CATEGORICAL_SET features.
- Add hyper-parameter
total_max_num_nodesto limit the total number of nodes in a model. - Add support for na_replacements in python tree editor API.
- Add support for include_all_columns in FeatureSelector.
- Add the
ydf.utils.LogBookto manage and track experiments. - Speed-up training of NDCG ranking model when a single example per group is non-zero.
- Speed-up training on datasets with few columns on a computer with a large amount of cores.
- Speed-up loss computation multi-threading code.
- Improve distributed training error messages.
- Remove need for label columns for deep learning models.
Fix¶
- Log message if early stopping is not used.
- Fix force_numerical_discretization errors and documentation.
- Fix handling of empty list columns in the dataset.
Release music¶
Kindersinfonie (Berchtoldsgaden Musick). Unknown
0.11.0 - 2025-03-12¶
Feature¶
- Expose losses for distributed training.
- Add
class_weightsparameter to the learners. - Support for Google Cloud paths for datasets and model IO.
- Add utility to facilitate distributed training on VertexAI.
- Improved support for non-unicode data in categorical features.
- Add support for saving and analyzing deep models.
Fix¶
- Fix incorrectly transposed confusion table in HTML.
- Various documentation fixes.
- Better requirements management.
Documentation¶
- Add tutorial for Categorical Set features.
- Add tutorial for training on VertexAI.
Release music¶
3. Sinfonie in d-Moll. Gustav Mahler
0.10.0 - 2025-02-11¶
Feature¶
- Expose
model.save(..., pure_serving=True)for saving a model without debug information. - Allow users to provide a training proto configuration to the learner.
- Add vector sequence feature support.
- Add Variable importances for Isolation Forest Models.
- Add
ydf.help.loading_data()to print information about the type of supported dataset formats. - Add experimental Tabular Transformer implementation.
- Add gzipped blob sequence as new model format (still optional).
- Enabled Poisson Loss for model analysis and fast inference.
Fix¶
- Fix recognition of multidimensional features for Numpy arrays of type object.
- Fix subsample count for small number of training examples for Isolation Forests.
- Fix NUM_NODES variable importance for oblique splits.
Other¶
- Updated OSS dependencies of protobuf, grpc and abseil.
Release music¶
Sinfonie in Es-Dur "Sinfonia Eroica", op. 55. Ludwig van Beethoven
0.9.0 - 2024-12-02¶
Feature¶
Breaking¶
- Classification Label classes are now consistently ordered lexicographically (for string labels) or increasingly (for integer labels).
- Change typo partial_depepence_plot to partial_dependence_plot on model.analyze().
Feature¶
- Add support for Avro file for path / distributed training with the "avro:" prefix.
- Add support for discretized numerical features for in-memory datasets.
- Expose MRR for ranking models.
- Add
model.predict_classto generate the most likely predicted class of classification models. - Add support for automatic feature selection with the
feature_selectorlearner constructor argument. See the feature selection tutorial for more details. - Add standalone prediction evaluation
ydf.evaluate_predictions(). - Add new hyperparameter
sparse_oblique_max_num_projections. - Add options "POWER_OF_TWO" and "INTEGER" for sparse oblique weights.
- Emit proper errors when using lists for multi-dimensional features.
Fix¶
- Regression and Ranking CEPs scaling corrected.
Release music¶
The John B. Sails. Traditional
0.8.0 - 2024-09-23¶
Breaking¶
- Disallow positional parameters for the learners, except for label and task.
- Remove the unsupported / invalid hyperparameters from the Isolation Forest learner.
- Remove parameters for distributed training and resuming training from learners that do not support these capabilities.
- By default,
model.analyzefor a maximum of 20 seconds (i.e.maximum_duration=20by default). - Convert boolean values in categorical sets to lowercase, matching the treatment of categorical features.
Feature¶
- Warn if training on a VerticalDataset and fail if attempting to modify the columns in a VerticalDataset during training.
- User can override the model's task, label or group during evaluation.
- Add
num_examples_per_tree()method to Isolation Forest models. - Expose the slow engine for debugging predictions and evaluations with
use_slow_engine=True. - Speed-up training of GBT models by ~10%.
- Support for categorical and boolean features in Isolation Forests.
- Add
ydf.util.read_tf_recordandydf.util.write_tf_recordto facilitate TF Record datasets usage. - Rename LAMBDA_MART_NDCG5 to LAMBDA_MART_NDCG. The old name is deprecated but can still be used.
- Allow configuring the truncation of NDCG losses.
- Enable multi-threading when using
model.predictandmodel.evaluate. - Default number of threads of
model.analyzeis equal to the number of cores. - Add multi-threaded results in
model.benchmark. - Add argument to control the maximum duration of
model.analyze. - Add support for Unicode strings, normalize categorical set values in the same way as categorical values, and validate their types.
- Add support for distributed training for ranking gradient boosted tree models.
Fix¶
- Fix labels of regression evaluation plots
- Improved errors if Isolation Forest training fails.
Release music¶
Perpetuum Mobile "Ein musikalischer Scherz", Op. 257. Johann Strauss (Sohn)
0.7.0 - 2024-08-21¶
Feature¶
- Expose
validate_hyperparameters()on the learner. - Clarify which parameters in the learner are optional.
- Add support in JAX FeatureEncoder for non-string categorical feature values.
- Improve performance of Isolation Forests.
- Models can be serialized/deserialized to/from bytes with
model.serialize()andydf.deserialize_model. - Models can be pickled safely.
- Native support for Xarray as a dataset format for all operations (e.g., training, evaluation, predictions).
- The output of
model.to_jax_functioncan be converted to a TensorFlow Lite model. - Change the default number of examples to scan when training on files to determine the semantic and dictionaries of columns from 10k to 100k.
- Various improvements of error messages.
- Evaluation for Anomaly Detection models.
- Oblique splits for Anomaly Detection models.
- Reduce significatively the RAM usage of distributed training with discretized numerical values.
Fix¶
- Fix parsing of multidimensional ragged inputs.
- Fix isolation forest hyperparameter defaults.
- Fix bug causing distributed training to fail on a sharded dataset containing an empty shard.
- Handle unordered categorical sets in training.
- Fix dataspec ignoring definitions of unrolled columns, such as multidimensional categorical integers.
- Fix error when defining categorical sets for non-ragged multidimensional inputs.
- MacOS: Fix compatibility with other protobuf-using libraries such as Tensorflow.
- Fix parsing of NAs in Xarray datasets.
Release music¶
Rondo Alla ingharese quasi un capriccio "Die Wut über den verlorenen Groschen", Op. 129. Ludwig van Beethoven
0.6.0 - 2024-07-04¶
Feature¶
model.to_jax_functionnow always outputs a FeatureEncoder to help feeding data to the JAX model.- The default value of
num_candidate_attributesin the CART learner is changed from 0 (Random Forest style sampling) to -1 (no sampling). This is the generally accepted logic of CART. model.to_tensorflow_saved_modelsupport preprocessing functions which have a different signature than the YDF model.- Improve error messages when feeding wrong size Numpy arrays.
- Add option for weighted evaluation in
model.evaluate.
Fix¶
- Fix display of confusion matrix with floating point weights.
Known issues¶
- MacOS build is broken.
0.5.0 - 2024-06-17¶
Feature¶
- Add support for Isolation Forests model.
- Add
max_depthargument tomodel.print_tree. - Add
verboseargument totrainmethod which is equivalent but sometime more convenient thanydf.verbose. - Add SKLearn to YDF model converter:
ydf.from_sklearn. - Improve error messages when calling the model with non supported data.
- Add support for numpy 2.0.
Tutorials¶
- Add anomaly detection tutorial.
- Add YDF and JAX model composition tutorial.
Fix¶
- Fix error when plotting oblique trees (
model.plot_tree) in colab.
0.4.3- 2024-05-07¶
Feature¶
- Add
model.to_jax_function()function to convert a YDF model into a JAX function that can be combined with other JAX operations. - Print warnings when categorical features look like numbers.
- Add support for Python 3.12.
Fix¶
- Fix cross-validation for non-classification learners.
- Fix missing ydf/model/tree/plotter.js
- Solve dependency collision of YDF Proto between PYDF and TF-DF.
0.4.2- 2024-04-22¶
Feature¶
- Show error message when TF-DF and YDF shared proto dependencies are colliding.
- Add option to specify a validation dataset for the CartLearner.
DecisionForestsLearneris an alias toCartLearner.
0.4.1- 2024-04-18¶
Fix¶
- Solve dependency collision to YDF between PYDF and TF-DF. If TF-DF is
installed after PYDF, importing YDF will fails with a
has no attribute 'DType'error. - Allow for training on cached TensorFlow dataset.
0.4.0 - 2024-04-10¶
Feature¶
- Multi-dimensional features can be selected / configured with the
features=training argument. - Programmatic access to partial dependence plots and variable importances.
- Add
model.to_tensorflow_function()function to convert a YDF model into a TensorFlow function that can be combined with other TensorFlow operations. This function is compatible with Keras 2 and Keras 3. - Add arguments
servo_api=Falseandfeed_example_proto=Falseformodel.to_tensorflow_function(mode="tf")to export TensorFlow SavedModel following respectively the Servo API and consuming serialized TensorFlow Example protos. - Add
pre_processingandpost_processingarguments to themodel.to_tensorflow_functionfunction to pack pre/post processing operations in a TensorFlow SavedModel.
Tutorials¶
- Add tutorial Vertex AI with TF Serving
- Add tutorial Deep-learning with YDF and TensorFlow
0.3.0 - 2024-03-15¶
Breaking¶
- Custom losses now require to provide the gradient, instead of the negative of the gradient.
- Clarified that YDF may modify numpy arrays returned by a custom loss function.
Features¶
- Allow using Jax for custom loss definitions.
- Allow setting
may_trigger_gcon custom losses. - Add support for MHLD oblique decision trees.
- Expose hyperparameter
sparse_oblique_max_num_projections. - HTML plots for trees with
model.plot_tree(). - Fix protobuf version to 4.24.3 to fix some incompatibilities when using conda.
- Allow to list compatible engines with
model.list_compatible_engines(). - Allow to choose a fast engine with
model.force_engine(...).
Fix¶
- Fix slow engine creation for some combination of oblique splits.
- Improve error message when feeding multi-dimensional labels.
Documentation¶
- Clarified documentation of hyperparameters for oblique splits.
- Fix plots, typos.
Release music¶
Doctor Gradus ad Parnassum from "Children's Corner" (L. 113). Claude Debussy
0.2.0 - 2024-02-22¶
Features¶
- Add support for custom losses for GBTs (Classification & Regression).
Fixes¶
- In the tuner, run
parallel_trialstrials in parallel, instead ofnum_threadstrials in parallel. - Force validation dataset as path if training dataset is a path.
- Fix benchmark.
- Clarify error when using an incompatible dataset when ranking.
- Improve test coverage.
- Fix an issue with the conditions when building trees manually.
Documentation¶
- Add a migration guide TF-DF ==> YDF.
- Clarify max_depth parameter documentation.
- Clarify early stopping documentation.
Release music¶
Galop infernal d'Orphée aux Enfers. Jacques Offenbach
0.1.0 - 2024-01-25¶
Features¶
- Added model validation evaluation (for GBTs) and OOB evaluation (for RFs).
- Expose winner-takes-all for Random Forests.
- Added model self evaluation.
- Added
ydf.from_tensorflow_decision_forests()for importing TF-DF models. - Allow feeding datasets as sequence of strings.
Fixes¶
- Fixes a plotting issue for GBTs without validation loss
Release music¶
Flötenuhren von 1772 und 1793 - Vivace (Hob XIX:13). Joseph Haydn
0.0.8 - 2023-12-19¶
Features¶
- Improve training interruption in Jupyter notebooks.
- Training and inference is compatible with python threading.
- Add the model inspector, which allows to programmatically inspect the structure of trees.
- Add a model editor, which allows to programmatically modify the structure of trees.
Documentation¶
- Many new tutorials at ydf.readthedocs.io
Bug¶
- Fix display of plots in the model analysis.
Release music¶
Les Anges dans nos campagnes. Traditional
0.0.7 - 2023-11-28¶
Features¶
- Support for prediction / evaluation for models with N/A conditions.
- Support for predicting from datasets without label column
- Model inspector
- Automatic inference for Categorical set columns
Release music¶
Knecht Ruprecht aus "Album für die Jugend" (Opus 86). Robert Schumann
0.0.6 - 2023-11-28¶
Features¶
- Add support categorical set features
- Add support for prediction analysis
- Expose max_num_scanned_rows_to_infer_semantic and max_num_scanned_rows_to_compute_statistics
- Expose metadata on the model
- Expose initial predictions on the model
- Add strict mode for more warnings
- Add GBT example distance support
- Support strided numerical features
Documentation¶
- Many new tutorials at ydf.readthedocs.io
Fix¶
- Fix a bug in GBT predictions (see YDF changelog for details)
- Various small fixes
Release music¶
Die Holzauktion. Franz Meißner
0.0.5 - 2023-11-10¶
Features¶
- Add support for distributed training.
- Allow setting node format.
- Allow programmatic access to variable importances
- Support for multi-dimensional numerical features
- Support use of hyper-parameter templates.
Documentation¶
- Many new tutorials at ydf.readthedocs.io
- Updated FAQ
Fix¶
- Fix prediction array for multi-class classification.
- Avoid exposing proto Task type
- No longer expose train_from_path and train_from_dataset -- just use
train! - Correctly choose number of threads if 32 cores are detected.
- Logs were hanging in some configurations.
Release music¶
Auld Lang Syne. Traditional
0.0.4 - 2023-11-01¶
Features¶
- New documentation launched!
- Add support for ranking tasks
- Add support for reading from path, supporting multiple data types (csv, ...)
- Improved model descriptions
- Add support for resuming training
- Add export to Tensorflow
- Control logging verbosity
- Add
model.distance(...)to compute pairwise distance between examples. - Expose data spec on the model
- Support for monotonic constraints
- Simple training benchmark
Fix¶
- Remove pybind11_abseil to avoid clashes with other libraries such as Tensorflow.
Release music¶
Rhapsody in Blue. George Gershwin
0.0.3 - 2023-10-20¶
Features¶
- Hyperparameter tuning
- Export to C++
- Benchmark inference speed
- Tree leaves retrieval
- C++ base updated to 1.7.0
Release music¶
Schweigt stille, plaudert nicht (BWV 211). Johann Sebastian Bach
0.0.2 - 2023-10-05¶
Features¶
- Model analysis
Release music¶
When You're Smiling. Larry Shay, Mark Fisher and Joe Goodwin
0.0.1 - 2023-10-03¶
Initial Alpha Release
Features¶
- Training, prediction, evaluation, Import from Pandas
- Learners: Gradient Boosted Trees (and derivatives), Random Forest (and derivatives), Cart.
Release music¶
Frisch heran (Opus 386). Johann Strauss (Sohn)