YDF is a library to train, evaluate, interpret, and serve Random Forest,
Gradient Boosted Decision Trees, and CART decision forest models.

Getting Started 🧭

A concise and modern API
YDF allows for rapid prototyping and development while minimizing the risk of modeling errors.
Deep learning composition
Integrated with TensorFlow, Keras, and Vertex AI.
Cutting-edge algorithms
Includes the latest decision forest research to ensure maximum performance.
Fast inference
Compute predictions in a few microseconds. At Google, YDF inference runs tens of millions of times per second.

Key features

Read our KDD 2023 paper: Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library. YDF has been developed by Google since 2018 and powers TensorFlow Decision Forests.

Modeling

Serving

Advanced modeling

  • Model composition with TensorFlow, Keras, and JAX (coming soon).
  • Distributed training over billions of examples and hundreds of machines.
  • Cutting-edge learning algorithms such as oblique splits, honest trees, Hessian scores, global tree optimizations, optimal categorical splits, categorical-set inputs, DART, and extremely randomized trees.
  • Monotonic constraints.
  • Consumes multi-dimensional features.
  • Backward compatibility for models and learners since 2018.
  • Tree editing in Python.
  • Custom loss in Python.
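
To make the term "oblique split" from the list above concrete, here is a minimal, library-independent sketch. It is an illustration only, not YDF's implementation: the function names, the `>=` condition form, and the toy examples are all assumptions chosen for clarity. An axis-aligned split tests a single feature against a threshold, while an oblique split tests a weighted combination of several features.

```python
from typing import List, Sequence, Tuple


def axis_aligned_split(example: Sequence[float], feature_idx: int, threshold: float) -> bool:
    """Hypothetical axis-aligned condition: one feature compared to a threshold."""
    return example[feature_idx] >= threshold


def oblique_split(example: Sequence[float], weights: Sequence[float], threshold: float) -> bool:
    """Hypothetical oblique condition: a weighted sum of features compared to a threshold."""
    projection = sum(w * x for w, x in zip(weights, example))
    return projection >= threshold


# Toy examples with two numerical features (made up for this illustration).
examples: List[Tuple[float, float]] = [(2.0, 0.5), (0.5, 2.0), (0.2, 0.3)]

# Axis-aligned: only looks at feature 0.
axis_result = [axis_aligned_split(e, feature_idx=0, threshold=1.0) for e in examples]

# Oblique: looks at feature 0 + feature 1, so it can separate along a diagonal.
oblique_result = [oblique_split(e, weights=(1.0, 1.0), threshold=1.0) for e in examples]

print(axis_result)     # [True, False, False]
print(oblique_result)  # [True, True, False]
```

The second example is routed differently by the two conditions: its first feature alone is below the threshold, but the sum of its features is not. A single oblique condition can therefore express a diagonal boundary that would take several axis-aligned conditions to approximate.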

Installation

To install YDF from PyPI, run:

pip install ydf -U
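
To confirm which version was installed, the following sketch uses only the Python standard library. The helper name `installed_version` is hypothetical; the only assumption about YDF is that the distribution name is `ydf`, as in the `pip` command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "ydf") -> Optional[str]:
    """Returns the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"ydf {version}" if version else "ydf is not installed")
```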

Usage example

import ydf
import pandas as pd

# Load dataset with Pandas
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset/"
train_ds = pd.read_csv(ds_path + "adult_train.csv")
test_ds = pd.read_csv(ds_path + "adult_test.csv")

# Train a Gradient Boosted Trees model
model = ydf.GradientBoostedTreesLearner(label="income").train(train_ds)

# Look at a model (input features, training logs, structure, etc.)
model.describe()

# Evaluate a model (e.g. ROC, accuracy, confusion matrix, confidence intervals)
model.evaluate(test_ds)

# Generate predictions
model.predict(test_ds)

# Analyse a model (e.g. partial dependence plot, variable importance)
model.analyze(test_ds)

# Benchmark the inference speed of a model
model.benchmark(test_ds)

# Save the model
model.save("/tmp/my_model")

Next steps

Read the 🧭 Getting Started tutorial. You will learn how to train a model, interpret it, evaluate it, generate predictions, benchmark its speed, and export it for serving.

Ask us questions on GitHub. Googlers can join the internal YDF Chat.

Read the TF-DF to YDF Migration guide to convert a TensorFlow Decision Forests pipeline into a YDF pipeline.