YDF documentation

YDF is a library to train, evaluate, interpret, and serve Random Forest,
Gradient Boosted Decision Trees, and CART decision forest models.

Getting Started 🧭

A concise and modern API

YDF allows for rapid prototyping and development while minimizing the risk of modeling errors.

Deep learning composition

Integrated with TensorFlow, Keras, and Vertex AI.

Cutting-edge algorithms

Includes the latest decision forest research to ensure maximum performance.

Source: Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library (KDD 2023).

Source: A Comparison of Decision Forest Inference Platforms from A Database Perspective (arXiv 2023).

Fast inference

Compute predictions in a few microseconds. YDF inference runs tens of millions of times per second at Google.

Key features

Read our KDD 2023 paper: Yggdrasil Decision Forests: A Fast and Extensible Decision Forests Library. YDF has been developed by Google since 2018 and powers TensorFlow Decision Forests.

Modeling

Train Random Forest, Gradient Boosted Trees, and CART models.
Train classification, regression, ranking, and uplifting models.
Plotting of decision trees.
Model interpretation (variable importances, partial dependence plots, conditional dependence plots).
Prediction interpretation (counterfactual, feature variation).
Model evaluation (accuracy, AUC, ROC plots, RMSE, confidence intervals, cross-validation).
Model hyper-parameter tuning.
Natively consume numerical, categorical, boolean, tag, and text features, as well as missing values.
Natively consume Pandas DataFrames, NumPy arrays, TensorFlow Datasets, CSV files, and TensorFlow Records.

Serving

Benchmark model inference.
Call models in Python, C++, Go, JavaScript, and CLI.
Online inference through a REST API with TensorFlow Serving and Vertex AI.

Advanced modeling

Model composition with TensorFlow, Keras, and JAX (coming soon).
Distributed training over billions of examples and hundreds of machines.
Cutting-edge learning algorithms such as oblique splits, honest trees, hessian scores, global tree optimizations, optimal categorical splits, categorical-set inputs, DART, and extremely randomized trees.
Monotonic constraints.
Consume multi-dimensional features.
Backward compatibility for models and learners since 2018.
Edit trees in Python.
Custom loss in Python.

Installation

To install YDF from PyPI, run:

pip install ydf -U

Usage example

import ydf
import pandas as pd

# Load dataset with Pandas
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset/"
train_ds = pd.read_csv(ds_path + "adult_train.csv")
test_ds = pd.read_csv(ds_path + "adult_test.csv")

# Train a Gradient Boosted Trees model
model = ydf.GradientBoostedTreesLearner(label="income").train(train_ds)

# Look at a model (input features, training logs, structure, etc.)
model.describe()

# Evaluate a model (e.g. ROC, accuracy, confusion matrix, confidence intervals)
model.evaluate(test_ds)

# Generate predictions
model.predict(test_ds)

# Analyse a model (e.g. partial dependence plot, variable importance)
model.analyze(test_ds)

# Benchmark the inference speed of a model
model.benchmark(test_ds)

# Save the model
model.save("/tmp/my_model")

Next steps

Read the 🧭 Getting Started tutorial. You will learn how to train a model, interpret it, evaluate it, generate predictions, benchmark its speed, and export it for serving.

Ask us questions on GitHub. Googlers can join the internal YDF Chat.

Read the TF-DF to YDF Migration guide to convert a TensorFlow Decision Forests pipeline into a YDF pipeline.