Migrating to YDF
YDF is Google's new library to train Decision Forests and the successor of TensorFlow Decision Forests.
Both libraries rely on the same high-performance C++ implementation, called YDF C++, which has been developed for production use since 2018. However, YDF is significantly more feature-rich, efficient, and easier to use than TF-DF. Migrating to YDF will reduce your development and maintenance costs while potentially improving the quality of your model.
Most functions in TF-DF have their equivalent in YDF. The following table shows the mapping.
Note: Many features and functions in YDF do not exist in TF-DF. Reading YDF's Getting started guide is therefore likely a good investment of your time.
Note: The migration message in TF-DF can be removed by setting the environment variable TFDF_DISABLE_WELCOME_MESSAGE.
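As a minimal sketch of the note above, the variable can be set from Python before TF-DF is imported. The value `"1"` is an assumption made for illustration; the note only says that the variable must be set, not which value it expects.

```python
import os

# Assumption: the value "1" is illustrative. The note above only states
# that the variable has to be set, not what value it should hold.
os.environ["TFDF_DISABLE_WELCOME_MESSAGE"] = "1"

# Import TF-DF only after the variable is set, for example:
# import tensorflow_decision_forests as tfdf

print(os.environ["TFDF_DISABLE_WELCOME_MESSAGE"])
```

Setting the variable in the shell that launches the Python process should have the same effect, since the process inherits its environment.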
Action | TF-DF | YDF |
---|---|---|
Train a model |
tf_ds = tfdf.keras.pd_dataframe_to_tf_dataset(ds, label="l")
model = tfdf.keras.RandomForestModel()
model.fit(tf_ds)
|
model = ydf.RandomForestLearner(label="l").train(ds)
|
Look at a model |
model.summary()
|
model.describe()
|
Evaluate a model |
model.compile(["accuracy", tf.keras.metrics.AUC()])
model.evaluate(test_ds)
|
model.evaluate(ds)
|
Save a model |
model.save("project/model")
|
model.save("project/model")
|
Load a model |
model = tf_keras.models.load_model("project/model")
|
model = ydf.load_model("project/model")
|
Export model as TF SavedModel |
model.save("project/model")
|
model.to_tensorflow_saved_model("project/model", mode="tf")
|
The following shows equivalent TF-DF and YDF code side by side.
TF-DF | YDF |
---|---|
!pip install tensorflow tensorflow_decision_forests # Install TF-DF
import tensorflow_decision_forests as tfdf
import tensorflow as tf
import pandas as pd
# Load a dataset with Pandas
train_df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")
# Convert the dataset to a TensorFlow Dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label="my_label")
# Train a model
model = tfdf.keras.RandomForestModel(num_trees=500)
model.fit(train_ds)
# Evaluate model.
model.compile([tf.keras.metrics.SparseCategoricalAccuracy(), tf.keras.metrics.AUC()])
model.evaluate(test_ds)
# Save the model
model.save("/tmp/my_model")
|
!pip install ydf # Install YDF
import ydf
import pandas as pd
# Load a dataset with Pandas
train_ds = pd.read_csv("train.csv")
test_ds = pd.read_csv("test.csv")
# Train a model
model = ydf.RandomForestLearner(label="my_label", num_trees=500).train(train_ds)
# Evaluate model.
model.evaluate(test_ds)
# Save the model
model.save("/tmp/my_model")
|