IsolationForestModel
Bases: DecisionForestModel
An Isolation Forest model for prediction and inspection.
add_tree
add_tree(tree: Tree) -> None
Adds a single tree to the model.
Parameters:
Name  Type  Description  Default 

tree

Tree

New tree. 
required 
analyze
analyze(data: InputDataset, sampling: float = 1.0, num_bins: int = 50, partial_depepence_plot: bool = True, conditional_expectation_plot: bool = True, permutation_variable_importance_rounds: int = 1, num_threads: Optional[int] = None, maximum_duration: Optional[float] = 20) -> Analysis
Analyzes a model on a test dataset.
An analysis contains structural information about the model (e.g., variable importances) and information about the application of the model on the given dataset (e.g., partial dependence plots).
For a large dataset (many examples and/or features), computing the analysis can take significant time.
While some information might be valid, it is generally not recommended to analyze a model on its training dataset.
Usage example:
import pandas as pd
import ydf
# Train model
train_ds = pd.read_csv("train.csv")
model = ydf.RandomForestLearner(label="label").train(train_ds)
test_ds = pd.read_csv("test.csv")
analysis = model.analyze(test_ds)
# Display the analysis in a notebook.
analysis
Parameters:
Name  Type  Description  Default 

data

InputDataset

Dataset. Supported formats: VerticalDataset, (typed) path, list of (typed) paths, Pandas DataFrame, Xarray Dataset, TensorFlow Dataset, PyGrain DataLoader and Dataset (experimental, Linux only), dictionary of string to NumPy array or lists. 
required 
sampling

float

Ratio of examples to use for the analysis. The analysis can be expensive to compute. On large datasets, use a small sampling value e.g. 0.01. 
1.0

num_bins

int

Number of bins used to accumulate statistics. A large value increases the resolution of the plots but takes more time to compute. 
50

partial_depepence_plot

bool

Compute partial dependency plots a.k.a PDPs. Expensive to compute. 
True

conditional_expectation_plot

bool

Compute the conditional expectation plots a.k.a. CEP. Cheap to compute. 
True

permutation_variable_importance_rounds

int

If >= 1, computes permutation variable importances using "permutation_variable_importance_rounds" rounds. The more rounds, the more accurate the results. Using a single round is often acceptable, i.e., permutation_variable_importance_rounds=1. If permutation_variable_importance_rounds=0, disables the computation of permutation variable importances. 
1

num_threads

Optional[int]

Number of threads to use to compute the analysis. 
None

maximum_duration

Optional[float]

Maximum duration of the analysis in seconds. Note that the analysis can last a little longer than this value. 
20

Returns:
Type  Description 

Analysis

Model analysis. 
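The permutation variable importance mentioned above can be illustrated without YDF. The following is a minimal sketch in plain Python, assuming accuracy as the quality metric; the toy model, the feature names "x" and "noise", and the helper functions are hypothetical and are not part of the YDF API. It shows why more rounds give a more stable estimate: each round shuffles one feature column and measures the drop in quality.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, feature, rounds=1, seed=0):
    """Mean drop in accuracy after shuffling the values of one feature."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(rounds):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        shuffled = [{**row, feature: value} for row, value in zip(rows, column)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return sum(drops) / rounds


# Hypothetical toy model: predicts 1 when x > 0.5 and ignores "noise".
def toy_predict(row):
    return int(row["x"] > 0.5)


rows = [{"x": i / 10, "noise": (i * 7) % 10} for i in range(10)]
labels = [toy_predict(row) for row in rows]

importance_x = permutation_importance(toy_predict, rows, labels, "x", rounds=5)
importance_noise = permutation_importance(toy_predict, rows, labels, "noise", rounds=5)
print(importance_x, importance_noise)
```

A feature the model ignores, such as "noise" here, gets an importance of zero because shuffling it never changes a prediction.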
analyze_prediction
Understands a single prediction of the model.
Note: To explain the model as a whole, use model.analyze instead.
Usage example:
import pandas as pd
import ydf
# Train model
train_ds = pd.read_csv("train.csv")
model = ydf.RandomForestLearner(label="label").train(train_ds)
test_ds = pd.read_csv("test.csv")
# We want to explain the model prediction on the first test example.
selected_example = test_ds.iloc[:1]
analysis = model.analyze_prediction(selected_example)
# Display the analysis in a notebook.
analysis
Parameters:
Name  Type  Description  Default 

single_example

InputDataset

Example to explain. Supported formats: VerticalDataset, (typed) path, list of (typed) paths, Pandas DataFrame, Xarray Dataset, TensorFlow Dataset, PyGrain DataLoader and Dataset (experimental, Linux only), dictionary of string to NumPy array or lists. 
required 
Returns:
Type  Description 

PredictionAnalysis

Prediction explanation. 
benchmark
benchmark(ds: InputDataset, benchmark_duration: float = 3, warmup_duration: float = 1, batch_size: int = 100, num_threads: Optional[int] = None) -> BenchmarkInferenceCCResult
Benchmarks the inference speed of the model on the given dataset.
This benchmark creates batched predictions on the given dataset using the C++ API of Yggdrasil Decision Forests. Note that inference times using other APIs or on different machines will be different. A serving template for the C++ API can be generated with model.to_cpp().
Parameters:
Name  Type  Description  Default 

ds

InputDataset

Dataset to perform the benchmark on. 
required 
benchmark_duration

float

Total duration of the benchmark in seconds. Note that this number is only indicative and the actual duration of the benchmark may be shorter or longer. This parameter must be > 0. 
3

warmup_duration

float

Total duration of the warmup runs before the benchmark in seconds. During the warmup phase, the benchmark is run without being timed. This allows warming up caches. The benchmark will always run at least one batch for warmup. This parameter must be > 0. 
1

batch_size

int

Size of batches when feeding examples to the inference engines. The impact of this parameter on the results depends on the architecture running the benchmark (notably, cache sizes). 
100

num_threads

Optional[int]

Number of threads used for the multithreaded benchmark. If not specified, the number of threads is set to the number of CPU cores. 
None

Returns:
Type  Description 

BenchmarkInferenceCCResult

Benchmark results. 
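The warmup-then-measure structure described by the parameters above can be sketched in plain Python. This is only an illustration of the idea, not the C++ benchmark that YDF runs: the function name, the returned dictionary, and the stand-in "model" are assumptions, and the sketch is single-threaded.

```python
import time


def benchmark_sketch(predict_batch, examples, benchmark_duration=0.05,
                     warmup_duration=0.01, batch_size=100):
    """Times batched predictions after an untimed warmup phase."""
    batches = [examples[i:i + batch_size]
               for i in range(0, len(examples), batch_size)]

    def run_for(duration):
        # Runs full passes over all batches until `duration` seconds elapsed.
        # At least one full pass always runs.
        num_examples = 0
        start = time.perf_counter()
        while True:
            for batch in batches:
                predict_batch(batch)
                num_examples += len(batch)
            elapsed = time.perf_counter() - start
            if elapsed >= duration:
                return num_examples, elapsed

    run_for(warmup_duration)  # Warmup: results are discarded.
    num_examples, elapsed = run_for(benchmark_duration)
    return {
        "num_examples": num_examples,
        "seconds_per_example": elapsed / num_examples,
    }


# Stand-in for a model: sums the feature values of each example.
result = benchmark_sketch(lambda batch: [sum(x) for x in batch],
                          examples=[[0.1, 0.2, 0.3]] * 250)
print(result)
```

As with the real benchmark, the measured duration is only indicative: the loop finishes the pass in progress, so it can run slightly longer than requested.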
describe
describe(output_format: Literal['auto', 'text', 'notebook', 'html'] = 'auto', full_details: bool = False) -> Union[str, HtmlNotebookDisplay]
Description of the model.
Parameters:
Name  Type  Description  Default 

output_format

Literal['auto', 'text', 'notebook', 'html']

Format of the display:  auto: Use the "notebook" format if executed in an IPython notebook / Colab. Otherwise, use the "text" format.  text: Text description of the model.  html: Html description of the model.  notebook: Html description of the model displayed in a notebook cell. 
'auto'

full_details

bool

Should the full model be printed. This can be large. 
False

Returns:
Type  Description 

Union[str, HtmlNotebookDisplay]

The model description. 
distance
distance(data1: InputDataset, data2: Optional[InputDataset] = None) -> ndarray
Computes the pairwise distance between examples in "data1" and "data2".
If "data2" is not provided, computes the pairwise distance between examples in "data1".
Usage example:
import pandas as pd
import ydf
# Train model
train_ds = pd.read_csv("train.csv")
model = ydf.RandomForestLearner(label="label").train(train_ds)
test_ds = pd.read_csv("test.csv")
distances = model.distance(test_ds, train_ds)
# "distances[i,j]" is the distance between the ith test example and the
# jth train example.
Different models are free to implement different distances with different definitions. For this reason, unless indicated by the model, distances from different models cannot be compared.
The distance is not guaranteed to satisfy the triangle inequality property of metric distances.
Not all models can compute distances. In this case, this function will raise an Exception.
Parameters:
Name  Type  Description  Default 

data1

InputDataset

Dataset. Can be a dictionary of list or numpy array of values, Pandas DataFrame, or a VerticalDataset. 
required 
data2

Optional[InputDataset]

Dataset. Can be a dictionary of list or numpy array of values, Pandas DataFrame, or a VerticalDataset. 
None

Returns:
Type  Description 

ndarray

Pairwise distance 
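To make the shape and indexing of the returned matrix concrete, here is a small plain-Python sketch. It assumes, without confirmation from the text above, a leaf co-occurrence distance (one minus the fraction of trees in which two examples reach the same leaf), which is a common choice for decision forests; the distance actually implemented by a given YDF model may differ. The function name and the leaf indices are made up for the illustration.

```python
def leaf_distance(leaves1, leaves2=None):
    """Pairwise distance from per-tree active leaf indices.

    leaves1[i][t] is the leaf reached by example i in tree t. The distance is
    1 - (fraction of trees where both examples reach the same leaf).
    """
    if leaves2 is None:
        leaves2 = leaves1
    num_trees = len(leaves1[0])
    return [
        [1.0 - sum(a == b for a, b in zip(row1, row2)) / num_trees
         for row2 in leaves2]
        for row1 in leaves1
    ]


# Hypothetical leaf indices for 2 test examples and 3 train examples, 3 trees.
test_leaves = [[0, 2, 1], [3, 2, 0]]
train_leaves = [[0, 2, 1], [0, 1, 1], [3, 0, 0]]

distances = leaf_distance(test_leaves, train_leaves)
# distances[i][j] compares the ith test example with the jth train example.
print(distances)
```

Under this assumed definition, the result has one row per example of the first dataset and one column per example of the second, matching the "distances[i,j]" convention above.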
evaluate
evaluate(data: InputDataset, *, weighted: Optional[bool] = None, task: Optional[Task] = None, label: Optional[str] = None, group: Optional[str] = None, bootstrapping: Union[bool, int] = False, ndcg_truncation: int = 5, evaluation_task: Optional[Task] = None, use_slow_engine: bool = False, num_threads: Optional[int] = None) -> Evaluation
Evaluates the quality of a model on a dataset.
Usage example:
import pandas as pd
import ydf
# Train model
train_ds = pd.read_csv("train.csv")
model = ydf.RandomForestLearner(label="label").train(train_ds)
test_ds = pd.read_csv("test.csv")
evaluation = model.evaluate(test_ds)
In a notebook, if a cell returns an evaluation object, this evaluation is displayed as rich HTML with plots:
evaluation = model.evaluate(test_ds)
# If model is an anomaly detection model:
# evaluation = model.evaluate(test_ds,
#     evaluation_task=ydf.Task.CLASSIFICATION)
evaluation
It is possible to evaluate the model differently than it was trained. For example, you can change the label, task and group.
...
# Train a regression model
model = ydf.RandomForestLearner(label="label",
    task=ydf.Task.REGRESSION).train(train_ds)
# Evaluate the model as a regression model
regression_evaluation = model.evaluate(test_ds)
# Evaluate the model as a ranking model
ranking_evaluation = model.evaluate(test_ds,
    task=ydf.Task.RANKING, group="group_column")
Parameters:
Name  Type  Description  Default 

data

InputDataset

Dataset. Supported formats: VerticalDataset, (typed) path, list of (typed) paths, Pandas DataFrame, Xarray Dataset, TensorFlow Dataset, PyGrain DataLoader and Dataset (experimental, Linux only), dictionary of string to NumPy array or lists. 
required 
weighted

Optional[bool]

If true, the evaluation is weighted according to the training weights. If false, the evaluation is non-weighted. 
None

task

Optional[Task]

Override the task of the model during the evaluation. If None (default), the model is evaluated according to its training task. 
None

label

Optional[str]

Override the label used to evaluate the model. If None (default), use the model's label. 
None

group

Optional[str]

Override the group used to evaluate the model. If None (default), use the model's group. Only used for ranking models. 
None

bootstrapping

Union[bool, int]

Controls whether bootstrapping is used to evaluate the confidence intervals and statistical tests (i.e., all the metrics ending with "[B]"). If set to false, bootstrapping is disabled. If set to true, bootstrapping is enabled and 2000 bootstrapping samples are used. If set to an integer, it specifies the number of bootstrapping samples to use. In this case, if the number is less than 100, an error is raised as bootstrapping will not yield useful results. 
False

ndcg_truncation

int

Controls at which ranking position the NDCG loss should be truncated. Defaults to 5. Ignored for non-ranking models. 
5

evaluation_task

Optional[Task]

Deprecated. Use 
None

use_slow_engine

bool

If true, uses the slow engine for making predictions. The slow engine of YDF is an order of magnitude slower than the other prediction engines. There exist very rare edge cases where predictions with the regular engines fail, e.g., models with a very large number of categorical conditions. It is only in these cases, that users should use the slow engine and report the issue to the YDF developers. 
False

num_threads

Optional[int]

Number of threads used to run the model. 
None

Returns:
Type  Description 

Evaluation

Model evaluation. 
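The bootstrapping option can be illustrated with a plain-Python sketch of a percentile bootstrap confidence interval for accuracy. The percentile method, the function name, and the toy labels are assumptions made for the illustration; the text above does not specify how YDF computes its "[B]" metrics. The default of 2000 samples and the rejection of fewer than 100 samples mirror the parameter description.

```python
import random


def bootstrap_accuracy_ci(labels, predictions, num_samples=2000,
                          confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval of the accuracy."""
    if num_samples < 100:
        raise ValueError("Too few bootstrapping samples to yield useful results.")
    rng = random.Random(seed)
    correct = [int(l == p) for l, p in zip(labels, predictions)]
    n = len(correct)
    # Each bootstrap sample redraws n examples with replacement.
    accuracies = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(num_samples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = accuracies[int(tail * num_samples)]
    upper = accuracies[min(num_samples - 1, int((1.0 - tail) * num_samples))]
    return lower, upper


# Toy data: 50 examples, 80% of which are correctly predicted.
labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1] * 5
predictions = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1] * 5

lower, upper = bootstrap_accuracy_ci(labels, predictions)
print(f"accuracy 95% CI: [{lower:.2f}, {upper:.2f}]")
```

With very few bootstrap samples the interval bounds become too noisy to be meaningful, which is the motivation for refusing small values.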
force_engine
Forces the engines used by the model.
If not specified (i.e., None; default value), the fastest compatible engine (i.e., the first value returned from "list_compatible_engines") is used for all model inferences (e.g., model.predict, model.evaluate).
If a non-existing or incompatible engine is passed, the next model inference (e.g., model.predict, model.evaluate) will fail.
Parameters:
Name  Type  Description  Default 

engine_name

Optional[str]

Name of a compatible engine or None to automatically select the fastest engine. 
required 
get_tree
hyperparameter_optimizer_logs
hyperparameter_optimizer_logs() -> Optional[OptimizerLogs]
Returns the logs of the hyperparameter tuning.
If the model is not trained with hyperparameter tuning, returns None.
input_feature_names
Returns the names of the input features.
The features are sorted in increasing order of column_idx.
input_features
input_features() -> Sequence[InputFeature]
Returns the input features of the model.
The features are sorted in increasing order of column_idx.
label_classes
Returns the label classes for a classification model; fails otherwise.
list_compatible_engines
metadata
metadata() -> ModelMetadata
Metadata associated with the model.
A model's metadata contains information stored with the model that does not influence the model's predictions (e.g., date created). When distributing a model for wide release, it may be useful to clear / modify the model metadata with model.set_metadata(ydf.ModelMetadata()).
Returns:
Type  Description 

ModelMetadata

The model's metadata. 
num_examples_per_tree
num_examples_per_tree() -> int
Returns the number of examples used to grow each tree.
plot_tree
plot_tree(tree_idx: int = 0, max_depth: Optional[int] = None, options: Optional[PlotOptions] = None, d3js_url: str = 'https://d3js.org/d3.v6.min.js') -> TreePlot
Plots an interactive HTML rendering of the tree.
Usage example:
import pandas as pd
import ydf
# Create a dataset
train_ds = pd.DataFrame({
"c1": [1.0, 1.1, 2.0, 3.5, 4.2] + list(range(10)),
"label": ["a", "b", "b", "a", "a"] * 3,
})
# Train a CART model
model = ydf.CartLearner(label="label").train(train_ds)
# Make sure the model is a CART
assert isinstance(model, ydf.CARTModel)
# Plot the tree in Colab
model.plot_tree()
Parameters:
Name  Type  Description  Default 

tree_idx

int

Index of the tree. Should be in [0, self.num_trees()). 
0

max_depth

Optional[int]

Maximum tree depth of the plot. Set to None for full depth. 
None

options

Optional[PlotOptions]

Advanced options for plotting. Set to None for default style. 
None

d3js_url

str

URL to load the d3.js library from. 
'https://d3js.org/d3.v6.min.js'

Returns:
Type  Description 

TreePlot

In interactive environments, an interactive plot. The HTML source can also be exported to file. 
predict
Returns the predictions of the model on the given dataset.
Usage example:
import pandas as pd
import ydf
# Train model
train_ds = pd.read_csv("train.csv")
model = ydf.RandomForestLearner(label="label").train(train_ds)
test_ds = pd.read_csv("test.csv")
predictions = model.predict(test_ds)
Parameters:
Name  Type  Description  Default 

data

InputDataset

Dataset. Supported formats: VerticalDataset, (typed) path, list of (typed) paths, Pandas DataFrame, Xarray Dataset, TensorFlow Dataset, PyGrain DataLoader and Dataset (experimental, Linux only), dictionary of string to NumPy array or lists. If the dataset contains the label column, that column is ignored. 
required 
use_slow_engine

bool

If true, uses the slow engine for making predictions. The slow engine of YDF is an order of magnitude slower than the other prediction engines. There exist very rare edge cases where predictions with the regular engines fail, e.g., models with a very large number of categorical conditions. It is only in these cases that users should use the slow engine and report the issue to the YDF developers. 
False

num_threads

Optional[int]

Number of threads used to run the model. 
None

predict_leaves
Gets the index of the active leaf in each tree.
The active leaf is the leaf that receives the example during inference.
The returned value "leaves[i,j]" is the index of the active leaf for the ith example and the jth tree. Leaves are indexed by depth-first exploration with the negative child visited before the positive one.
Parameters:
Name  Type  Description  Default 

data

InputDataset

Dataset. 
required 
Returns:
Type  Description 

ndarray

Index of the active leaf for each tree in the model. 
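The leaf numbering rule described above (depth-first exploration, negative child before positive child) can be sketched in plain Python. The nested-dictionary tree representation with "neg" and "pos" keys is a hypothetical stand-in for illustration and is not the YDF tree structure.

```python
def index_leaves(node, next_index=0, path="root"):
    """Returns ({leaf path: leaf index}, next free index) in depth-first order."""
    if "neg" not in node:  # A node without children is a leaf.
        return {path: next_index}, next_index + 1
    indices = {}
    for branch in ("neg", "pos"):  # Negative child first, then positive.
        child_indices, next_index = index_leaves(
            node[branch], next_index, f"{path}/{branch}"
        )
        indices.update(child_indices)
    return indices, next_index


# Hypothetical tree: the negative branch of the root splits again,
# the positive branch is directly a leaf.
tree = {
    "neg": {"neg": {}, "pos": {}},
    "pos": {},
}

leaf_indices, num_leaves = index_leaves(tree)
print(leaf_indices)
```

In this toy tree, the two leaves under the negative branch receive indices 0 and 1, and the leaf on the positive branch of the root receives index 2.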
print_tree
Prints a tree in the terminal.
Usage example:
import pandas as pd
import ydf
# Create a dataset
train_ds = pd.DataFrame({
"c1": [1.0, 1.1, 2.0, 3.5, 4.2] + list(range(10)),
"label": ["a", "b", "b", "a", "a"] * 3,
})
# Train a CART model
model = ydf.CartLearner(label="label").train(train_ds)
# Make sure the model is a CART
assert isinstance(model, ydf.CARTModel)
# Print the tree
model.print_tree()
Parameters:
Name  Type  Description  Default 

tree_idx

int

Index of the tree. Should be in [0, self.num_trees()). 
0

max_depth

Optional[int]

Maximum tree depth of the plot. Set to None for full depth. 
6

file

Where to print the tree. By default, prints on the terminal standard output. 
stdout

remove_tree
remove_tree(tree_idx: int) -> None
Removes a single tree from the model.
Parameters:
Name  Type  Description  Default 

tree_idx

int

Index of the tree. Should be in [0, num_trees()). 
required 
save
save(path, advanced_options=ModelIOOptions()) -> None
Saves the model to disk.
YDF uses a proprietary model format for saving models. A model consists of multiple files located in the same directory. A directory should only contain a single YDF model. See advanced_options for more information.
YDF models can also be exported to other formats, see to_tensorflow_saved_model() and to_cpp() for details.
YDF saves some metadata inside the model, see model.metadata() for details. Before distributing a model to the world, consider removing metadata with model.set_metadata(ydf.ModelMetadata()).
Usage example:
import pandas as pd
import ydf
# Train a Random Forest model
df = pd.read_csv("my_dataset.csv")
# Assumes the label column of the dataset is named "label".
model = ydf.RandomForestLearner(label="label").train(df)
# Save the model to disk
model.save("/models/my_model")
Parameters:
Name  Type  Description  Default 

path

Path to directory to store the model in. 
required  
advanced_options

Advanced options for saving models. 
ModelIOOptions()

self_evaluation
Returns the model's self-evaluation.
Different models use different methods for self-evaluation. Notably, Random Forests use OOB evaluation and Gradient Boosted Trees use evaluation on the validation dataset. Therefore, self-evaluations are not comparable between different model types.
Usage example:
serialize
serialize() -> bytes
Serializes a model to a sequence of bytes (i.e., bytes).
A serialized model is equivalent to a model saved with model.save. It can possibly contain metadata related to model training and interpretation. To minimize the size of a serialized model, remove this metadata by passing the argument pure_serving_model=True to the train method.
Usage example:
import pandas as pd
import ydf
# Create a model
dataset = pd.DataFrame({"feature": [0, 1], "label": [0, 1]})
learner = ydf.RandomForestLearner(label="label")
model = learner.train(dataset)
# Serialize model
# Note: serialized_model is a bytes.
serialized_model = model.serialize()
# Deserialize model
deserialized_model = ydf.deserialize_model(serialized_model)
# Make predictions
model.predict(dataset)
deserialized_model.predict(dataset)
Returns:
Type  Description 

bytes

The serialized model. 
set_node_format
set_node_format(node_format: NodeFormat) -> None
Set the serialization format for the nodes.
Parameters:
Name  Type  Description  Default 

node_format

NodeFormat

Node format to use when saving the model. 
required 
set_tree
to_cpp
Generates the code of a .h file to run the model in C++.
How to use this function:
 Copy the output of this function in a new .h file: open("model.h", "w").write(model.to_cpp())
 If you use Bazel/Blaze, create a rule with the dependencies: //third_party/absl/status:statusor //third_party/absl/strings //external/ydf_cc/yggdrasil_decision_forests/api:serving
 In your C++ code, include the .h file and call the model with:
// Load the model (to do only once).
namespace ydf = yggdrasil_decision_forests;
const auto model = ydf::exported_model_123::Load(
);
// Run the model
predictions = model.Predict();
 The generated "Predict" function takes no inputs. Instead, it fills the input features with placeholder values. Therefore, you will want to add your input as arguments to the "Predict" function, and use it to populate the "examples->Set..." section accordingly.
 (Bonus) You can further optimize the inference speed by preallocating and reusing the examples and predictions for each thread running the model.
This documentation is also available in the header of the generated content for more details.
Parameters:
Name  Type  Description  Default 

key

str

Name of the model. Used to define the c++ namespace of the model. 
'my_model'

Returns:
Type  Description 

str

String containing an example header for running the model in C++. 
to_docker
Exports the model to a Docker endpoint deployable on Cloud.
This function creates a directory containing a Dockerfile, the model and support files.
Usage example:
import numpy as np
import ydf
# Train a model.
model = ydf.RandomForestLearner(label="l").train({
"f1": np.random.random(size=100),
"f2": np.random.random(size=100),
"l": np.random.randint(2, size=100),
})
# Export the model to a Docker endpoint.
model.to_docker(path="/tmp/my_model")
# Print instructions on how to use the model
!cat /tmp/my_model/readme.md
# Test the endpoint locally
docker build --platform linux/amd64 -t ydf_predict_image /tmp/my_model
docker run --rm -p 8080:8080 -d ydf_predict_image
# Deploy the model on Google Cloud
gcloud run deploy ydf-predict --source /tmp/my_model
# Check the automatically created utility scripts "test_locally.sh" and
# "deploy_in_google_cloud.sh" for more examples.
Parameters:
Name  Type  Description  Default 

path

str

Directory where to create the Docker endpoint 
required 
exist_ok

bool

If false (default), fails if the directory already exists. If true, overwrites the directory content, if any. 
False

to_jax_function
to_jax_function(jit: bool = True, apply_activation: bool = True, leaves_as_params: bool = False, compatibility: Union[str, Compatibility] = 'XLA') -> JaxModel
Converts the YDF model into a JAX function.
Usage example:
import ydf
import numpy as np
import jax.numpy as jnp
# Train a model.
model = ydf.GradientBoostedTreesLearner(label="l").train({
"f1": np.random.random(size=100),
"f2": np.random.random(size=100),
"l": np.random.randint(2, size=100),
})
# Convert model to a JAX function.
jax_model = model.to_jax_function()
# Make predictions with the JAX function.
jax_predictions = jax_model.predict({
"f1": jnp.array([0, 0.5, 1]),
"f2": jnp.array([1, 0, 0.5]),
})
TODO: Document the encoder and jax params.
Parameters:
Name  Type  Description  Default 

jit

bool

If true, compiles the function with @jax.jit. 
True

apply_activation

bool

Should the activation function, if any, be applied on the model output. 
True

leaves_as_params

bool

If true, exports the leaf values as learnable
parameters. In this case, 
False

compatibility

Union[str, Compatibility]

Constraint on the YDF->JAX conversion for runtime compatibility. Can be "XLA" (default) or "TFL" (for TensorFlow Lite). 
'XLA'

Returns:
Type  Description 

JaxModel

A dataclass containing the JAX prediction function (predict) and, optionally, the model parameters (params) and the feature encoder (encoder). 
to_tensorflow_function
to_tensorflow_function(temp_dir: Optional[str] = None, can_be_saved: bool = True, squeeze_binary_classification: bool = True, force: bool = False) -> Module
Converts the YDF model into a @tf.function callable TensorFlow Module.
The output module can be composed with other TensorFlow operations, including other models serialized with to_tensorflow_function.
This function requires TensorFlow and TensorFlow Decision Forests to be installed. You can install them using the command pip install tensorflow_decision_forests. The generated SavedModel model relies on the TensorFlow Decision Forests Custom Inference Op. This Op is available by default in various platforms such as Servomatic, TensorFlow Serving, Vertex AI, and TensorFlow.js.
Usage example:
!pip install tensorflow_decision_forests
import ydf
import numpy as np
import tensorflow as tf
# Train a model.
model = ydf.RandomForestLearner(label="l").train({
"f1": np.random.random(size=100),
"f2": np.random.random(size=100),
"l": np.random.randint(2, size=100),
})
# Convert model to a TF module.
tf_model = model.to_tensorflow_function()
# Make predictions with the TF module.
tf_predictions = tf_model({
"f1": tf.constant([0, 0.5, 1]),
"f2": tf.constant([1, 0, 0.5]),
})
Parameters:
Name  Type  Description  Default 

temp_dir

Optional[str]

Temporary directory used during the conversion. If None
(default), uses 
None

can_be_saved

bool

If can_be_saved = True (default), the returned module can be
saved using 
True

squeeze_binary_classification

bool

If true (default), in case of binary classification, outputs a tensor of shape [num examples] containing the probability of the positive class. If false, in case of binary classification, outputs a tensor of shape [num examples, 2] containing the probability of both the negative and positive classes. Has no effect on non-binary classification models. 
True

force

bool

Try to export even in currently unsupported environments. 
False

Returns:
Type  Description 

Module

A TensorFlow @tf.function. 
to_tensorflow_saved_model
to_tensorflow_saved_model(path: str, input_model_signature_fn: Any = None, *, mode: Literal['keras', 'tf'] = 'keras', feature_dtypes: Dict[str, TFDType] = {}, servo_api: bool = False, feed_example_proto: bool = False, pre_processing: Optional[Callable] = None, post_processing: Optional[Callable] = None, temp_dir: Optional[str] = None, tensor_specs: Optional[Dict[str, Any]] = None, feature_specs: Optional[Dict[str, Any]] = None, force: bool = False) -> None
Exports the model as a TensorFlow Saved model.
This function requires TensorFlow and TensorFlow Decision Forests to be installed. Install them by running the command pip install tensorflow_decision_forests. The generated SavedModel model relies on the TensorFlow Decision Forests Custom Inference Op. This Op is available by default in various platforms such as Servomatic, TensorFlow Serving, Vertex AI, and TensorFlow.js.
Usage example:
!pip install tensorflow_decision_forests
import ydf
import numpy as np
import tensorflow as tf
# Train a model.
model = ydf.RandomForestLearner(label="l").train({
"f1": np.random.random(size=100),
"f2": np.random.random(size=100).astype(dtype=np.float32),
"l": np.random.randint(2, size=100),
})
# Export the model to the TensorFlow SavedModel format.
# The model can be executed with Servomatic, TensorFlow Serving and
# Vertex AI.
model.to_tensorflow_saved_model(path="/tmp/my_model", mode="tf")
# The model can also be loaded in TensorFlow and executed locally.
# Load the TensorFlow Saved model.
tf_model = tf.saved_model.load("/tmp/my_model")
# Make predictions
tf_predictions = tf_model({
"f1": tf.constant(np.random.random(size=10)),
"f2": tf.constant(np.random.random(size=10), dtype=tf.float32),
})
TensorFlow SavedModels do not automatically cast feature values. For instance, a model trained with a dtype=float32 semantic=numerical feature will require this feature to be fed as float32 numbers during inference. You can override the dtype of a feature with the feature_dtypes argument:
model.to_tensorflow_saved_model(
path="/tmp/my_model",
mode="tf",
# "f1" is fed as a tf.int64 instead of tf.float64
feature_dtypes={"f1": tf.int64},
)
Some TensorFlow Serving or Servomatic pipelines feed examples as serialized TensorFlow Example protos (instead of raw tensor values) and/or wrap the model's raw output (e.g., probability predictions) into a special structure (called the Serving API). You can create models compatible with those two conventions with feed_example_proto=True and servo_api=True respectively:
model.to_tensorflow_saved_model(
path="/tmp/my_model",
mode="tf",
feed_example_proto=True,
servo_api=True
)
If your model requires some data preprocessing or postprocessing, you can express them as a @tf.function or a tf module and pass them to the pre_processing and post_processing arguments respectively.
Warning: When exporting a SavedModel, YDF infers the model signature using the dtype of the features observed during training. If the signature of the pre_processing function is different than the signature of the model (e.g., the processing creates a new feature), you need to specify the tensor specs (tensor_specs; if feed_example_proto=False) or feature spec (feature_specs; if feed_example_proto=True) argument:
import math
# Define a preprocessing function
@tf.function
def pre_processing(raw_features):
    features = {**raw_features}
    # Create a new feature.
    features["sin_f1"] = tf.sin(features["f1"])
    # Remove a feature
    del features["f1"]
    return features
# Create Numpy dataset
raw_dataset = {
    "f1": np.random.random(size=100),
    "f2": np.random.random(size=100),
    "l": np.random.randint(2, size=100),
}
# Apply the preprocessing on the training dataset.
processed_dataset = (
    tf.data.Dataset.from_tensor_slices(raw_dataset)
    .batch(128)  # The batch size has no impact on the model.
    .map(pre_processing)
    .prefetch(tf.data.AUTOTUNE)
)
# Train a model on the preprocessed dataset.
ydf_model = ydf.RandomForestLearner(
    label="l",
    task=ydf.Task.CLASSIFICATION,
).train(processed_dataset)
# Export the model to a raw SavedModel model with the preprocessing
ydf_model.to_tensorflow_saved_model(
    path="/tmp/my_model",
    mode="tf",
    feed_example_proto=False,
    pre_processing=pre_processing,
    tensor_specs={
        "f1": tf.TensorSpec(shape=[None], name="f1", dtype=tf.float64),
        "f2": tf.TensorSpec(shape=[None], name="f2", dtype=tf.float64),
    }
)
# Export the model to a SavedModel consuming serialized tf examples with the
# preprocessing
ydf_model.to_tensorflow_saved_model(
    path="/tmp/my_model",
    mode="tf",
    feed_example_proto=True,
    pre_processing=pre_processing,
    feature_specs={
        "f1": tf.io.FixedLenFeature(
            shape=[], dtype=tf.float32, default_value=math.nan
        ),
        "f2": tf.io.FixedLenFeature(
            shape=[], dtype=tf.float32, default_value=math.nan
        ),
    }
)
For more flexibility, use the method to_tensorflow_function instead of to_tensorflow_saved_model.
Parameters:
Name  Type  Description  Default 

path

str

Path to store the Tensorflow Decision Forests model. 
required 
input_model_signature_fn

Any

A lambda that returns the
(Dense,Sparse,Ragged)TensorSpec (or structure of TensorSpec e.g.
dictionary, list) corresponding to input signature of the model. If not
specified, the input model signature is created by

None

mode

Literal['keras', 'tf']

How is the YDF converted into a TensorFlow SavedModel. 1) mode =
"keras" (default): Turn the model into a Keras 2 model using TensorFlow
Decision Forests, and then save it with 
'keras'

feature_dtypes

Dict[str, TFDType]

Mapping from feature name to TensorFlow dtype. Use this
mapping to override the dtype of a feature. For instance, numerical features are encoded
with tf.float32 by default. If you plan on feeding tf.float64 or
tf.int32, use 
{}

servo_api

bool

If true, adds a SavedModel signature to make the model
compatible with the 
False

feed_example_proto

bool

If false, the model expects the input features to be provided as TensorFlow values. This is the most efficient way to make predictions. If true, the model expects the input features to be provided as a binary serialized TensorFlow Example proto. This is the format expected by VertexAI and most TensorFlow Serving pipelines. 
False

pre_processing

Optional[Callable]

Optional TensorFlow function or module to apply on the
input features before applying the model. If the 
None

post_processing

Optional[Callable]

Optional TensorFlow function or module to apply on the model predictions. Only compatible with mode="tf". 
None

temp_dir

Optional[str]

Temporary directory used during the conversion. If None
(default), uses 
None

tensor_specs

Optional[Dict[str, Any]]

Optional dictionary of 
None

feature_specs

Optional[Dict[str, Any]]

Optional dictionary of 
None

force

bool

Try to export even in currently unsupported environments. WARNING: Setting this to true may crash the Python runtime. 
False

update_with_jax_params
Updates the model with JAX params as created by to_jax_function.
Usage example:
import ydf
import numpy as np
import jax.numpy as jnp
# Train a model with YDF
dataset = {
"f1": np.random.random(size=100),
"f2": np.random.random(size=100),
"l": np.random.randint(2, size=100),
}
model = ydf.GradientBoostedTreesLearner(label="l").train(dataset)
# Convert model to a JAX function with leaf values as parameters.
jax_model = model.to_jax_function(
    leaves_as_params=True,
    apply_activation=True)
# Note: The learnable model parameters are in `jax_model.params`.
# Fine-tune the model parameters with your own logic.
jax_model.params = fine_tune_model(jax_model.params, ...)
# Update the YDF model with the fine-tuned parameters
model.update_with_jax_params(jax_model.params)
# Make predictions with the fine-tuned YDF model
predictions = model.predict(dataset)
# Save the YDF model
model.save("/tmp/my_ydf_model")
Parameters:
Name  Type  Description  Default 

params

Dict[str, Any]

Learnable parameter of the model generated with 
required 
variable_importances
Variable importances to measure the impact of features on the model.
Variable importances generally indicate how much a variable (feature) contributes to the model predictions or quality. Different variable importances have different semantics and are generally not comparable.
The variable importances returned by variable_importances() depend on the learning algorithm and its hyperparameters. For example, the hyperparameter compute_oob_variable_importances=True of the Random Forest learner enables the computation of permutation out-of-bag variable importances.
TODO: Add variable importances to documentation.
Features are sorted by decreasing importance.
Usage example:
# Train a Random Forest. Enable the computation of OOB (out-of-bag) variable
# importances.
model = ydf.RandomForestLearner(compute_oob_variable_importances=True,
    label=...).train(ds)
# List the available variable importances.
print(model.variable_importances().keys())
# Show a specific variable importance.
model.variable_importances()["MEAN_DECREASE_IN_ACCURACY"]
>> [(0.0713061951754389, "bill_length_mm"),
(0.007298519736842035, "island"),
(0.004505893640351366, "flipper_length_mm"),
...
Returns:
Type  Description 

Dict[str, List[Tuple[float, str]]]

Variable importances. 
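The returned structure can be post-processed with ordinary Python. The sketch below uses a hand-written dictionary in place of a real model output, with the values taken from the usage example and the tuple order (importance, feature name) taken from the declared return type; the helper functions are hypothetical and are not part of YDF.

```python
# Stand-in for the value returned by model.variable_importances().
variable_importances = {
    "MEAN_DECREASE_IN_ACCURACY": [
        (0.0713061951754389, "bill_length_mm"),
        (0.007298519736842035, "island"),
        (0.004505893640351366, "flipper_length_mm"),
    ],
}


def top_features(importances, key, k=2):
    """Names of the k most important features for one importance type.

    Relies on the list being sorted by decreasing importance.
    """
    return [name for _, name in importances[key][:k]]


def as_mapping(importances, key):
    """Converts one importance list into a {feature name: importance} dict."""
    return {name: value for value, name in importances[key]}


top = top_features(variable_importances, "MEAN_DECREASE_IN_ACCURACY")
by_name = as_mapping(variable_importances, "MEAN_DECREASE_IN_ACCURACY")
print(top, by_name["island"])
```

Because importances of different types are not comparable, it is usually safer to rank features within a single key, as done here, than to merge several keys.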