API Reference¶

This page documents the Python API for YDF. Users can also train models using the C++ and CLI APIs.

Learners¶

A Learner trains models and can be cross-validated.

All learners derive from GenericLearner.

Models¶

A Model makes predictions and can be evaluated.

Note: Models (e.g., GradientBoostedTreesModel) do not contain training capabilities. To train a model, you need to create a learner (e.g., GradientBoostedTreesLearner). Training hyperparameters are constructor arguments of learner classes.

All models derive from GenericModel.

Tuners¶

A Tuner searches for the best set of hyperparameters by repeatedly training and evaluating models.

RandomSearchTuner
VizierTuner (currently, for Googlers only)
OptimizerLogs

Feature Selector¶

A Feature Selector searches for the best subset of input features for the model.

Other¶

load_model: Load a model from disk.
Feature: Hyperparameters specific to an input feature, e.g. semantic, constraints.
Column: Alias for Feature.
Task: Specifies the task solved by the model, e.g. classification.
Semantic: How an input feature is interpreted, e.g. numerical, categorical.
evaluate_predictions: Evaluates predictions of YDF and non-YDF models.
verbose: Control the amount of logging.
start_worker: Start a worker for distributed training.
strict: Show more logs.
Evaluation: Result of a model evaluation, i.e. the value returned by model.evaluate(...).

Utilities¶

ydf.util.read_tf_record: Read a TF Record dataset into memory.
ydf.util.write_tf_record: Write a TF Record dataset from memory.
ydf.util.get_vertex_ai_cluster_spec: Parses the Vertex AI cluster specification for distributed training.

Advanced Utilities¶

ModelIOOptions: Options to save a model to disk.
create_vertical_dataset: Load a dataset into memory.
ModelMetadata: Metadata about the model, e.g. training date, uid.
from_tensorflow_decision_forests: Load a TensorFlow Decision Forests model from disk.
from_sklearn: Convert a scikit-learn model into a YDF model.
NodeFormat: Format used to serialize the tree nodes.
DataSpecification: Internal data specification proto. Describes the columns of the model.
TrainingConfig: Internal training configuration proto. Describes the training configuration of a learner.

Custom Loss¶

RegressionLoss: Custom loss for regression tasks.
BinaryClassificationLoss: Custom loss for binary classification tasks.
MultiClassificationLoss: Custom loss for multi-class classification tasks.
Activation: Collection of activation (aka linkage) functions for custom losses.

Tree¶

The ydf.tree.* classes provide programmatic read and write access to the tree structure, leaves, and values.

tree.Tree: A decision tree as returned and consumed by model.get_tree(...) and model.set_tree(...).

Conditions¶

tree.AbstractCondition: Base condition class.
tree.NumericalHigherThanCondition: Condition of the form attribute >= threshold.
tree.CategoricalIsInCondition: Condition of the form attribute in mask.
tree.CategoricalSetContainsCondition: Condition of the form attribute intersect mask != empty.
tree.DiscretizedNumericalHigherThanCondition: Condition of the form attribute >= bounds[threshold].
tree.IsMissingInCondition: Condition of the form attribute is missing.
tree.IsTrueCondition: Condition of the form attribute is true.
tree.NumericalSparseObliqueCondition: Condition of the form sum(attributes[i] * weights[i]) >= threshold.
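The condition forms listed above can be read as boolean predicates on an example. The following plain-Python sketch spells out each form; it does not use the ydf.tree classes, all function names are hypothetical, and representing a missing value as `None` or NaN is an assumption made for the illustration.

```python
import math


def numerical_higher_than(value, threshold):
    # attribute >= threshold
    return value >= threshold


def categorical_is_in(value, mask):
    # attribute in mask
    return value in mask


def categorical_set_contains(values, mask):
    # attribute intersect mask != empty
    return len(set(values) & set(mask)) > 0


def discretized_numerical_higher_than(value, bounds, threshold_idx):
    # attribute >= bounds[threshold]
    return value >= bounds[threshold_idx]


def is_missing(value):
    # attribute is missing (assumed here to mean None or NaN)
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_true(value):
    # attribute is true
    return bool(value)


def numerical_sparse_oblique(attributes, weights, threshold):
    # sum(attributes[i] * weights[i]) >= threshold
    return sum(a * w for a, w in zip(attributes, weights)) >= threshold


# A hypothetical example with one value per kind of attribute.
example = {
    "age": 41.0,
    "country": "FR",
    "tags": ["a", "b"],
    "income": None,
    "is_member": True,
}

print(numerical_higher_than(example["age"], 30.0))
print(categorical_is_in(example["country"], {"FR", "DE"}))
print(categorical_set_contains(example["tags"], {"b", "c"}))
print(is_missing(example["income"]))
```

The sketch ignores how a condition is evaluated when its attribute is missing, which real trees have to handle.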

Nodes¶

tree.AbstractNode: Base node class.
tree.Leaf: A leaf node containing a value.
tree.NonLeaf: A non-leaf node containing a condition.

Values¶

tree.AbstractValue: Base value class.
tree.ProbabilityValue: A probability distribution value.
tree.RegressionValue: The regression value of a regression tree.
tree.UpliftValue: The uplift value of a classification or regression uplift tree.