API Reference¶
This page documents the Python API for YDF. Users can also train models using the C++ and CLI APIs.
Learners¶
A Learner trains models and can be cross-validated.
- GradientBoostedTreesLearner
- RandomForestLearner
- CartLearner
- DecisionTreeLearner: Alias to CartLearner.
- DistributedGradientBoostedTreesLearner
- IsolationForestLearner
All learners derive from GenericLearner.
Models¶
A Model makes predictions and can be evaluated.
Note: Models (e.g., GradientBoostedTreesModel) do not contain training
capabilities. To train a model, you need to create a learner (e.g.,
GradientBoostedTreesLearner). Training hyperparameters are constructor
arguments of learner classes.
- GradientBoostedTreesModel
- RandomForestModel
- CARTModel: Alias to RandomForestModel.
- IsolationForestModel
All models derive from GenericModel.
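The split between learners and models described above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes the `ydf` package is installed, the helper name `train_then_predict` and its arguments are hypothetical, and the datasets are assumed to be in a format YDF accepts, such as a pandas DataFrame.

```python
def train_then_predict(train_ds, test_ds, label):
    """Trains a model with a learner, then evaluates and predicts with the model.

    `train_ds` and `test_ds` are assumed to be datasets in a format accepted by
    YDF (for example, pandas DataFrames) that contain a column named `label`.
    """
    # Imported inside the function so that this sketch can be loaded even when
    # ydf is not installed (`pip install ydf`).
    import ydf

    # Training hyperparameters (here, num_trees) are constructor arguments of
    # the learner, not of the model.
    learner = ydf.GradientBoostedTreesLearner(label=label, num_trees=20)

    # The learner returns a model; the model itself cannot be trained further
    # through this object.
    model = learner.train(train_ds)

    # The model makes predictions and can be evaluated.
    evaluation = model.evaluate(test_ds)
    predictions = model.predict(test_ds)
    return model, evaluation, predictions
```

Keeping the hyperparameters on the learner means the returned model object only carries what is needed for inference and analysis.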
Tuners¶
A Tuner searches for the best set of hyperparameters by repeatedly training and evaluating models.
- RandomSearchTuner
- VizierTuner (currently, for Googlers only)
- OptimizerLogs
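The idea of tuning by repeated training and evaluation can be sketched in plain Python. This is a conceptual illustration of random search only, not YDF's implementation and not the `RandomSearchTuner` API: `random_search`, `train_and_score`, and `toy_score` are hypothetical names, and the scoring function is a stand-in for an actual train-and-evaluate step.

```python
import random


def random_search(search_space, train_and_score, num_trials, seed=0):
    """Samples random hyperparameter sets and returns the best-scoring one.

    search_space: dict mapping a hyperparameter name to a list of candidates.
    train_and_score: hypothetical callable that trains a model with the given
      hyperparameters and returns a score where higher is better.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        # Draw one candidate value per hyperparameter, independently.
        params = {name: rng.choice(values) for name, values in search_space.items()}
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Stand-in objective; a real one would train and evaluate a model."""
    return -abs(params["shrinkage"] - 0.1) - abs(params["max_depth"] - 5) / 10


space = {"shrinkage": [0.02, 0.05, 0.1, 0.2], "max_depth": [3, 4, 5, 6, 8]}
best_params, best_score = random_search(space, toy_score, num_trials=50)
print(best_params, best_score)
```

Random search evaluates each trial independently, so the number of trials directly trades compute for the chance of finding a good configuration.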
Feature Selector¶
A Feature Selector finds the optimal set of input features for the model.
Other¶
- load_model: Load a model from disk.
- Feature: Hyperparameters specific to an input feature, e.g. semantic, constraints.
- Column: Alias for Feature.
- Task: Specifies the task solved by the model, e.g. classification.
- Semantic: How an input feature is interpreted, e.g. numerical, categorical.
- evaluate_predictions: Evaluates predictions of YDF and non-YDF models.
- verbose: Control the amount of logging.
- start_worker: Start a worker for distributed training.
- strict: Show more logs.
- Evaluation: Result of a model evaluation, i.e. model.evaluate(...).
Utilities¶
- ydf.util.read_tf_record: Read a TF Record dataset in memory.
- ydf.util.write_tf_record: Write a TF Record dataset from memory.
- ydf.util.get_vertex_ai_cluster_spec: Parses the Vertex AI cluster specification for distributed training.
Advanced Utilities¶
- ModelIOOptions: Options to save a model to disk.
- create_vertical_dataset: Load a dataset in memory.
- ModelMetadata: Metadata about the model, e.g. training date, uid.
- from_tensorflow_decision_forests: Load a TensorFlow Decision Forests model from disk.
- from_sklearn: Convert a scikit-learn model into a YDF model.
- NodeFormat: Format used to serialize the tree nodes.
- DataSpecification: Internal data specification proto. Describes the columns of the model.
- TrainingConfig: Internal training configuration proto. Describes the training configuration of a learner.
Custom Loss¶
- RegressionLoss: Custom loss for regression tasks.
- BinaryClassificationLoss: Custom loss for binary classification tasks.
- MultiClassificationLoss: Custom loss for multi-class classification tasks.
- Activation: Collection of activation (aka linkage) functions for custom losses.
Tree¶
The ydf.tree.* classes provide programmatic read and write access to the tree
structure, leaves, and values.
- tree.Tree: A decision tree as returned and consumed by model.get_tree(...) and model.set_tree(...).
Conditions¶
- tree.AbstractCondition: Base condition class.
- tree.NumericalHigherThanCondition: Condition of the form attribute >= threshold.
- tree.CategoricalIsInCondition: Condition of the form attribute in mask.
- tree.CategoricalSetContainsCondition: Condition of the form attribute intersect mask != empty.
- tree.DiscretizedNumericalHigherThanCondition: Condition of the form attribute >= bounds[threshold].
- tree.IsMissingInCondition: Condition of the form attribute is missing.
- tree.IsTrueCondition: Condition of the form attribute is true.
- tree.NumericalSparseObliqueCondition: Condition of the form sum(attributes[i] * weights[i]) >= threshold.
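The condition forms listed above can be read as plain boolean tests. The sketch below restates each form as a standalone Python function. These functions are hypothetical illustrations, not the ydf.tree classes, and they leave out how a condition is resolved when the attribute value is missing.

```python
import math


def numerical_higher_than(value, threshold):
    """attribute >= threshold"""
    return value >= threshold


def categorical_is_in(value, mask):
    """attribute in mask"""
    return value in mask


def categorical_set_contains(values, mask):
    """attribute intersect mask != empty"""
    return not set(values).isdisjoint(mask)


def discretized_numerical_higher_than(value, bounds, threshold_idx):
    """attribute >= bounds[threshold], where the threshold indexes into bounds."""
    return value >= bounds[threshold_idx]


def is_missing(value):
    """attribute is missing (represented here as None or NaN, an assumption)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_true(value):
    """attribute is true"""
    return value is True


def numerical_sparse_oblique(example, attributes, weights, threshold):
    """sum(attributes[i] * weights[i]) >= threshold"""
    projection = sum(example[name] * w for name, w in zip(attributes, weights))
    return projection >= threshold


example = {"a": 1.0, "b": 2.0}
print(numerical_sparse_oblique(example, ["a", "b"], [0.5, -1.0], threshold=-2.0))
```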
Nodes¶
- tree.AbstractNode: Base node class.
- tree.Leaf: A leaf node containing a value.
- tree.NonLeaf: A non-leaf node containing a condition.
Values¶
- tree.AbstractValue: Base value class.
- tree.ProbabilityValue: A probability distribution value.
- tree.RegressionValue: The regression value of a regression tree.
- tree.UpliftValue: The uplift value of a classification or regression uplift tree.
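How leaves, non-leaf nodes, and conditions fit together can be sketched with a few stand-in classes. The `Leaf` and `NonLeaf` dataclasses below are simplified illustrations and not the ydf.tree classes: the field names `pos_child` and `neg_child`, the use of a plain callable as the condition, and the `predict` helper are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    """Stand-in for a leaf node: holds the value returned for an example."""
    value: float


@dataclass
class NonLeaf:
    """Stand-in for a non-leaf node: holds a condition and two children."""
    condition: Callable[[Dict[str, float]], bool]
    pos_child: "Node"  # Followed when the condition is true.
    neg_child: "Node"  # Followed when the condition is false.


Node = Union[Leaf, NonLeaf]


def predict(node, example):
    """Walks from the root to a leaf and returns the leaf value."""
    while isinstance(node, NonLeaf):
        node = node.pos_child if node.condition(example) else node.neg_child
    return node.value


# A two-level tree: first test x >= 2, then (on the negative branch) y >= 0.
root = NonLeaf(
    condition=lambda ex: ex["x"] >= 2,
    pos_child=Leaf(10.0),
    neg_child=NonLeaf(
        condition=lambda ex: ex["y"] >= 0,
        pos_child=Leaf(1.0),
        neg_child=Leaf(-1.0),
    ),
)
print(predict(root, {"x": 1, "y": 5}))
```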