API Reference¶
This page documents the Python API for YDF. Users can also train models using the C++ and CLI APIs.
Learners¶
A Learner trains models and can be cross-validated.
- GradientBoostedTreesLearner
- RandomForestLearner
- CartLearner
- DecisionTreeLearner: Alias to CartLearner.
- DistributedGradientBoostedTreesLearner
- IsolationForestLearner
All learners derive from GenericLearner.
Models¶
A Model makes predictions and can be evaluated.
Note: Models (e.g., GradientBoostedTreesModel) do not contain training capabilities. To train a model, you need to create a learner (e.g., GradientBoostedTreesLearner). Training hyperparameters are constructor arguments of learner classes.
- GradientBoostedTreesModel
- RandomForestModel
- CARTModel: Alias to RandomForestModel.
- IsolationForestModel
All models derive from GenericModel.
Tuners¶
A Tuner finds the optimal set of hyper-parameters using repeated training and evaluation.
- RandomSearchTuner
- VizierTuner (currently, for Googlers only)
- OptimizerLogs
Feature Selector¶
A Feature Selector finds the optimal set of input features for the model.
Other¶
- load_model: Load a model from disk.
- Feature: Input feature specific hyper-parameters e.g. semantic, constraints.
- Column: Alias for Feature.
- Task: Specify the task solved by the model e.g. classification.
- Semantic: How an input feature is interpreted e.g. numerical, categorical.
- evaluate_predictions: Evaluates predictions of YDF and non-YDF models.
- verbose: Control the amount of logging.
- start_worker: Start a worker for distributed training.
- strict: Show more logs.
Utilities¶
- ydf.util.read_tf_record: Read a TF Record dataset in memory.
- ydf.util.write_tf_record: Write a TF Record dataset from memory.
Advanced Utilities¶
- ModelIOOptions: Options to save a model to disk.
- create_vertical_dataset: Load a dataset in memory.
- ModelMetadata: Meta-data about the model e.g. training date, uid.
- from_tensorflow_decision_forests: Load a TensorFlow Decision Forests model from disk.
- from_sklearn: Convert a scikit-learn model into a YDF model.
- NodeFormat: Format used to serialize the tree nodes.
- DataSpecification: Internal data specification proto. Describes the columns of the model.
- TrainingConfig: Internal training configuration proto. Describes the training configuration of a learner.
Custom Loss¶
- RegressionLoss: Custom loss for regression tasks.
- BinaryClassificationLoss: Custom loss for binary classification tasks.
- MultiClassificationLoss: Custom loss for multi-class classification tasks.
- Activation: Collection of activation (aka linkage) functions for custom losses.
Tree¶
The ydf.tree.* classes provide programmatic read and write access to the tree structure, leaves, and values.
- tree.Tree: A decision tree as returned and consumed by model.get_tree(...) and model.set_tree(...).
Conditions¶
- tree.AbstractCondition: Base condition class.
- tree.NumericalHigherThanCondition: Condition of the form attribute >= threshold.
- tree.CategoricalIsInCondition: Condition of the form attribute in mask.
- tree.CategoricalSetContainsCondition: Condition of the form attribute intersect mask != empty.
- tree.DiscretizedNumericalHigherThanCondition: Condition of the form attribute >= bounds[threshold].
- tree.IsMissingInCondition: Condition of the form attribute is missing.
- tree.IsTrueCondition: Condition of the form attribute is true.
- tree.NumericalSparseObliqueCondition: Condition of the form sum(attributes[i] * weights[i]) >= threshold.
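The condition forms listed above can be read as simple predicates. The sketch below spells them out in plain Python; the function names and argument layouts are hypothetical stand-ins for illustration and are not the ydf.tree classes or their constructors.

```python
import math

def numerical_higher_than(value, threshold):
    # attribute >= threshold
    return value >= threshold

def categorical_is_in(value, mask):
    # attribute in mask
    return value in mask

def categorical_set_contains(values, mask):
    # attribute intersect mask != empty
    return len(set(values) & set(mask)) > 0

def discretized_numerical_higher_than(value, bounds, threshold):
    # attribute >= bounds[threshold]; here threshold is an index into bounds
    return value >= bounds[threshold]

def is_missing(value):
    # attribute is missing (modelled here as None or NaN, an assumption)
    return value is None or (isinstance(value, float) and math.isnan(value))

def is_true(value):
    # attribute is true
    return bool(value)

def sparse_oblique(attributes, weights, threshold):
    # sum(attributes[i] * weights[i]) >= threshold
    return sum(a * w for a, w in zip(attributes, weights)) >= threshold

# Example evaluations of a few conditions.
print(numerical_higher_than(3.0, 2.5))
print(categorical_set_contains(["a", "b"], ["b", "c"]))
print(sparse_oblique([1.0, 2.0], [0.5, -1.0], 0.0))
```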
Nodes¶
- tree.AbstractNode: Base node class.
- tree.Leaf: A leaf node containing a value.
- tree.NonLeaf: A non-leaf node containing a condition.
Values¶
- tree.AbstractValue: Base value class.
- tree.ProbabilityValue: A probability distribution value.
- tree.RegressionValue: The regression value of a regression tree.
- tree.UpliftValue: The uplift value of a classification or regression uplift tree.