Utilities¶

verbose ¶

verbose(level: Union[int, bool] = 2) -> int

Sets the verbose level of YDF.

The verbose levels are

0 or False: Print no logs.
1 or True: Print a few logs in a colab or notebook cell. Print all the logs in the console. This is the default verbose level.
2: Print all the logs on all surfaces.

Usage example:

import pandas as pd
import ydf

save_verbose = ydf.verbose(0)  # Hide all logs
learner = ydf.RandomForestLearner(label="label")
model = learner.train(pd.DataFrame({"feature": [0, 1], "label": [0, 1]}))
ydf.verbose(save_verbose)  # Restore verbose level

Parameters:

Name	Type	Description	Default
`level`	`Union[int, bool]`	New verbose level.	`2`

Returns:

Type	Description
`int`	The previous verbose level.
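
The save-and-restore pattern from the usage example can be wrapped in a context manager, so that the previous level is restored even if the code in between raises. The following is a minimal sketch and not part of YDF: `temporary_level` is a hypothetical helper that only assumes the setter returns the previous level, as `verbose` is documented to do. A stand-in setter (`fake_verbose`) is used so the snippet runs without YDF installed.

```python
import contextlib
from typing import Callable, Iterator


@contextlib.contextmanager
def temporary_level(setter: Callable[[int], int], level: int) -> Iterator[None]:
    """Sets a level through `setter` and restores the previous one on exit.

    `setter` must return the previous level, like `ydf.verbose` does.
    """
    previous = setter(level)
    try:
        yield
    finally:
        setter(previous)  # Restore the level even if the body raised.


# Stand-in for `ydf.verbose` (hypothetical, for illustration only).
_state = {"level": 1}


def fake_verbose(level: int) -> int:
    previous = _state["level"]
    _state["level"] = level
    return previous


with temporary_level(fake_verbose, 0):
    level_inside = _state["level"]  # Logs would be hidden here.
level_after = _state["level"]  # Back to the previous level.
```

With YDF installed, the same helper would presumably be used as `with temporary_level(ydf.verbose, 0): ...`.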

load_model ¶

load_model(
    directory: str,
    advanced_options: ModelIOOptions = ModelIOOptions(),
) -> ModelType

Load a YDF model from disk.

Usage example:

import pandas as pd
import ydf

# Create a model
dataset = pd.DataFrame({"feature": [0, 1], "label": [0, 1]})
learner = ydf.RandomForestLearner(label="label")
model = learner.train(dataset)

# Save model
model.save("/tmp/my_model")

# Load model
loaded_model = ydf.load_model("/tmp/my_model")

# Make predictions
model.predict(dataset)
loaded_model.predict(dataset)

If a directory contains multiple YDF models, the models are uniquely identified by their prefix. The prefix to use can be specified in the advanced options. If the directory only contains a single model, the correct prefix is detected automatically.
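
To make the prefix rule concrete, here is a rough sketch of how a unique prefix could be detected from the file names in a directory. It is not YDF's implementation. It assumes that every saved model writes a file named `<prefix>data_spec.pb`, which this page does not confirm, and `list_model_prefixes` and `detect_prefix` are hypothetical helpers.

```python
import os
import tempfile
from typing import List

# Assumption (not confirmed by this page): each saved model writes a file
# named "<prefix>data_spec.pb" in its directory.
MARKER = "data_spec.pb"


def list_model_prefixes(directory: str) -> List[str]:
    """Returns the candidate model prefixes found in `directory`."""
    return sorted(
        name[: -len(MARKER)]
        for name in os.listdir(directory)
        if name.endswith(MARKER)
    )


def detect_prefix(directory: str) -> str:
    """Returns the unique prefix, or raises if the choice is ambiguous."""
    prefixes = list_model_prefixes(directory)
    if len(prefixes) != 1:
        raise ValueError(f"Expected exactly one model, found: {prefixes}")
    return prefixes[0]


# Demonstration on a temporary directory holding two hypothetical models.
with tempfile.TemporaryDirectory() as tmp:
    for prefix in ("model_a_", "model_b_"):
        open(os.path.join(tmp, prefix + MARKER), "w").close()
    found_prefixes = list_model_prefixes(tmp)
```

When several prefixes are found, the one to load is selected with `ModelIOOptions(file_prefix=...)`, passed through `advanced_options`.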

Parameters:

Name	Type	Description	Default
`directory`	`str`	Directory containing the model.	required
`advanced_options`	`ModelIOOptions`	Advanced options for model loading.	`ModelIOOptions()`

Returns:

Type	Description
`ModelType`	Model to use for inference, evaluation or inspection

deserialize_model ¶

deserialize_model(data: bytes) -> ModelType

Loads a serialized YDF model.

Usage example:

import pandas as pd
import ydf

# Create a model
dataset = pd.DataFrame({"feature": [0, 1], "label": [0, 1]})
learner = ydf.RandomForestLearner(label="label")
model = learner.train(dataset)

# Serialize model
# Note: serialized_model is a bytes.
serialized_model = model.serialize()

# Deserialize model
deserialized_model = ydf.deserialize_model(serialized_model)

# Make predictions
model.predict(dataset)
deserialized_model.predict(dataset)

Parameters:

Name	Type	Description	Default
`data`	`bytes`	Serialized model.	required

Returns:

Type	Description
`ModelType`	Model to use for inference, evaluation or inspection

Feature `dataclass` ¶

Feature(
    name: str,
    semantic: Optional[Semantic] = None,
    max_vocab_count: Optional[int] = None,
    min_vocab_frequency: Optional[int] = None,
    num_discretized_numerical_bins: Optional[int] = None,
    monotonic: MonotonicConstraint = None,
    is_already_integerized: Optional[bool] = None,
    vocabulary: Optional[list[str]] = None,
    vocabulary_must_be_complete: bool = False,
)

Bases: object

Semantic and parameters for a single column.

This class allows you to:

Limit the input features of the model.
Manually specify the semantic of a feature.
Specify feature specific hyper-parameters.

Attributes:

Name	Type	Description
`name`	`str`	The name of the column or feature.
`semantic`	`Optional[Semantic]`	Semantic of the column. If None, the semantic is automatically determined. The semantic controls how a column is interpreted by a model. Using the wrong semantic (e.g. numerical instead of categorical) will hurt your model's quality.
`max_vocab_count`	`Optional[int]`	For CATEGORICAL and CATEGORICAL_SET columns only. Number of unique categorical values stored as string. If more categorical values are present, the least frequent values are grouped into an Out-of-vocabulary item. Reducing the value can improve or hurt the model. If max_vocab_count = -1, the number of values in the column is not limited.
`min_vocab_frequency`	`Optional[int]`	For CATEGORICAL and CATEGORICAL_SET columns only. Minimum number of occurrences of a categorical value. Values present fewer than "min_vocab_frequency" times in the training dataset are treated as "Out-of-vocabulary".
`num_discretized_numerical_bins`	`Optional[int]`	For DISCRETIZED_NUMERICAL columns only. Number of bins used to discretize DISCRETIZED_NUMERICAL columns. Defaults to 255 bins, i.e. 254 boundaries.
`monotonic`	`MonotonicConstraint`	Monotonic constraints between the feature and the model output. Use `None` (default; or 0) for an unconstrained feature. Use `Monotonic.INCREASING` (or +1) to ensure the model is monotonically increasing with the features. Use `Monotonic.DECREASING` (or -1) to ensure the model is monotonically decreasing with the features.
`is_already_integerized`	`Optional[bool]`	(CATEGORICAL columns only, advanced) If True, the column's categorical values are already provided as integers. See `Semantic.CATEGORICAL`'s "Integerized Behavior" for details. - Integers must be >= -1. - `-1`: Represents a missing value. - `0`: Represents the out-of-vocabulary (OOV) value. - `1` to `N`: Represent the different categories. These values should be dense, meaning they should occupy the range [1, N] without large gaps, where N is the number of unique categories. - This mode is more efficient but requires pre-integerized data. - Warning: This option is NOT suitable for sparse integer IDs like user IDs or product IDs, as they would create an unnecessarily large and sparse feature space. Use the default string-based categorical handling for such cases or remove the feature if it's unlikely to be discriminative. - Warning: Tensorflow Decision Forests uses a different semantic for integerized categorical features.
`vocabulary`	`Optional[list[str]]`	(CATEGORICAL columns only, advanced) If set, defines the vocabulary of the column. The values are assigned indices starting from 1 in the order they appear in this list. Values not in this list are considered out-of-vocabulary (index 0). If set, `min_vocab_frequency` and `max_vocab_count` are ignored. Incompatible with `is_already_integerized=True`. For the label column, use the `label_classes` argument of the learner instead. Note: This parameter is not supported for CATEGORICAL_SET columns.
`vocabulary_must_be_complete`	`bool`	If true, the vocabulary must contain all the values present in the data. If a value is missing, the dataspec generation will fail.
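
The interaction between `max_vocab_count` and `min_vocab_frequency` can be illustrated with a small, pure-Python sketch. This is not YDF's implementation: `build_vocabulary` and the `"<OOV>"` name are hypothetical, and the sketch assumes that the out-of-vocabulary item does not count toward `max_vocab_count`, which the descriptions above do not specify.

```python
from collections import Counter
from typing import Iterable, List

OOV = "<OOV>"  # Hypothetical name for the out-of-vocabulary item.


def build_vocabulary(
    values: Iterable[str], max_vocab_count: int, min_vocab_frequency: int
) -> List[str]:
    """Returns the OOV item followed by the kept values, most frequent first."""
    counts = Counter(values)
    kept = [
        value
        for value, count in counts.most_common()
        if count >= min_vocab_frequency  # Rare values fall into OOV.
    ]
    if max_vocab_count != -1:  # -1 means "no limit".
        kept = kept[:max_vocab_count]
    return [OOV] + kept


column = ["red"] * 6 + ["blue"] * 5 + ["green"] * 2 + ["pink"]
vocab = build_vocabulary(column, max_vocab_count=2000, min_vocab_frequency=5)
small_vocab = build_vocabulary(column, max_vocab_count=1, min_vocab_frequency=1)
```

In this toy column, `vocab` keeps only "red" and "blue" because "green" and "pink" appear fewer than 5 times, and `small_vocab` keeps only "red" because of the size limit.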

is_already_integerized `class-attribute` `instance-attribute` ¶

is_already_integerized: Optional[bool] = None

max_vocab_count `class-attribute` `instance-attribute` ¶

max_vocab_count: Optional[int] = None

min_vocab_frequency `class-attribute` `instance-attribute` ¶

min_vocab_frequency: Optional[int] = None

monotonic `class-attribute` `instance-attribute` ¶

monotonic: MonotonicConstraint = None

name `instance-attribute` ¶

name: str

normalized_monotonic `property` ¶

normalized_monotonic: Optional[Monotonic]

Returns the normalized version of the "monotonic" attribute.

num_discretized_numerical_bins `class-attribute` `instance-attribute` ¶

num_discretized_numerical_bins: Optional[int] = None

semantic `class-attribute` `instance-attribute` ¶

semantic: Optional[Semantic] = None

vocabulary `class-attribute` `instance-attribute` ¶

vocabulary: Optional[list[str]] = None

vocabulary_must_be_complete `class-attribute` `instance-attribute` ¶

vocabulary_must_be_complete: bool = False

from_column_def `classmethod` ¶

from_column_def(column_def: ColumnDef)

Converts a ColumnDef to a Column.

to_proto_column_guide ¶

to_proto_column_guide() -> ColumnGuide

Creates a proto ColumnGuide from the given specification.

Column `dataclass` ¶

Column(
    name: str,
    semantic: Optional[Semantic] = None,
    max_vocab_count: Optional[int] = None,
    min_vocab_frequency: Optional[int] = None,
    num_discretized_numerical_bins: Optional[int] = None,
    monotonic: MonotonicConstraint = None,
    is_already_integerized: Optional[bool] = None,
    vocabulary: Optional[list[str]] = None,
    vocabulary_must_be_complete: bool = False,
)

Bases: object

Semantic and parameters for a single column.

This class allows you to:

Limit the input features of the model.
Manually specify the semantic of a feature.
Specify feature specific hyper-parameters.

Attributes:

Name	Type	Description
`name`	`str`	The name of the column or feature.
`semantic`	`Optional[Semantic]`	Semantic of the column. If None, the semantic is automatically determined. The semantic controls how a column is interpreted by a model. Using the wrong semantic (e.g. numerical instead of categorical) will hurt your model's quality.
`max_vocab_count`	`Optional[int]`	For CATEGORICAL and CATEGORICAL_SET columns only. Number of unique categorical values stored as string. If more categorical values are present, the least frequent values are grouped into an Out-of-vocabulary item. Reducing the value can improve or hurt the model. If max_vocab_count = -1, the number of values in the column is not limited.
`min_vocab_frequency`	`Optional[int]`	For CATEGORICAL and CATEGORICAL_SET columns only. Minimum number of occurrences of a categorical value. Values present fewer than "min_vocab_frequency" times in the training dataset are treated as "Out-of-vocabulary".
`num_discretized_numerical_bins`	`Optional[int]`	For DISCRETIZED_NUMERICAL columns only. Number of bins used to discretize DISCRETIZED_NUMERICAL columns. Defaults to 255 bins, i.e. 254 boundaries.
`monotonic`	`MonotonicConstraint`	Monotonic constraints between the feature and the model output. Use `None` (default; or 0) for an unconstrained feature. Use `Monotonic.INCREASING` (or +1) to ensure the model is monotonically increasing with the features. Use `Monotonic.DECREASING` (or -1) to ensure the model is monotonically decreasing with the features.
`is_already_integerized`	`Optional[bool]`	(CATEGORICAL columns only, advanced) If True, the column's categorical values are already provided as integers. See `Semantic.CATEGORICAL`'s "Integerized Behavior" for details. - Integers must be >= -1. - `-1`: Represents a missing value. - `0`: Represents the out-of-vocabulary (OOV) value. - `1` to `N`: Represent the different categories. These values should be dense, meaning they should occupy the range [1, N] without large gaps, where N is the number of unique categories. - This mode is more efficient but requires pre-integerized data. - Warning: This option is NOT suitable for sparse integer IDs like user IDs or product IDs, as they would create an unnecessarily large and sparse feature space. Use the default string-based categorical handling for such cases or remove the feature if it's unlikely to be discriminative. - Warning: Tensorflow Decision Forests uses a different semantic for integerized categorical features.
`vocabulary`	`Optional[list[str]]`	(CATEGORICAL columns only, advanced) If set, defines the vocabulary of the column. The values are assigned indices starting from 1 in the order they appear in this list. Values not in this list are considered out-of-vocabulary (index 0). If set, `min_vocab_frequency` and `max_vocab_count` are ignored. Incompatible with `is_already_integerized=True`. For the label column, use the `label_classes` argument of the learner instead. Note: This parameter is not supported for CATEGORICAL_SET columns.
`vocabulary_must_be_complete`	`bool`	If true, the vocabulary must contain all the values present in the data. If a value is missing, the dataspec generation will fail.
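
The index assignment described for `vocabulary` and `vocabulary_must_be_complete` can be sketched as follows. This is an illustration of the documented rules (indices start at 1 in list order, unknown values map to index 0), not YDF's code; `encode_with_vocabulary` is a hypothetical helper and missing values are not handled.

```python
from typing import Dict, List, Sequence

OOV_INDEX = 0  # Index of out-of-vocabulary values.


def encode_with_vocabulary(
    values: Sequence[str],
    vocabulary: List[str],
    vocabulary_must_be_complete: bool = False,
) -> List[int]:
    """Maps values to indices: 1..N in list order, 0 for unknown values."""
    index_of: Dict[str, int] = {v: i + 1 for i, v in enumerate(vocabulary)}
    encoded = []
    for value in values:
        index = index_of.get(value, OOV_INDEX)
        if index == OOV_INDEX and vocabulary_must_be_complete:
            raise ValueError(f"Value {value!r} is missing from the vocabulary")
        encoded.append(index)
    return encoded


# "z" is not in the vocabulary, so it is encoded as out-of-vocabulary.
encoded = encode_with_vocabulary(["b", "a", "z"], vocabulary=["a", "b"])
```

With `vocabulary_must_be_complete=True`, the same call raises instead of silently mapping "z" to index 0.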

is_already_integerized `class-attribute` `instance-attribute` ¶

is_already_integerized: Optional[bool] = None

max_vocab_count `class-attribute` `instance-attribute` ¶

max_vocab_count: Optional[int] = None

min_vocab_frequency `class-attribute` `instance-attribute` ¶

min_vocab_frequency: Optional[int] = None

monotonic `class-attribute` `instance-attribute` ¶

monotonic: MonotonicConstraint = None

name `instance-attribute` ¶

name: str

normalized_monotonic `property` ¶

normalized_monotonic: Optional[Monotonic]

Returns the normalized version of the "monotonic" attribute.

num_discretized_numerical_bins `class-attribute` `instance-attribute` ¶

num_discretized_numerical_bins: Optional[int] = None

semantic `class-attribute` `instance-attribute` ¶

semantic: Optional[Semantic] = None

vocabulary `class-attribute` `instance-attribute` ¶

vocabulary: Optional[list[str]] = None

vocabulary_must_be_complete `class-attribute` `instance-attribute` ¶

vocabulary_must_be_complete: bool = False

from_column_def `classmethod` ¶

from_column_def(column_def: ColumnDef)

Converts a ColumnDef to a Column.

to_proto_column_guide ¶

to_proto_column_guide() -> ColumnGuide

Creates a proto ColumnGuide from the given specification.

Task ¶

Bases: Enum

A task that a model is trained to solve.

Not all tasks are compatible with all learners or hyperparameters. For more information, see the tutorials on individual tasks in the documentation.

Usage example:

import ydf

learner = ydf.RandomForestLearner(
    label="income", task=ydf.Task.CLASSIFICATION
)
# model = learner.train(...)
# assert model.task() == ydf.Task.CLASSIFICATION

Attributes:

Name	Type	Description
`CLASSIFICATION`		Predicts a categorical label.
`REGRESSION`		Predicts a numerical label.
`RANKING`		Ranks a set of items. The label represents the relevance of an item. For example, with the default NDCG metric, the label is a numerical value where 0 indicates a completely unrelated item and 4 indicates a perfect match.
`CATEGORICAL_UPLIFT`		Predicts the incremental impact of a treatment on a categorical outcome.
`NUMERICAL_UPLIFT`		Predicts the incremental impact of a treatment on a numerical outcome.
`ANOMALY_DETECTION`		Detects if an instance is an outlier compared to the training data. The prediction is a score between 0 and 1, where 0 represents a normal instance and 1 represents the most anomalous instance.
`SURVIVAL_ANALYSIS`		Predicts the survival probability of an individual over time.

ANOMALY_DETECTION `class-attribute` `instance-attribute` ¶

ANOMALY_DETECTION = 'ANOMALY_DETECTION'

CATEGORICAL_UPLIFT `class-attribute` `instance-attribute` ¶

CATEGORICAL_UPLIFT = 'CATEGORICAL_UPLIFT'

CLASSIFICATION `class-attribute` `instance-attribute` ¶

CLASSIFICATION = 'CLASSIFICATION'

NUMERICAL_UPLIFT `class-attribute` `instance-attribute` ¶

NUMERICAL_UPLIFT = 'NUMERICAL_UPLIFT'

RANKING `class-attribute` `instance-attribute` ¶

RANKING = 'RANKING'

REGRESSION `class-attribute` `instance-attribute` ¶

REGRESSION = 'REGRESSION'

SURVIVAL_ANALYSIS `class-attribute` `instance-attribute` ¶

SURVIVAL_ANALYSIS = 'SURVIVAL_ANALYSIS'

Semantic ¶

Bases: Enum

Semantic (e.g. numerical, categorical) of a column.

Determines how a column is interpreted by the model. Similar to the "ColumnType" of YDF's DataSpecification.

Attributes:

Name	Type	Description
`NUMERICAL`		Numerical value. Generally for quantities or counts with full ordering. For example, the age of a person, or the number of items in a bag. Can be a float or an integer. Missing values are represented by math.nan.
`CATEGORICAL`		A categorical value, representing a type or class from a finite set of possible values without inherent ordering (e.g., colors {RED, BLUE, GREEN}). Default Behavior: - Input can be strings or integers. - Integers are cast to strings. - Missing values are represented by "" (empty string). - YDF builds a vocabulary of unique values. Rare values might be pruned and grouped into an out-of-vocabulary (OOV) sentinel. - Values not seen during training are treated as OOV. Integerized Behavior (`is_already_integerized=True`, advanced): - Input must be integers. No casting to string occurs. - Integers must be >= -1. - `-1`: Represents a missing value. - `0`: Represents the out-of-vocabulary (OOV) value. - `1` to `N`: Represent the different categories, up to the maximum value seen during training. - Any positive integer larger than the largest value seen during training is also treated as OOV. - This mode is more efficient as it avoids vocabulary building and string operations, but requires pre-integerized data.
`HASH`		The hash of a string value. Used when only the equality between values is important (not the value itself). Currently, only used for groups in ranking problems e.g. the query in a query/document problem. The hashing is computed with Google's farmhash and stored as an uint64.
`CATEGORICAL_SET`		Set of categorical values. Great to represent tokenized texts. Can be a string. Unlike CATEGORICAL, the number of items in a CATEGORICAL_SET can change between examples. The order of the values within a feature value does not matter.
`BOOLEAN`		Boolean value. Can be a float or an integer. Missing values are represented by math.nan.
`DISCRETIZED_NUMERICAL`		Numerical values automatically discretized into bins. Discretized numerical columns are faster to train than (non-discretized) numerical columns. If the number of unique values of these columns is lower than the number of bins, the discretization is lossless from the point of view of the model. If the number of unique values of these columns is greater than the number of bins, the discretization is lossy from the point of view of the model. Lossy discretization can reduce and sometimes increase (due to regularization) the quality of the model.
`NUMERICAL_VECTOR_SEQUENCE`		Each value of a vector-sequence feature is a sequence (i.e., ordered list) of fixed-sized numerical vectors. All the vectors in a vector sequence should have the same size, but each vector-sequence can have a different number of vectors. A vector-sequence is suited, for example, to represent a multi-variate time-series or a list of LLM tokens.
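
The "Integerized Behavior" rules of CATEGORICAL can be summarized in a short sketch. It only restates the rules listed above and is not YDF's implementation; `interpret_integerized` is a hypothetical helper that returns `None` for missing values.

```python
from typing import List, Optional, Sequence

MISSING = -1  # Integer value representing a missing value.
OOV = 0  # Integer value representing the out-of-vocabulary item.


def interpret_integerized(
    values: Sequence[int], max_seen_in_training: int
) -> List[Optional[int]]:
    """Returns None for missing values, 0 for OOV, or the category index."""
    result: List[Optional[int]] = []
    for value in values:
        if value < MISSING:
            raise ValueError(f"Integerized values must be >= -1, got {value}")
        if value == MISSING:
            result.append(None)
        elif value > max_seen_in_training:
            result.append(OOV)  # Larger than any value seen during training.
        else:
            result.append(value)  # 0 stays OOV; 1..N are categories.
    return result


interpreted = interpret_integerized([-1, 0, 1, 3, 7], max_seen_in_training=3)
```

Here 7 exceeds the largest value seen during training (3), so it is treated as out-of-vocabulary.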

BOOLEAN `class-attribute` `instance-attribute` ¶

BOOLEAN = 5

CATEGORICAL `class-attribute` `instance-attribute` ¶

CATEGORICAL = 2

CATEGORICAL_SET `class-attribute` `instance-attribute` ¶

CATEGORICAL_SET = 4

DISCRETIZED_NUMERICAL `class-attribute` `instance-attribute` ¶

DISCRETIZED_NUMERICAL = 6

HASH `class-attribute` `instance-attribute` ¶

HASH = 3

NUMERICAL `class-attribute` `instance-attribute` ¶

NUMERICAL = 1

NUMERICAL_VECTOR_SEQUENCE `class-attribute` `instance-attribute` ¶

NUMERICAL_VECTOR_SEQUENCE = 7

from_proto_type `classmethod` ¶

from_proto_type(column_type: ColumnType)

to_proto_type ¶

to_proto_type() -> ColumnType

evaluate_predictions ¶

evaluate_predictions(
    predictions: ndarray,
    labels: ndarray,
    task: Task,
    *,
    weights: Optional[ndarray] = None,
    label_classes: Optional[List[str]] = None,
    ranking_groups: Optional[ndarray] = None,
    bootstrapping: Union[bool, int] = False,
    ndcg_truncation: int = 5,
    mrr_truncation: int = 5,
    map_truncation: int = 5,
    random_seed: int = 1234,
    num_threads: Optional[int] = None
) -> Evaluation

Evaluates predictions against labels.

This function evaluates the predictions of any model (possibly a non-YDF model) against the labels, using YDF's evaluation format.

YDF models should be evaluated directly with model.evaluate, which is more efficient and convenient.

For binary classification tasks, predictions should contain the predicted probabilities and should be of shape [n] or [n,2], where n is the number of examples. If predictions have shape [n], they should contain the probability of the "positive" class. If predictions have shape [n,2], predictions[:, 0] and predictions[:, 1] should be, respectively, the probabilities of the "negative" and "positive" classes. The labels should be a 1D array of shape [n], containing either integers 0 and 1, or strings. If the labels are strings, label_classes must be provided with the "negative" class first. For integer labels, providing label_classes is optional and only used for display.

For multiclass classification tasks, predictions should contain the predicted probabilities and should be of shape [n,k], where n is the number of examples and k is the number of classes. predictions[:, i] should contain the probability of the i-th class. The labels should be a 1D integer or string array of shape [n]. The names of the classes are given by label_classes in the same order as the predictions. If the labels are integers, they should be in the range 0, ..., num_classes - 1. If the labels are strings, label_classes must be provided. For integer labels, providing label_classes is optional and only used for display.
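
The two accepted layouts for binary classification, and the role of the class order in label_classes, can be sketched with plain Python lists (used instead of NumPy arrays so the snippet has no dependencies). `to_two_columns` and `labels_to_indices` are hypothetical helpers for illustration and are not part of YDF.

```python
from typing import List, Sequence


def to_two_columns(positive_proba: Sequence[float]) -> List[List[float]]:
    """Expands shape [n] probabilities into shape [n, 2] rows.

    Column 0 is the "negative" class and column 1 is the "positive" class.
    """
    return [[1.0 - p, p] for p in positive_proba]


def labels_to_indices(
    labels: Sequence[str], label_classes: Sequence[str]
) -> List[int]:
    """Maps string labels to integers using the order of `label_classes`."""
    index_of = {name: i for i, name in enumerate(label_classes)}
    return [index_of[label] for label in labels]


proba = to_two_columns([0.25, 0.75])
# The "negative" class comes first in label_classes.
indices = labels_to_indices(["yes", "no"], label_classes=["no", "yes"])
```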

For regression tasks, predictions should contain the predicted values as a 1D float array of shape [n], where n is the number of examples. The labels should also be a 1D float array of shape [n].

For ranking tasks, predictions should contain the predicted values as a 1D float array of shape [n], where n is the number of examples. The labels should also be a 1D float array of shape [n]. The ranking groups should be an integer array of shape [n].
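
To show how predictions, labels and ranking groups fit together, here is a sketch of a truncated NDCG averaged over groups. It uses the common gain `2^label - 1` and a `log2` position discount, and it skips groups without any relevant item. These choices are assumptions made for illustration and may differ from what YDF computes.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def dcg(relevances: Sequence[float], truncation: int) -> float:
    """Discounted cumulative gain over the first `truncation` positions."""
    return sum(
        (2.0**rel - 1.0) / math.log2(position + 2)
        for position, rel in enumerate(relevances[:truncation])
    )


def mean_ndcg(
    predictions: Sequence[float],
    labels: Sequence[float],
    groups: Sequence[int],
    truncation: int = 5,
) -> float:
    """Average NDCG@truncation over the ranking groups."""
    by_group: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for pred, label, group in zip(predictions, labels, groups):
        by_group[group].append((pred, label))
    scores = []
    for items in by_group.values():
        # Order the labels by decreasing predicted score.
        ranked = [label for _, label in sorted(items, key=lambda x: -x[0])]
        ideal = sorted((label for _, label in items), reverse=True)
        ideal_dcg = dcg(ideal, truncation)
        if ideal_dcg > 0:  # Skip groups without any relevant item.
            scores.append(dcg(ranked, truncation) / ideal_dcg)
    return sum(scores) / len(scores)


# Group 1 is ranked perfectly; group 2 ranks the relevant item second.
score = mean_ndcg(
    predictions=[0.9, 0.1, 0.2, 0.8],
    labels=[4.0, 0.0, 4.0, 0.0],
    groups=[1, 1, 2, 2],
)
```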

Uplift evaluations and anomaly detection evaluations are not supported.

Usage examples:

import numpy as np
from sklearn.linear_model import LogisticRegression
import ydf

X_train, X_test, y_train, y_test = ...  # Load data

model = LogisticRegression()
model.fit(X_train, y_train)
predictions: np.ndarray = model.predict_proba(X_test)
evaluation = ydf.evaluate.evaluate_predictions(
    predictions, y_test, ydf.Task.CLASSIFICATION
)
print(evaluation)
evaluation  # Prints an interactive report in IPython / Colab notebooks.

import numpy as np
import ydf

predictions = np.linspace(0, 1, 100)
labels = np.concatenate([np.ones(50), np.zeros(50)]).astype(float)
evaluation = ydf.evaluate.evaluate_predictions(
    predictions, labels, ydf.Task.REGRESSION
)
print(evaluation)
evaluation  # Prints an interactive report in IPython / Colab notebooks.

Parameters:

Name	Type	Description	Default
`predictions`	`ndarray`	Array of predictions to evaluate. The "task" argument defines the expected shape of the prediction array.	required
`labels`	`ndarray`	Label values. The "task" argument defines the expected shape of the label array.	required
`task`	`Task`	Task of the model.	required
`weights`	`Optional[ndarray]`	Weights of the examples as a 1D float array of shape [n]. If not provided, all examples have identical weight.	`None`
`label_classes`	`Optional[List[str]]`	Names of the labels. Only used for classification tasks.	`None`
`ranking_groups`	`Optional[ndarray]`	Ranking groups as a 1D integer array of shape [n]. Only used for ranking tasks.	`None`
`bootstrapping`	`Union[bool, int]`	Controls whether bootstrapping is used to evaluate the confidence intervals and statistical tests (i.e., all the metrics ending with "[B]"). If set to false, bootstrapping is disabled. If set to true, bootstrapping is enabled and 2000 bootstrapping samples are used. If set to an integer, it specifies the number of bootstrapping samples to use. In this case, if the number is less than 100, an error is raised as bootstrapping will not yield useful results.	`False`
`ndcg_truncation`	`int`	Controls at which ranking position the NDCG metric should be truncated. Defaults to 5. Ignored for non-ranking models.	`5`
`mrr_truncation`	`int`	Controls at which ranking position the MRR metric should be truncated. Defaults to 5. Ignored for non-ranking models.	`5`
`map_truncation`	`int`	Controls at which ranking position the MAP metric should be truncated. Defaults to 5. Ignored for non-ranking models.	`5`
`random_seed`	`int`	Random seed for sampling.	`1234`
`num_threads`	`Optional[int]`	Number of threads used to run the model.	`None`
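
As background for the `bootstrapping` and `random_seed` arguments, the sketch below shows a generic percentile bootstrap: examples are resampled with replacement and the metric is recomputed on each sample. It is a textbook illustration under stated assumptions (percentile intervals, a 95% confidence level), not YDF's implementation; `bootstrap_interval` and `rmse` are hypothetical helpers.

```python
import random
from typing import Callable, List, Sequence, Tuple


def bootstrap_interval(
    predictions: Sequence[float],
    labels: Sequence[float],
    metric: Callable[[Sequence[float], Sequence[float]], float],
    num_samples: int = 2000,
    random_seed: int = 1234,
    confidence: float = 0.95,
) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval of `metric`."""
    if num_samples < 100:
        raise ValueError("Too few bootstrapping samples to be useful")
    rng = random.Random(random_seed)
    n = len(labels)
    values: List[float] = []
    for _ in range(num_samples):
        # Draw n example indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(
            metric([predictions[i] for i in idx], [labels[i] for i in idx])
        )
    values.sort()
    tail = (1.0 - confidence) / 2.0
    lower = values[int(tail * (num_samples - 1))]
    upper = values[int((1.0 - tail) * (num_samples - 1))]
    return lower, upper


def rmse(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Root mean squared error."""
    return (sum((p - l) ** 2 for p, l in zip(preds, labels)) / len(labels)) ** 0.5


preds = [i / 99 for i in range(100)]
labels = [1.0] * 50 + [0.0] * 50
low, high = bootstrap_interval(preds, labels, rmse, num_samples=200)
```

Fixing the seed makes the interval reproducible from one run to the next, which is the purpose of a `random_seed` argument.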

Returns:

Type	Description
`Evaluation`	Evaluation metrics.

start_worker ¶

start_worker(
    port: int, blocking: bool = True
) -> Optional[Callable[[], None]]

Starts a worker locally on the given port.

The addresses of workers are passed to learners with the workers argument.

Usage example:

# On worker machine #0 at address 192.168.0.1
ydf.start_worker(9000)

# On worker machine #1 at address 192.168.0.2
ydf.start_worker(9000)

# On manager
model = ydf.DistributedGradientBoostedTreesLearner(
    label="my_label",
    working_dir="/shared/working_dir",
    resume_training=True,
    workers=["192.168.0.1:9000", "192.168.0.2:9000"],
).train(dataset)

Example with non-blocking call:

# On worker machine
stop_worker = ydf.start_worker(9000, blocking=False)
# Do some work with the worker
stop_worker() # Stops the worker

Parameters:

Name	Type	Description	Default
`port`	`int`	TCP port of the worker.	required
`blocking`	`bool`	If true (default), the function is blocking until the worker is stopped (e.g., error, interruption by the manager). If false, the function is non-blocking and returns a callable that, when called, will stop the worker.	`True`

Returns:

Type	Description
`Optional[Callable[[], None]]`	Callable to stop the worker. Only returned if `blocking=False`.

strict ¶

strict(value: bool = True) -> None

Sets the strict mode.

When strict mode is enabled, more warnings are displayed.

Parameters:

Name	Type	Description	Default
`value`	`bool`	New value for the strict mode.	`True`

ModelIOOptions `dataclass` ¶

ModelIOOptions(file_prefix: Optional[str] = None)

Advanced options for saving and loading YDF models.

Attributes:

Name	Type	Description
`file_prefix`	`Optional[str]`	Optional prefix for model files. Allows multiple models to be stored in the same directory, although this is discouraged. If not specified during loading, the prefix is auto-detected. If not specified during saving, no prefix is used.

file_prefix `class-attribute` `instance-attribute` ¶

file_prefix: Optional[str] = None

create_vertical_dataset ¶

create_vertical_dataset(
    data: InputDataset,
    columns: ColumnDefs = None,
    include_all_columns: bool = False,
    max_vocab_count: int = 2000,
    min_vocab_frequency: int = 5,
    discretize_numerical_columns: bool = False,
    num_discretized_numerical_bins: int = 255,
    max_num_scanned_rows_to_infer_semantic: int = 100000,
    max_num_scanned_rows_to_compute_statistics: int = 100000,
    label_classes: Optional[list[str]] = None,
    data_spec: Optional[DataSpecification] = None,
    required_columns: Optional[Sequence[str]] = None,
    dont_unroll_columns: Optional[Sequence[str]] = None,
    label: Optional[str] = None,
) -> VerticalDataset

Creates a VerticalDataset from various sources of data.

The feature semantics are automatically determined and can be explicitly set with the columns argument. The semantics of a dataset (or model) are available in its data_spec.

Note that the CATEGORICAL_SET semantic is not automatically inferred when reading from file. When reading from CSV files, setting the CATEGORICAL_SET semantic for a feature will have YDF tokenize the feature. When reading from in-memory datasets (e.g. pandas), YDF only accepts lists of lists for CATEGORICAL_SET features.

Usage example:

import pandas as pd
import ydf

df = pd.read_csv("my_dataset.csv")

# Loads all the columns
ds = ydf.create_vertical_dataset(df)

# Only load columns "a" and "b". Ensure "b" is interpreted as a categorical
# feature.
ds = ydf.create_vertical_dataset(df,
  columns=[
    "a",
    ("b", ydf.Semantic.CATEGORICAL),
  ])

Parameters:

Name	Type	Description	Default
`data`	`InputDataset`	Source dataset. Supported formats: VerticalDataset, (typed) path, list of (typed) paths, Pandas DataFrame, Xarray Dataset, TensorFlow Dataset, PyGrain DataLoader and Dataset (experimental, Linux only), dictionary of string to NumPy array or lists. If the data is already a VerticalDataset, it is returned unchanged.	required
`columns`	`ColumnDefs`	If None, all columns are imported. The semantic of the columns is determined automatically. Otherwise, if include_all_columns=False (default) only the columns listed in `columns` are imported. If include_all_columns=True, all the columns are imported and only the semantic of the columns NOT in `columns` is determined automatically. If specified, "columns" defines the order of the columns - any non-listed columns are appended in-order after the specified columns (if include_all_columns=True).	`None`
`include_all_columns`	`bool`	See `columns`.	`False`
`max_vocab_count`	`int`	Maximum size of the vocabulary of CATEGORICAL and CATEGORICAL_SET columns stored as strings. If more unique values exist, only the most frequent values are kept, and the remaining values are considered as out-of-vocabulary. If max_vocab_count = -1, the number of values in the column is not limited (not recommended).	`2000`
`min_vocab_frequency`	`int`	Minimum number of occurrences of a value for CATEGORICAL and CATEGORICAL_SET columns. Values observed fewer than `min_vocab_frequency` times are considered as out-of-vocabulary.	`5`
`discretize_numerical_columns`	`bool`	If true, discretize all the numerical columns before training. Discretized numerical columns are faster to train with, but they can have a negative impact on the model quality. Using `discretize_numerical_columns=True` is equivalent to setting the column semantic DISCRETIZED_NUMERICAL in the `columns` argument. See the definition of DISCRETIZED_NUMERICAL for more details.	`False`
`num_discretized_numerical_bins`	`int`	Number of bins used when discretizing numerical columns.	`255`
`max_num_scanned_rows_to_infer_semantic`	`int`	Number of rows to scan when inferring the column's semantic if it is not explicitly specified. Only used when reading from file, in-memory datasets are always read in full. Setting this to a lower number will speed up dataset reading, but might result in incorrect column semantics. Set to -1 to scan the entire dataset.	`100000`
`max_num_scanned_rows_to_compute_statistics`	`int`	Number of rows to scan when computing a column's statistics. Only used when reading from file, in-memory datasets are always read in full. A column's statistics include the dictionary for categorical features and the mean / min / max for numerical features. Setting this to a lower number will speed up dataset reading, but skew statistics in the dataspec, which can hurt model quality (e.g. if an important category of a categorical feature is considered OOV). Set to -1 to scan the entire dataset.	`100000`
`label_classes`	`Optional[list[str]]`	An ordered list of possible values for the label. This argument is optional and typically not required. If not provided, the label classes are determined automatically from the dataset. If provided, it forces a specific order for the label classes. All label values present in the dataset must be included in this list.	`None`
`data_spec`	`Optional[DataSpecification]`	Dataspec to be used for this dataset. If a data spec is given, all other arguments except `data` and `required_columns` should not be provided.	`None`
`required_columns`	`Optional[Sequence[str]]`	List of columns required in the data. If None, all columns mentioned in the data spec or `columns` are required.	`None`
`dont_unroll_columns`	`Optional[Sequence[str]]`	List of columns that cannot be unrolled. If one such column needs to be unrolled, raise an error.	`None`
`label`	`Optional[str]`	Name of the label column, if any.	`None`
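
The combined effect of `columns` and `include_all_columns` on which columns are imported, and in which order, can be sketched as follows. This only restates the rules from the table above; `select_columns` is a hypothetical helper working on column names and does not reproduce YDF's semantic inference.

```python
from typing import List, Optional, Sequence


def select_columns(
    available: Sequence[str],
    columns: Optional[Sequence[str]] = None,
    include_all_columns: bool = False,
) -> List[str]:
    """Returns the imported column names, in import order."""
    if columns is None:
        return list(available)  # Import everything.
    selected = list(columns)  # Listed columns come first, in the given order.
    if include_all_columns:
        # Non-listed columns are appended in their original order.
        selected += [c for c in available if c not in columns]
    return selected


available = ["a", "b", "c"]
all_columns = select_columns(available)
listed_only = select_columns(available, columns=["c", "a"])
listed_first = select_columns(available, columns=["c", "a"], include_all_columns=True)
```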

Returns:

Type	Description
`VerticalDataset`	Dataset to be ingested by the learner algorithms.

Raises:

Type	Description
`ValueError`	If the dataset has an unsupported type.

ModelMetadata `dataclass` ¶

ModelMetadata(
    owner: Optional[str] = None,
    created_date: Optional[int] = None,
    uid: Optional[int] = None,
    framework: Optional[str] = None,
    custom_fields: Dict[str, Union[bytes, str]] = (
        lambda: {}
    )(),
)

Metadata information stored in the model.

Attributes:

Name	Type	Description
`owner`	`Optional[str]`	Owner of the model, defaults to empty string for the open-source build of YDF.
`created_date`	`Optional[int]`	Unix timestamp of the model training (in seconds).
`uid`	`Optional[int]`	Unique identifier of the model.
`framework`	`Optional[str]`	Framework used to create the model. Defaults to "Python YDF" for models trained with the Python API.
`custom_fields`	`Dict[str, Union[bytes, str]]`	Custom fields to be populated by the user.

created_date `class-attribute` `instance-attribute` ¶

created_date: Optional[int] = None

custom_fields `class-attribute` `instance-attribute` ¶

custom_fields: Dict[str, Union[bytes, str]] = field(
    default_factory=lambda: {}
)

framework `class-attribute` `instance-attribute` ¶

framework: Optional[str] = None

owner `class-attribute` `instance-attribute` ¶

owner: Optional[str] = None

uid `class-attribute` `instance-attribute` ¶

uid: Optional[int] = None

from_tensorflow_decision_forests ¶

from_tensorflow_decision_forests(
    directory: str,
) -> ModelType

Load a TensorFlow Decision Forests model from disk.

Usage example:

import pandas as pd
import ydf

# Import TF-DF model
loaded_model = ydf.from_tensorflow_decision_forests("/tmp/my_tfdf_model")

# Make predictions
dataset = pd.read_csv("my_dataset.csv")
loaded_model.predict(dataset)

# Show details about the model
loaded_model.describe()

The imported model produces the same predictions as the original TF-DF model.

Only TensorFlow Decision Forests models containing a single Decision Forest and nothing else are supported. That is, combined neural network / decision forest models cannot be imported. Unfortunately, importing such models may succeed but result in incorrect predictions, so check for prediction equality after importing.
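
The equality check recommended above can be as simple as comparing the two prediction arrays within a tolerance. In this sketch, `predictions_match` is a hypothetical helper, the tolerance is an arbitrary choice, and the arrays are placeholders for the outputs of the original TF-DF model and of the imported YDF model on the same examples.

```python
import numpy as np


def predictions_match(reference, imported, atol=1e-5):
    """Returns True if both prediction arrays agree within `atol`."""
    reference = np.asarray(reference, dtype=np.float32)
    imported = np.asarray(imported, dtype=np.float32)
    # Differently shaped outputs are reported as a mismatch rather than broadcast.
    if reference.shape != imported.shape:
        return False
    return bool(np.allclose(reference, imported, rtol=0.0, atol=atol))


# Placeholder values; in practice these come from the two models.
tfdf_predictions = np.array([0.10, 0.85, 0.40], dtype=np.float32)
ydf_predictions = np.array([0.10, 0.85, 0.40], dtype=np.float32)

print(predictions_match(tfdf_predictions, ydf_predictions))
```

If the two libraries return arrays of different shapes for the same task, reshape one of them before comparing; whether this is needed depends on the model and is not covered by this sketch.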

Parameters:

Name	Type	Description	Default
`directory`	`str`	Directory containing the TF-DF model.	required

Returns:

Type	Description
`ModelType`	Model to use for inference, evaluation or inspection

from_sklearn ¶

from_sklearn(
    sklearn_model: Any,
    label_name: str = "label",
    feature_name: str = "features",
) -> GenericModel

Converts a tree-based scikit-learn model to a YDF model.

Currently supported models:

sklearn.tree.DecisionTreeClassifier
sklearn.tree.DecisionTreeRegressor
sklearn.tree.ExtraTreeClassifier
sklearn.tree.ExtraTreeRegressor
sklearn.ensemble.RandomForestClassifier
sklearn.ensemble.RandomForestRegressor
sklearn.ensemble.ExtraTreesClassifier
sklearn.ensemble.ExtraTreesRegressor
sklearn.ensemble.GradientBoostingRegressor
sklearn.ensemble.IsolationForest

Scikit-learn models do not have named features, so the input features are combined into a single multi-dimensional feature. You can specify its name with the `feature_name` argument.

Usage example:

import ydf
from sklearn import datasets
from sklearn import tree
import numpy as np

# Train a scikit-learn model
X, y = datasets.make_classification(n_features=4, n_classes=2)
skl_model = tree.DecisionTreeClassifier().fit(X, y)

# Convert the model to YDF
ydf_model = ydf.from_sklearn(skl_model)

# Make predictions with the YDF model
# The input must be a dictionary with the specified feature name.
ydf_predictions = ydf_model.predict({"features": X})

# Analyze the YDF model
# analysis_ds = {"features": X, "label": y}
# ydf_model.analyze(analysis_ds)

Parameters:

Name	Type	Description	Default
`sklearn_model`	`Any`	The scikit-learn tree-based model to convert.	required
`label_name`	`str`	The name to assign to the label column in the YDF model.	`'label'`
`feature_name`	`str`	The name to assign to the multi-dimensional feature column in the YDF model.	`'features'`

Returns:

Type	Description
`GenericModel`	A YDF model that emulates the provided scikit-learn model.

NodeFormat ¶

Bases: Enum

Specifies the storage format for the internal nodes of a tree-based model.

Attributes:

Name	Type	Description
`BLOB_SEQUENCE`		Default format for the public version of YDF.
`BLOB_SEQUENCE_GZIP`		Efficient compressed version of the BLOB_SEQUENCE format. Might not be compatible with pre-2025 builds of YDF and TF-DF.

BLOB_SEQUENCE `class-attribute` `instance-attribute` ¶

BLOB_SEQUENCE = auto()

BLOB_SEQUENCE_GZIP `class-attribute` `instance-attribute` ¶

BLOB_SEQUENCE_GZIP = auto()

DataSpecification `module-attribute` ¶

DataSpecification = DataSpecification

TrainingConfig `module-attribute` ¶

TrainingConfig = TrainingConfig

RegressionLoss `dataclass` ¶

RegressionLoss(
    activation: Activation,
    initial_predictions: Callable[
        [NDArray[float32], NDArray[float32]], float32
    ],
    loss: Callable[
        [
            NDArray[float32],
            NDArray[float32],
            NDArray[float32],
        ],
        float32,
    ],
    gradient_and_hessian: Callable[
        [NDArray[float32], NDArray[float32]],
        Tuple[NDArray[float32], NDArray[float32]],
    ],
    may_trigger_gc: bool = True,
)

Bases: AbstractCustomLoss

A user-provided loss function for regression problems.

Loss functions must never keep a reference to their arguments after returning. The arrays may be backed by YDF-internal data whose lifetime ends when the function returns.

Bad:

mylabels = None
def initial_predictions(labels, weights):
  global mylabels
  mylabels = labels  # labels is now referenced outside the function

Good:

import numpy as np

mylabels = None
def initial_predictions(labels, weights):
  global mylabels
  mylabels = np.copy(labels)  # mylabels is a copy, not a reference.
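
The difference between the two snippets can be reproduced with plain NumPy. In this sketch, `buffer` is a stand-in for memory owned by YDF, and `labels` is a view over it, as an array handed to a loss function might be; the names and the overwrite step are illustrative only.

```python
import numpy as np

# Stand-in for memory owned by the training engine.
buffer = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# A view shares memory with `buffer`, like an argument passed to a loss function.
labels = buffer[:]

kept_reference = labels      # Bad: still points at the engine's memory.
kept_copy = np.copy(labels)  # Good: owns its own memory.

# Simulate the engine reusing the buffer after the loss function returned.
buffer[:] = 0.0

print(kept_reference)  # reflects the overwritten buffer
print(kept_copy)       # still holds the original label values
```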

Attributes:

Name	Type	Description
`initial_predictions`	`Callable[[NDArray[float32], NDArray[float32]], float32]`	The bias / initial predictions of the GBT model. Receives the label values and the weights, outputs the initial prediction as a float.
`loss`	`Callable[[NDArray[float32], NDArray[float32], NDArray[float32]], float32]`	The loss function controls the early stopping. The loss function receives the labels, the current predictions and the current weights and must output the loss as a float. Note that the predictions provided to the loss functions have not yet had an activation function applied to them.
`gradient_and_hessian`	`Callable[[NDArray[float32], NDArray[float32]], Tuple[NDArray[float32], NDArray[float32]]]`	Gradient and hessian of the current predictions. Note that only the diagonal of the hessian must be provided. Receives as input the labels and the current predictions (without activation) and returns a tuple of the gradient and the hessian.
`activation`	`Activation`	Activation function to be applied to the model. Regression models are expected to return a value in the same space as the labels after applying the activation function.
`may_trigger_gc`	`bool`	If True (default), YDF may trigger Python's garbage collection to determine if a Numpy array that is backed by YDF-internal data is used after its lifetime has ended. If False, checks for illegal memory accesses are disabled. This can be useful when training many small models or if the observed impact of triggering GC is large. If `may_trigger_gc=False`, it is very important that the user validate manually that no memory leakage occurs.
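
As an illustration of the three callables, here is a sketch of a weighted mean squared error written with NumPy only. The function names are made up, and the gradient is taken as the derivative of the per-example loss with respect to the prediction; treat that sign convention as an assumption to check before training with it.

```python
import numpy as np


def mse_initial_predictions(labels, weights):
    """Weighted mean of the labels, a natural starting point for squared error."""
    return np.float32(np.average(labels, weights=weights))


def mse_loss(labels, predictions, weights):
    """Weighted mean squared error; predictions arrive without activation."""
    return np.float32(np.average(np.square(predictions - labels), weights=weights))


def mse_gradient_and_hessian(labels, predictions):
    """Per-example first and second derivative of (prediction - label)^2."""
    gradient = 2.0 * (predictions - labels)
    hessian = np.full_like(predictions, 2.0)  # Diagonal of the hessian only.
    return gradient.astype(np.float32), hessian.astype(np.float32)


labels = np.array([1.0, 2.0, 4.0], dtype=np.float32)
weights = np.ones_like(labels)
predictions = np.full_like(labels, mse_initial_predictions(labels, weights))

print(mse_loss(labels, predictions, weights))
print(mse_gradient_and_hessian(labels, predictions))
```

Following the constructor signature above, these callables would be passed as `initial_predictions`, `loss` and `gradient_and_hessian`, together with `activation=ydf.Activation.IDENTITY`, since the predictions are already in the label space.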

activation `instance-attribute` ¶

activation: Activation

gradient_and_hessian `instance-attribute` ¶

gradient_and_hessian: Callable[
    [NDArray[float32], NDArray[float32]],
    Tuple[NDArray[float32], NDArray[float32]],
]

initial_predictions `instance-attribute` ¶

initial_predictions: Callable[
    [NDArray[float32], NDArray[float32]], float32
]

loss `instance-attribute` ¶

loss: Callable[
    [NDArray[float32], NDArray[float32], NDArray[float32]],
    float32,
]

may_trigger_gc `class-attribute` `instance-attribute` ¶

may_trigger_gc: bool = True

check_is_compatible_task ¶

check_is_compatible_task(task: Task) -> None

Raises an error if the given task is incompatible with this loss type.

BinaryClassificationLoss `dataclass` ¶

BinaryClassificationLoss(
    activation: Activation,
    initial_predictions: Callable[
        [NDArray[int32], NDArray[float32]], float32
    ],
    loss: Callable[
        [
            NDArray[int32],
            NDArray[float32],
            NDArray[float32],
        ],
        float32,
    ],
    gradient_and_hessian: Callable[
        [NDArray[int32], NDArray[float32]],
        Tuple[NDArray[float32], NDArray[float32]],
    ],
    may_trigger_gc: bool = True,
)

Bases: AbstractCustomLoss

A user-provided loss function for binary classification problems.

Note that the labels are binary but 1-based, i.e. the positive class is 2, the negative class is 1.

Loss functions must never keep a reference to their arguments after returning. The arrays may be backed by YDF-internal data whose lifetime ends when the function returns.

Bad:

mylabels = None
def initial_predictions(labels, weights):
  global mylabels
  mylabels = labels  # labels is now referenced outside the function

Good:

import numpy as np

mylabels = None
def initial_predictions(labels, weights):
  global mylabels
  mylabels = np.copy(labels)  # mylabels is a copy, not a reference.

Attributes:

Name	Type	Description
`initial_predictions`	`Callable[[NDArray[int32], NDArray[float32]], float32]`	The bias / initial predictions of the GBT model. Receives the label values and the weights, outputs the initial prediction as a float.
`loss`	`Callable[[NDArray[int32], NDArray[float32], NDArray[float32]], float32]`	The loss function controls the early stopping. The loss function receives the labels, the current predictions and the current weights and must output the loss as a float. Note that the predictions provided to the loss functions have not yet had an activation function applied to them.
`gradient_and_hessian`	`Callable[[NDArray[int32], NDArray[float32]], Tuple[NDArray[float32], NDArray[float32]]]`	Gradient and hessian of the current predictions. Note that only the diagonal of the hessian must be provided. Receives as input the labels and the current predictions (without activation). Returns a tuple of the gradient and the hessian.
`activation`	`Activation`	Activation function to be applied to the model. Binary classification models are expected to return a probability after applying the activation function.
`may_trigger_gc`	`bool`	If True (default), YDF may trigger Python's garbage collection to determine if a NumPy array that is backed by YDF-internal data is used after its lifetime has ended. If False, checks for illegal memory accesses are disabled. Setting this parameter to False is dangerous, since illegal memory accesses will no longer be detected.
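
Because the labels arrive as 1 (negative) and 2 (positive), a custom loss usually starts by shifting them to 0 and 1. The sketch below does this for a weighted binomial log loss using NumPy only. The function names are made up, and the sign convention of the gradient (derivative of the loss with respect to the raw prediction) is an assumption to verify.

```python
import numpy as np


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def binary_initial_predictions(labels, weights):
    """Log-odds of the weighted positive rate (labels are 1 or 2)."""
    positive_rate = np.average(labels == 2, weights=weights)
    return np.float32(np.log(positive_rate / (1.0 - positive_rate)))


def binary_loss(labels, predictions, weights):
    """Weighted binomial log loss computed from raw (pre-sigmoid) predictions."""
    targets = (labels - 1).astype(np.float32)  # 1/2 -> 0/1
    # log(1 + exp(x)) - y * x, written in a numerically stable form.
    per_example = np.logaddexp(0.0, predictions) - targets * predictions
    return np.float32(np.average(per_example, weights=weights))


def binary_gradient_and_hessian(labels, predictions):
    """Derivatives of the log loss with respect to the raw prediction."""
    targets = (labels - 1).astype(np.float32)
    probabilities = _sigmoid(predictions)
    gradient = probabilities - targets
    hessian = probabilities * (1.0 - probabilities)
    return gradient.astype(np.float32), hessian.astype(np.float32)


labels = np.array([1, 2, 2, 1], dtype=np.int32)
weights = np.ones(4, dtype=np.float32)
predictions = np.zeros(4, dtype=np.float32)

print(binary_initial_predictions(labels, weights))
print(binary_loss(labels, predictions, weights))
print(binary_gradient_and_hessian(labels, predictions))
```

This sketch pairs naturally with `ydf.Activation.SIGMOID`, so that the model outputs a probability. It does not guard against a dataset with a single class, where the log-odds would be infinite.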

activation `instance-attribute` ¶

activation: Activation

gradient_and_hessian `instance-attribute` ¶

gradient_and_hessian: Callable[
    [NDArray[int32], NDArray[float32]],
    Tuple[NDArray[float32], NDArray[float32]],
]

initial_predictions `instance-attribute` ¶

initial_predictions: Callable[
    [NDArray[int32], NDArray[float32]], float32
]

loss `instance-attribute` ¶

loss: Callable[
    [NDArray[int32], NDArray[float32], NDArray[float32]],
    float32,
]

may_trigger_gc `class-attribute` `instance-attribute` ¶

may_trigger_gc: bool = True

check_is_compatible_task ¶

check_is_compatible_task(task: Task) -> None

Raises an error if the given task is incompatible with this loss type.

MultiClassificationLoss `dataclass` ¶

MultiClassificationLoss(
    activation: Activation,
    initial_predictions: Callable[
        [NDArray[int32], NDArray[float32]], NDArray[float32]
    ],
    loss: Callable[
        [
            NDArray[int32],
            NDArray[float32],
            NDArray[float32],
        ],
        float32,
    ],
    gradient_and_hessian: Callable[
        [NDArray[int32], NDArray[float32]],
        Tuple[NDArray[float32], NDArray[float32]],
    ],
    may_trigger_gc: bool = True,
)

Bases: AbstractCustomLoss

A user-provided loss function for multi-class problems.

Note that the labels are 1-based. Predictions are given in a 2D array with one row per example. Initial predictions, gradients and hessians are expected for each class; e.g. for a 3-class classification problem, output 3 gradients and 3 hessians per example, one for each class.

Loss functions must never keep a reference to their arguments after returning. The arrays may be backed by YDF-internal data whose lifetime ends when the function returns.

Bad:

mylabels = None
def initial_predictions(labels, weights):
  global mylabels
  mylabels = labels  # labels is now referenced outside the function

Good:

import numpy as np

mylabels = None
def initial_predictions(labels, weights):
  global mylabels
  mylabels = np.copy(labels)  # mylabels is a copy, not a reference.

Attributes:

Name	Type	Description
`initial_predictions`	`Callable[[NDArray[int32], NDArray[float32]], NDArray[float32]]`	The bias / initial predictions of the GBT model. Receives the label values and the weights, outputs the initial prediction as an array of floats (one initial prediction per class).
`loss`	`Callable[[NDArray[int32], NDArray[float32], NDArray[float32]], float32]`	The loss function controls the early stopping. The loss function receives the labels, the current predictions and the current weights and must output the loss as a float. Note that the predictions provided to the loss functions have not yet had an activation function applied to them.
`gradient_and_hessian`	`Callable[[NDArray[int32], NDArray[float32]], Tuple[NDArray[float32], NDArray[float32]]]`	Gradient and hessian of the current predictions with respect to each class. Note that only the diagonal of the hessian must be provided. Receives as input the labels and the current predictions (without activation). Returns a tuple of the gradient and the hessian. Both gradient and hessian must be arrays of shape (num_classes, num_examples).
`activation`	`Activation`	Activation function to be applied to the model. Multi-class classification models are expected to return a probability distribution over the classes after applying the activation function.
`may_trigger_gc`	`bool`	If True (default), YDF may trigger Python's garbage collection to determine if a NumPy array that is backed by YDF-internal data is used after its lifetime has ended. If False, checks for illegal memory accesses are disabled. Setting this parameter to False is dangerous, since illegal memory accesses will no longer be detected.
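
The shape requirements are easy to get wrong, so here is a NumPy-only sketch of a softmax cross-entropy gradient and hessian. It assumes, as described above, that predictions have shape (num_examples, num_classes) and that the results must have shape (num_classes, num_examples). The function names and the sign convention of the gradient are assumptions.

```python
import numpy as np


def _softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)


def multiclass_gradient_and_hessian(labels, predictions):
    """Returns gradient and hessian diagonal, each (num_classes, num_examples)."""
    num_examples, num_classes = predictions.shape
    # Labels are 1-based, so class c is stored in column c - 1.
    one_hot = np.zeros((num_examples, num_classes), dtype=np.float32)
    one_hot[np.arange(num_examples), labels - 1] = 1.0
    probabilities = _softmax(predictions)
    gradient = probabilities - one_hot
    hessian = probabilities * (1.0 - probabilities)
    # Transpose from (num_examples, num_classes) to (num_classes, num_examples).
    return gradient.T.astype(np.float32), hessian.T.astype(np.float32)


labels = np.array([1, 3, 2, 1], dtype=np.int32)
predictions = np.zeros((4, 3), dtype=np.float32)

gradient, hessian = multiclass_gradient_and_hessian(labels, predictions)
print(gradient.shape, hessian.shape)
```

A matching `initial_predictions` function would return one value per class, and `ydf.Activation.SOFTMAX` would turn the raw predictions into a probability distribution.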

activation `instance-attribute` ¶

activation: Activation

gradient_and_hessian `instance-attribute` ¶

gradient_and_hessian: Callable[
    [NDArray[int32], NDArray[float32]],
    Tuple[NDArray[float32], NDArray[float32]],
]

initial_predictions `instance-attribute` ¶

initial_predictions: Callable[
    [NDArray[int32], NDArray[float32]], NDArray[float32]
]

loss `instance-attribute` ¶

loss: Callable[
    [NDArray[int32], NDArray[float32], NDArray[float32]],
    float32,
]

may_trigger_gc `class-attribute` `instance-attribute` ¶

may_trigger_gc: bool = True

check_is_compatible_task ¶

check_is_compatible_task(task: Task) -> None

Raises an error if the given task is incompatible with this loss type.

Activation ¶

Bases: Enum

Activation functions for custom losses.

Not all activation functions are supported for all custom losses. Activation function IDENTITY (i.e., no activation function applied) is always supported.

IDENTITY `class-attribute` `instance-attribute` ¶

IDENTITY = 'IDENTITY'

SIGMOID `class-attribute` `instance-attribute` ¶

SIGMOID = 'SIGMOID'

SOFTMAX `class-attribute` `instance-attribute` ¶

SOFTMAX = 'SOFTMAX'
