Skip to content

GenericLearner

GenericLearner

GenericLearner(learner_name: str, task: Task, label: str, weights: Optional[str], ranking_group: Optional[str], uplift_treatment: Optional[str], data_spec_args: DataSpecInferenceArgs, data_spec: Optional[DataSpecification], hyper_parameters: HyperParameters, deployment_config: DeploymentConfig, tuner: Optional[AbstractTuner])

A generic YDF learner.

cross_validation

cross_validation(ds: InputDataset, folds: int = 10, bootstrapping: Union[bool, int] = False, parallel_evaluations: int = 1) -> Evaluation

Cross-validates the learner and return the evaluation.

Usage example:

import pandas as pd
import ydf

dataset = pd.read_csv("my_dataset.csv")
learner = ydf.RandomForestLearner(label="label")
evaluation = learner.cross_validation(dataset)

# In a notebook, display an interractive evaluation
evaluation

# Print the evaluation
print(evaluation)

# Look at specific metrics
print(evaluation.accuracy)

Parameters:

Name Type Description Default
ds InputDataset

Dataset for the cross-validation.

required
folds int

Number of cross-validation folds.

10
bootstrapping Union[bool, int]

Controls whether bootstrapping is used to evaluate the confidence intervals and statistical tests (i.e., all the metrics ending with "[B]"). If set to false, bootstrapping is disabled. If set to true, bootstrapping is enabled and 2000 bootstrapping samples are used. If set to an integer, it specifies the number of bootstrapping samples to use. In this case, if the number is less than 100, an error is raised as bootstrapping will not yield useful results.

False
parallel_evaluations int

Number of model to train and evaluate in parallel using multi-threading. Note that each model is potentially already trained with multithreading (see num_threads argument of Learner constructor).

1

Returns:

Type Description
Evaluation

The cross-validation evaluation.

train

train(ds: InputDataset, valid: Optional[InputDataset] = None) -> ModelType

Trains a model on the given dataset.

Options for dataset reading are given on the learner. Consult the documentation of the learner or ydf.create_vertical_dataset() for additional information on dataset reading in YDF.

Usage example:

import ydf
import pandas as pd

train_ds = pd.read_csv(...)
test_ds = pd.read_csv(...)

learner = ydf.GradientBoostedTreesLearner(label="label")
model = learner.train(train_ds)
evaluation = model.evaluate(test_ds)

Usage example with a validation dataset:

import ydf
import pandas as pd

train_ds = pd.read_csv(...)
valid_ds = pd.read_csv(...)
test_ds = pd.read_csv(...)

learner = ydf.GradientBoostedTreesLearner(label="label")
model = learner.train(train_ds, valid=valid_ds)
evaluation = model.evaluate(test_ds)

If training is interrupted (for example, by interrupting the cell execution in Colab), the model will be returned to the state it was in at the moment of interruption.

Parameters:

Name Type Description Default
ds InputDataset

Training dataset.

required
valid Optional[InputDataset]

Optional validation dataset. Some learners, such as Random Forest, do not need validation dataset. Some learners, such as GradientBoostedTrees, automatically extract a validation dataset from the training dataset if the validation dataset is not provided.

None

Returns:

Type Description
ModelType

A trained model.