GenericLearner

GenericLearner(learner_name: str, task: Task, label: str, weights: Optional[str], ranking_group: Optional[str], uplift_treatment: Optional[str], data_spec_args: DataSpecInferenceArgs, data_spec: Optional[DataSpecification], hyper_parameters: HyperParameters, deployment_config: DeploymentConfig, tuner: Optional[AbstractTuner])

A generic YDF learner.

cross_validation

cross_validation(ds: InputDataset, folds: int = 10, bootstrapping: Union[bool, int] = False, parallel_evaluations: int = 1) -> Evaluation

Cross-validates the learner and returns the evaluation.

Usage example:

import pandas as pd
import ydf

dataset = pd.read_csv("my_dataset.csv")
learner = ydf.RandomForestLearner(label="label")
evaluation = learner.cross_validation(dataset)

# In a notebook, display an interactive evaluation
evaluation

# Print the evaluation
print(evaluation)

# Look at specific metrics
print(evaluation.accuracy)
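
To make the `folds` and `parallel_evaluations` arguments concrete, here is a minimal pure-Python sketch of k-fold cross-validation. It is not YDF's implementation: `k_fold_indices`, `cross_validate_sketch`, and the `train_and_score` callback are hypothetical names introduced only for illustration, and the round-robin fold assignment is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(num_examples: int, folds: int) -> List[Split]:
    """Returns one (train_indices, test_indices) pair per fold."""
    if folds < 2 or folds > num_examples:
        raise ValueError("folds must be in [2, num_examples]")
    splits = []
    for fold in range(folds):
        # Example i is held out in fold (i % folds); all others train.
        test = [i for i in range(num_examples) if i % folds == fold]
        train = [i for i in range(num_examples) if i % folds != fold]
        splits.append((train, test))
    return splits


def cross_validate_sketch(
    num_examples: int,
    folds: int,
    train_and_score: Callable[[List[int], List[int]], float],
    parallel_evaluations: int = 1,
) -> List[float]:
    """Trains and scores one model per fold, several folds at a time."""
    splits = k_fold_indices(num_examples, folds)
    # Hypothetical stand-in for parallel_evaluations: a thread pool that
    # processes up to `parallel_evaluations` folds concurrently.
    with ThreadPoolExecutor(max_workers=parallel_evaluations) as pool:
        return list(pool.map(lambda split: train_and_score(*split), splits))
```

Each example is held out exactly once, so the per-fold scores together cover the whole dataset.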

Parameters:

Name	Type	Description	Default
`ds`	`InputDataset`	Dataset for the cross-validation.	required
`folds`	`int`	Number of cross-validation folds.	`10`
`bootstrapping`	`Union[bool, int]`	Controls whether bootstrapping is used to compute the confidence intervals and statistical tests (i.e., all the metrics ending with "[B]"). If `False`, bootstrapping is disabled. If `True`, bootstrapping is enabled with 2000 bootstrapping samples. If an integer, it specifies the number of bootstrapping samples to use; a value less than 100 raises an error, because bootstrapping would not yield useful results.	`False`
`parallel_evaluations`	`int`	Number of models to train and evaluate in parallel using multithreading. Note that each model may itself already be trained with multithreading (see the `num_threads` argument of the learner constructor).	`1`

Returns:

Type	Description
`Evaluation`	The cross-validation evaluation.
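
The rules for the `bootstrapping` argument described above can be sketched as a small helper. This is an illustration only: `resolve_bootstrapping_samples` is a hypothetical function, not part of the YDF API, and representing "disabled" as 0 samples is an assumption.

```python
from typing import Union

DEFAULT_BOOTSTRAPPING_SAMPLES = 2000  # Used when bootstrapping=True.
MIN_BOOTSTRAPPING_SAMPLES = 100  # Smaller integer values are rejected.


def resolve_bootstrapping_samples(bootstrapping: Union[bool, int]) -> int:
    """Returns the number of bootstrapping samples, or 0 if disabled."""
    # bool is a subclass of int in Python, so test for bool first.
    if isinstance(bootstrapping, bool):
        return DEFAULT_BOOTSTRAPPING_SAMPLES if bootstrapping else 0
    if bootstrapping < MIN_BOOTSTRAPPING_SAMPLES:
        raise ValueError(
            f"bootstrapping={bootstrapping} is below the minimum of "
            f"{MIN_BOOTSTRAPPING_SAMPLES} samples"
        )
    return bootstrapping
```

Checking for `bool` before `int` matters here: without it, `True` would be treated as the integer 1 and rejected.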

train

train(ds: InputDataset, valid: Optional[InputDataset] = None) -> ModelType

Trains a model on the given dataset.

Dataset-reading options are set on the learner. Consult the documentation of the learner or `ydf.create_vertical_dataset()` for additional information on dataset reading in YDF.

Usage example:

import ydf
import pandas as pd

train_ds = pd.read_csv(...)
test_ds = pd.read_csv(...)

learner = ydf.GradientBoostedTreesLearner(label="label")
model = learner.train(train_ds)
evaluation = model.evaluate(test_ds)

Usage example with a validation dataset:

import ydf
import pandas as pd

train_ds = pd.read_csv(...)
valid_ds = pd.read_csv(...)
test_ds = pd.read_csv(...)

learner = ydf.GradientBoostedTreesLearner(label="label")
model = learner.train(train_ds, valid=valid_ds)
evaluation = model.evaluate(test_ds)

If training is interrupted (for example, by interrupting the cell execution in Colab), training stops and the model is returned in the state it was in at the moment of interruption.
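
The interruption behavior can be pictured with the following sketch of an iterative training loop. It only illustrates the idea of returning partial progress; it does not show how YDF handles interruptions internally, and `train_interruptible` and `train_one_step` are hypothetical names.

```python
from typing import Callable, List


def train_interruptible(
    num_iterations: int, train_one_step: Callable[[int], float]
) -> List[float]:
    """Runs up to num_iterations steps and returns the completed ones."""
    completed: List[float] = []
    try:
        for step in range(num_iterations):
            completed.append(train_one_step(step))
    except KeyboardInterrupt:
        # Keep whatever was trained before the interruption instead of
        # discarding it; the partial result is returned below.
        pass
    return completed
```

In this sketch, an interruption during step 3 yields the results of steps 0 to 2 rather than an exception.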

Parameters:

Name	Type	Description	Default
`ds`	`InputDataset`	Training dataset.	required
`valid`	`Optional[InputDataset]`	Optional validation dataset. Some learners, such as Random Forest, do not need a validation dataset. Some learners, such as GradientBoostedTrees, automatically extract a validation dataset from the training dataset if no validation dataset is provided.	`None`

Returns:

Type	Description
`ModelType`	A trained model.
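
As a rough picture of what "automatically extract a validation dataset from the training dataset" can mean, the sketch below holds out a random fraction of the training examples when `valid` is not given. It is an assumption-laden illustration, not YDF's code: `resolve_validation`, the `validation_ratio` of 0.1, and the `seed` value are all chosen here for the example.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def resolve_validation(
    train: Sequence[T],
    valid: Optional[Sequence[T]] = None,
    validation_ratio: float = 0.1,  # Assumed value, for illustration.
    seed: int = 1234,  # Assumed value, for reproducible splits.
) -> Tuple[List[T], List[T]]:
    """Returns (training examples, validation examples)."""
    if valid is not None:
        # An explicit validation dataset is used as-is.
        return list(train), list(valid)
    if not 0.0 < validation_ratio < 1.0:
        raise ValueError("validation_ratio must be in (0, 1)")
    # Shuffle the indices, then hold out the first chunk for validation.
    indices = list(range(len(train)))
    random.Random(seed).shuffle(indices)
    num_valid = int(len(train) * validation_ratio)
    valid_indices = set(indices[:num_valid])
    kept = [x for i, x in enumerate(train) if i not in valid_indices]
    held_out = [x for i, x in enumerate(train) if i in valid_indices]
    return kept, held_out
```

Passing `valid` explicitly, as in the second usage example above, skips the split entirely.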