GenericLearner ¶
GenericLearner(
learner_name: str,
task: Task,
label: Optional[str],
weights: Optional[str],
ranking_group: Optional[str],
uplift_treatment: Optional[str],
data_spec_args: DataSpecInferenceArgs,
data_spec: Optional[DataSpecification],
hyper_parameters: HyperParameters,
explicit_learner_arguments: Optional[Set[str]],
deployment_config: DeploymentConfig,
tuner: Optional[AbstractTuner],
feature_selector: Optional[AbstractFeatureSelector],
extra_training_config: Optional[TrainingConfig],
)
Bases: ABC
A generic YDF learner.
hyperparameters property ¶
A (mutable) dictionary of this learner's hyperparameters.
This object can be used to inspect or modify hyperparameters after creating
the learner. Modifying hyperparameters after constructing the learner is
suitable for some advanced use cases. Since this approach bypasses some
feasibility checks for the given set of hyperparameters, it is generally better
to re-create the learner for each model. The current set of hyperparameters
can be validated manually with validate_hyperparameters().
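Usage example (a minimal sketch; RandomForestLearner and the num_trees hyperparameter are used here only for illustration):
import ydf
learner = ydf.RandomForestLearner(label="label")
print(learner.hyperparameters)  # Inspect the current hyperparameters.
learner.hyperparameters["num_trees"] = 50  # Modify a hyperparameter in place (advanced use case).
learner.validate_hyperparameters()  # Check that the modified set is still valid.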
cross_validation abstractmethod ¶
cross_validation(
ds: InputDataset,
folds: int = 10,
bootstrapping: Union[bool, int] = False,
parallel_evaluations: int = 1,
) -> Evaluation
Cross-validates the learner and returns the evaluation.
Usage example:
import pandas as pd
import ydf
dataset = pd.read_csv("my_dataset.csv")
learner = ydf.RandomForestLearner(label="label")
evaluation = learner.cross_validation(dataset)
# In a notebook, display an interactive evaluation
evaluation
# Print the evaluation
print(evaluation)
# Look at specific metrics
print(evaluation.accuracy)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ds | InputDataset | Dataset for the cross-validation. | required |
folds | int | Number of cross-validation folds. | 10 |
bootstrapping | Union[bool, int] | Controls whether bootstrapping is used to evaluate the confidence intervals and statistical tests (i.e., all the metrics ending with "[B]"). If set to false, bootstrapping is disabled. If set to true, bootstrapping is enabled and 2000 bootstrapping samples are used. If set to an integer, it specifies the number of bootstrapping samples to use. In this case, if the number is less than 100, an error is raised as bootstrapping will not yield useful results. | False |
parallel_evaluations | int | Number of models to train and evaluate in parallel using multi-threading. Note that each model is potentially already trained with multithreading (see | 1 |
Returns:
Type | Description |
---|---|
Evaluation | The cross-validation evaluation. |
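For example, a minimal sketch of running cross-validation with an explicit number of bootstrapping samples (the dataset path is illustrative):
import pandas as pd
import ydf
dataset = pd.read_csv("my_dataset.csv")  # illustrative path
learner = ydf.RandomForestLearner(label="label")
# 5 folds; 2000 bootstrapping samples for the "[B]" metrics; 2 models evaluated in parallel.
evaluation = learner.cross_validation(
    dataset, folds=5, bootstrapping=2000, parallel_evaluations=2
)
print(evaluation)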
extract_input_feature_names abstractmethod ¶
Extracts the input features available in a dataset.
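A hedged sketch of the intended use, assuming the method takes the dataset as its only argument and returns the list of input feature names:
import pandas as pd
import ydf
dataset = pd.read_csv("my_dataset.csv")  # illustrative path
learner = ydf.RandomForestLearner(label="label")
# Assumption: the dataset is passed directly and a list of feature names is returned.
input_features = learner.extract_input_feature_names(dataset)
print(input_features)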
train ¶
train(
ds: InputDataset,
valid: Optional[InputDataset] = None,
verbose: Optional[Union[int, bool]] = None,
) -> ModelType
Trains a model on the given dataset.
Options for dataset reading are given on the learner. Consult the documentation of the learner or ydf.create_vertical_dataset() for additional information on dataset reading in YDF.
Usage example:
import ydf
import pandas as pd
train_ds = pd.read_csv(...)
test_ds = pd.read_csv(...)
learner = ydf.GradientBoostedTreesLearner(label="label")
model = learner.train(train_ds)
evaluation = model.evaluate(test_ds)
Usage example with a validation dataset:
import ydf
import pandas as pd
train_ds = pd.read_csv(...)
valid_ds = pd.read_csv(...)
test_ds = pd.read_csv(...)
learner = ydf.GradientBoostedTreesLearner(label="label")
model = learner.train(train_ds, valid=valid_ds)
evaluation = model.evaluate(test_ds)
If training is interrupted (for example, by interrupting the cell execution in Colab), the model will be returned to the state it was in at the moment of interruption.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ds | InputDataset | Training dataset. | required |
valid | Optional[InputDataset] | Optional validation dataset. Some learners, such as Random Forest, do not need a validation dataset. Some learners, such as GradientBoostedTrees, automatically extract a validation dataset from the training dataset if the validation dataset is not provided. | None |
verbose | Optional[Union[int, bool]] | Verbose level during training. If None, uses the global verbose level of | None |
Returns:
Type | Description |
---|---|
ModelType | A trained model. |
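For example, a minimal sketch of controlling training logs with the verbose argument (the dataset path is illustrative; verbose accepts a boolean or an integer level):
import pandas as pd
import ydf
train_ds = pd.read_csv("train.csv")  # illustrative path
learner = ydf.GradientBoostedTreesLearner(label="label")
# verbose=True enables training logs for this call (overrides the global verbose level).
model = learner.train(train_ds, verbose=True)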
train_imp abstractmethod ¶
train_imp(
ds: InputDataset,
valid: Optional[InputDataset],
verbose: Optional[Union[int, bool]],
) -> ModelType
Trains a model.
validate_hyperparameters abstractmethod ¶
Raises an exception if the hyperparameters are invalid.
This method is called automatically before training, but users may call it to fail early. It makes sense to call this method when manually changing the hyperparameters of the learner. This is a relatively advanced approach that is not recommended (it is better to re-create the learner in most cases).
Usage example: