IsolationForestModel

IsolationForestModel(raw_model: GenericCCModel)

Bases: DecisionForestModel

An Isolation Forest model for prediction and inspection.

add_tree

add_tree(tree: Tree) -> None

Adds a single tree of the model.

Parameters:

Name Type Description Default
tree Tree

New tree.

required

analyze

analyze(
    data: InputDataset,
    sampling: float = 1.0,
    num_bins: int = 50,
    partial_dependence_plot: bool = True,
    conditional_expectation_plot: bool = True,
    permutation_variable_importance_rounds: int = 1,
    num_threads: Optional[int] = None,
    maximum_duration: Optional[float] = 20,
) -> Analysis

analyze_prediction

analyze_prediction(
    single_example: InputDataset,
) -> PredictionAnalysis

benchmark

benchmark(
    ds: InputDataset,
    benchmark_duration: float = 3,
    warmup_duration: float = 1,
    batch_size: int = 100,
    num_threads: Optional[int] = None,
) -> BenchmarkInferenceCCResult

data_spec

data_spec() -> DataSpecification

describe

describe(
    output_format: Literal[
        "auto", "text", "notebook", "html"
    ] = "auto",
    full_details: bool = False,
) -> Union[str, HtmlNotebookDisplay]

distance

distance(
    data1: InputDataset,
    data2: Optional[InputDataset] = None,
) -> ndarray

Computes the pairwise distance between examples in "data1" and "data2".

If "data2" is not provided, computes the pairwise distance between examples in "data1".

Usage example:

import pandas as pd
import ydf

# Train model
train_ds = pd.read_csv("train.csv")
model = ydf.RandomForestLearner(label="label").train(train_ds)

test_ds = pd.read_csv("test.csv")
distances = model.distance(test_ds, train_ds)
# "distances[i,j]" is the distance between the i-th test example and the
# j-th train example.

Different models are free to implement different distances with different definitions. For this reason, unless indicated by the model, distances from different models cannot be compared.

The distance is not guaranteed to satisfy the triangle inequality property of metric distances.

Not all models can compute distances. Calling this function on a model that cannot compute distances raises an exception.

Parameters:

Name Type Description Default
data1 InputDataset

Dataset. Can be a dictionary of lists or NumPy arrays of values, a Pandas DataFrame, or a VerticalDataset.

required
data2 Optional[InputDataset]

Dataset. Can be a dictionary of lists or NumPy arrays of values, a Pandas DataFrame, or a VerticalDataset.

None

Returns:

Type Description
ndarray

Pairwise distance
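
To make the "distances[i,j]" layout concrete, here is a minimal pure-Python sketch of one common forest distance: the fraction of trees in which two examples reach different leaves. This is an assumption chosen for illustration, not necessarily the definition used by this model. The helper `leaf_disagreement_distance` and the hard-coded leaf indices are hypothetical; in practice the leaf indices could come from something like `predict_leaves`.

```python
from typing import List, Optional, Sequence


def leaf_disagreement_distance(
    leaves1: Sequence[Sequence[int]],
    leaves2: Optional[Sequence[Sequence[int]]] = None,
) -> List[List[float]]:
    """Fraction of trees in which two examples land in different leaves.

    leaves1[i][t] is the active-leaf index of example i in tree t.
    If leaves2 is omitted, distances are computed within leaves1.
    """
    if leaves2 is None:
        leaves2 = leaves1
    distances = []
    for row1 in leaves1:
        out_row = []
        for row2 in leaves2:
            # Count the trees where the two examples end in different leaves.
            differing = sum(1 for a, b in zip(row1, row2) if a != b)
            out_row.append(differing / len(row1))
        distances.append(out_row)
    return distances


# Hypothetical leaf indices for 2 test and 3 train examples in a 3-tree forest.
test_leaves = [[0, 2, 1], [0, 1, 1]]
train_leaves = [[0, 2, 1], [3, 1, 0], [0, 2, 0]]

# distances[i][j] compares the i-th test example with the j-th train example.
distances = leaf_disagreement_distance(test_leaves, train_leaves)
```

The result has one row per example of the first dataset and one column per example of the second, which mirrors the layout described for `distance` above.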

evaluate

evaluate(
    data: InputDataset,
    *,
    weighted: Optional[bool] = None,
    task: Optional[Task] = None,
    label: Optional[str] = None,
    group: Optional[str] = None,
    bootstrapping: Union[bool, int] = False,
    ndcg_truncation: int = 5,
    mrr_truncation: int = 5,
    evaluation_task: Optional[Task] = None,
    use_slow_engine: bool = False,
    num_threads: Optional[int] = None
) -> Evaluation

feature_selection_logs

feature_selection_logs() -> Optional[FeatureSelectorLogs]

force_engine

force_engine(engine_name: Optional[str]) -> None

get_all_trees

get_all_trees() -> Sequence[Tree]

Returns all the trees in the model.

get_tree

get_tree(tree_idx: int) -> Tree

Gets a single tree of the model.

Parameters:

Name Type Description Default
tree_idx int

Index of the tree. Should be in [0, num_trees()).

required

Returns:

Type Description
Tree

The tree.

hyperparameter_optimizer_logs

hyperparameter_optimizer_logs() -> Optional[OptimizerLogs]

input_feature_names

input_feature_names() -> List[str]

Returns the names of the input features.

The features are sorted in increasing order of column_idx.

input_features

input_features() -> Sequence[InputFeature]

Returns the input features of the model.

The features are sorted in increasing order of column_idx.

input_features_col_idxs

input_features_col_idxs() -> Sequence[int]

iter_trees

iter_trees() -> Iterator[Tree]

Returns an iterator over all the trees in the model.

label

label() -> str

Name of the label column.

label_classes

label_classes() -> List[str]

Returns the label classes for a classification model; fails otherwise.

label_col_idx

label_col_idx() -> int

list_compatible_engines

list_compatible_engines() -> Sequence[str]

metadata

metadata() -> ModelMetadata

name

name() -> str

num_examples_per_tree

num_examples_per_tree() -> int

Returns the number of examples used to grow each tree.

num_trees

num_trees()

Returns the number of trees in the decision forest.

plot_tree

plot_tree(
    tree_idx: int = 0,
    max_depth: Optional[int] = None,
    options: Optional[PlotOptions] = None,
    d3js_url: str = "https://d3js.org/d3.v6.min.js",
) -> TreePlot

Plots an interactive HTML rendering of the tree.

Usage example:

import pandas as pd
import ydf

# Create a dataset
train_ds = pd.DataFrame({
    "c1": [1.0, 1.1, 2.0, 3.5, 4.2] + list(range(10)),
    "label": ["a", "b", "b", "a", "a"] * 3,
})
# Train a CART model
model = ydf.CartLearner(label="label").train(train_ds)
# Make sure the model is a CART
assert isinstance(model, ydf.CARTModel)
# Plot the tree in Colab
model.plot_tree()

Parameters:

Name Type Description Default
tree_idx int

Index of the tree. Should be in [0, self.num_trees()).

0
max_depth Optional[int]

Maximum tree depth of the plot. Set to None for full depth.

None
options Optional[PlotOptions]

Advanced options for plotting. Set to None for default style.

None
d3js_url str

URL to load the d3.js library from.

'https://d3js.org/d3.v6.min.js'

Returns:

Type Description
TreePlot

In interactive environments, an interactive plot. The HTML source can also be exported to file.

predict

predict(
    data: InputDataset,
    *,
    use_slow_engine: bool = False,
    num_threads: Optional[int] = None
) -> ndarray

predict_class

predict_class(
    data: InputDataset,
    *,
    use_slow_engine: bool = False,
    num_threads: Optional[int] = None
) -> ndarray

Returns the most likely predicted class for a classification model.

Usage example:

import pandas as pd
import ydf

# Train model
train_ds = pd.read_csv("train.csv")
model = ydf.RandomForestLearner(label="label").train(train_ds)

test_ds = pd.read_csv("test.csv")
predictions = model.predict_class(test_ds)

This method returns a NumPy array of strings of shape [num_examples]. Each value represents the most likely class for the corresponding example. This method can only be used for classification models.

In case of ties, the first class in model.label_classes() is returned.

See model.predict to generate the full prediction probabilities.
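
The tie-breaking rule can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper `most_likely_class` and the probability rows are hypothetical, and the sketch assumes one probability per class, listed in the same order as the label classes.

```python
from typing import Sequence


def most_likely_class(
    probabilities: Sequence[float], label_classes: Sequence[str]
) -> str:
    """Returns the class with the highest probability.

    On ties, the class that appears first in label_classes wins, because a
    later class only replaces the current best when it is strictly greater.
    """
    best_idx = 0
    for idx in range(1, len(probabilities)):
        if probabilities[idx] > probabilities[best_idx]:
            best_idx = idx
    return label_classes[best_idx]


# Hypothetical class order and per-example class probabilities.
label_classes = ["a", "b", "c"]
probability_rows = [
    [0.2, 0.5, 0.3],    # Clear winner: "b".
    [0.4, 0.4, 0.2],    # Tie between "a" and "b": the first one wins.
    [0.1, 0.45, 0.45],  # Tie between "b" and "c": the first one wins.
]
predictions = [most_likely_class(row, label_classes) for row in probability_rows]
```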

Parameters:

Name Type Description Default
data InputDataset

Dataset. Supported formats: VerticalDataset, (typed) path, list of (typed) paths, Pandas DataFrame, Xarray Dataset, TensorFlow Dataset, PyGrain DataLoader and Dataset (experimental, Linux only), dictionary of string to NumPy array or lists. If the dataset contains the label column, that column is ignored.

required
use_slow_engine bool

If true, uses the slow engine for making predictions. The slow engine of YDF is an order of magnitude slower than the other prediction engines. There exist very rare edge cases where predictions with the regular engines fail, e.g., models with a very large number of categorical conditions. It is only in these cases that users should use the slow engine and report the issue to the YDF developers.

False
num_threads Optional[int]

Number of threads used to run the model.

None

Returns:

Type Description
ndarray

The most likely predicted class for each example.

predict_leaves

predict_leaves(data: InputDataset) -> ndarray

Gets the index of the active leaf in each tree.

The active leaf is the leaf that receives the example during inference.

The returned value "leaves[i,j]" is the index of the active leaf for the i-th example and the j-th tree. Leaves are indexed by depth first exploration with the negative child visited before the positive one.
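
The leaf numbering described above can be sketched on a toy tree. The `ToyNode` class and `index_leaves` helper below are hypothetical and unrelated to the library's own tree classes; the sketch assumes every non-leaf node has both a negative and a positive child.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyNode:
    """A toy binary tree node; a node without children is a leaf."""

    name: str
    neg: Optional["ToyNode"] = None
    pos: Optional["ToyNode"] = None


def index_leaves(root: ToyNode) -> Dict[str, int]:
    """Numbers leaves depth first, negative child before positive child."""
    leaf_index: Dict[str, int] = {}

    def visit(node: ToyNode) -> None:
        if node.neg is None and node.pos is None:
            # Leaves receive consecutive indices in visiting order.
            leaf_index[node.name] = len(leaf_index)
            return
        visit(node.neg)  # Negative branch first.
        visit(node.pos)  # Then the positive branch.

    visit(root)
    return leaf_index


# The root's negative child is an internal node with two leaves;
# the root's positive child is a leaf.
toy_tree = ToyNode(
    "root",
    neg=ToyNode("inner", neg=ToyNode("leaf_A"), pos=ToyNode("leaf_B")),
    pos=ToyNode("leaf_C"),
)
leaf_index = index_leaves(toy_tree)
```

Here the two leaves under the negative branch are numbered before the leaf on the positive branch of the root.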

Parameters:

Name Type Description Default
data InputDataset

Dataset.

required

Returns:

Type Description
ndarray

Index of the active leaf for each tree in the model.

print_tree

print_tree(
    tree_idx: int = 0,
    max_depth: Optional[int] = 6,
    file: Any = stdout,
) -> None

Prints a tree in the terminal.

Usage example:

import pandas as pd
import ydf

# Create a dataset
train_ds = pd.DataFrame({
    "c1": [1.0, 1.1, 2.0, 3.5, 4.2] + list(range(10)),
    "label": ["a", "b", "b", "a", "a"] * 3,
})
# Train a CART model
model = ydf.CartLearner(label="label").train(train_ds)
# Make sure the model is a CART
assert isinstance(model, ydf.CARTModel)
# Print the tree
model.print_tree()

Parameters:

Name Type Description Default
tree_idx int

Index of the tree. Should be in [0, self.num_trees()).

0
max_depth Optional[int]

Maximum depth of the printed tree. Set to None for full depth.

6
file Any

Where to print the tree. By default, prints on the terminal standard output.

stdout

remove_tree

remove_tree(tree_idx: int) -> None

Removes a single tree of the model.

Parameters:

Name Type Description Default
tree_idx int

Index of the tree. Should be in [0, num_trees()).

required

save

save(
    path: str,
    advanced_options=ModelIOOptions(),
    *,
    pure_serving=False
) -> None

self_evaluation

self_evaluation() -> Evaluation

Returns the model's self-evaluation.

Different models use different methods for self-evaluation. Notably, Random Forests use OOB evaluation and Gradient Boosted Trees use evaluation on the validation dataset. Therefore, self-evaluations are not comparable between different model types.

Usage example:

import pandas as pd
import ydf

# Train model
train_ds = pd.read_csv("train.csv")
model = ydf.GradientBoostedTreesLearner(label="label").train(train_ds)

self_evaluation = model.self_evaluation()
# In an interactive Python environment, print a rich evaluation report.
self_evaluation

serialize

serialize() -> bytes

set_data_spec

set_data_spec(data_spec: DataSpecification) -> None

set_feature_selection_logs

set_feature_selection_logs(
    value: Optional[FeatureSelectorLogs],
) -> None

set_metadata

set_metadata(metadata: ModelMetadata)

set_node_format

set_node_format(node_format: NodeFormat) -> None

Sets the serialization format for the nodes.

Parameters:

Name Type Description Default
node_format NodeFormat

Node format to use when saving the model.

required

set_tree

set_tree(tree_idx: int, tree: Tree) -> None

Overrides a single tree of the model.

Parameters:

Name Type Description Default
tree_idx int

Index of the tree. Should be in [0, num_trees()).

required
tree Tree

New tree.

required

task

task() -> Task

to_cpp

to_cpp(key: str = 'my_model') -> str

to_docker

to_docker(path: str, exist_ok: bool = False) -> None

Exports the model to a Docker endpoint deployable on Cloud.

This function creates a directory containing a Dockerfile, the model and support files.

Usage example:

import numpy as np
import ydf

# Train a model.
model = ydf.RandomForestLearner(label="l").train({
    "f1": np.random.random(size=100),
    "f2": np.random.random(size=100),
    "l": np.random.randint(2, size=100),
})

# Export the model to a Docker endpoint.
model.to_docker(path="/tmp/my_model")

# Print instructions on how to use the model
!cat /tmp/my_model/readme.md

# Test the endpoint locally (shell commands, not Python)
docker build --platform linux/amd64 -t ydf_predict_image /tmp/my_model
docker run --rm -p 8080:8080 -d ydf_predict_image

# Deploy the model on Google Cloud (shell command, not Python)
gcloud run deploy ydf-predict --source /tmp/my_model

# Check the automatically created utility scripts "test_locally.sh" and
# "deploy_in_google_cloud.sh" for more examples.

Parameters:

Name Type Description Default
path str

Directory where to create the Docker endpoint

required
exist_ok bool

If false (default), fails if the directory already exists. If true, overwrites the directory content, if any.

False

to_jax_function

to_jax_function(
    jit: bool = True,
    apply_activation: bool = True,
    leaves_as_params: bool = False,
    compatibility: Union[str, Compatibility] = "XLA",
) -> JaxModel

to_tensorflow_function

to_tensorflow_function(
    temp_dir: Optional[str] = None,
    can_be_saved: bool = True,
    squeeze_binary_classification: bool = True,
    force: bool = False,
) -> Module

to_tensorflow_saved_model

to_tensorflow_saved_model(
    path: str,
    input_model_signature_fn: Any = None,
    *,
    mode: Literal["keras", "tf"] = "keras",
    feature_dtypes: Dict[str, TFDType] = {},
    servo_api: bool = False,
    feed_example_proto: bool = False,
    pre_processing: Optional[Callable] = None,
    post_processing: Optional[Callable] = None,
    temp_dir: Optional[str] = None,
    tensor_specs: Optional[Dict[str, Any]] = None,
    feature_specs: Optional[Dict[str, Any]] = None,
    force: bool = False
) -> None

update_with_jax_params

update_with_jax_params(params: Dict[str, Any])

variable_importances

variable_importances() -> (
    Dict[str, List[Tuple[float, str]]]
)