IsolationForestModel
- IsolationForestModel
- add_tree
- analyze
- analyze_prediction
- benchmark
- data_spec
- describe
- distance
- evaluate
- feature_selection_logs
- force_engine
- get_all_trees
- get_tree
- hyperparameter_optimizer_logs
- input_feature_names
- input_features
- input_features_col_idxs
- iter_trees
- label
- label_classes
- label_col_idx
- list_compatible_engines
- metadata
- name
- num_examples_per_tree
- num_trees
- plot_tree
- predict
- predict_class
- predict_leaves
- print_tree
- remove_tree
- save
- self_evaluation
- serialize
- set_data_spec
- set_feature_selection_logs
- set_metadata
- set_node_format
- set_tree
- task
- to_cpp
- to_docker
- to_jax_function
- to_tensorflow_function
- to_tensorflow_saved_model
- update_with_jax_params
- variable_importances
IsolationForestModel ¶
Bases: DecisionForestModel
An Isolation Forest model for prediction and inspection.
add_tree ¶
add_tree(tree: Tree) -> None
Adds a single tree to the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree | Tree | New tree. | required |
analyze ¶
analyze(
data: InputDataset,
sampling: float = 1.0,
num_bins: int = 50,
partial_dependence_plot: bool = True,
conditional_expectation_plot: bool = True,
permutation_variable_importance_rounds: int = 1,
num_threads: Optional[int] = None,
maximum_duration: Optional[float] = 20,
) -> Analysis
benchmark ¶
benchmark(
ds: InputDataset,
benchmark_duration: float = 3,
warmup_duration: float = 1,
batch_size: int = 100,
num_threads: Optional[int] = None,
) -> BenchmarkInferenceCCResult
describe ¶
describe(
output_format: Literal[
"auto", "text", "notebook", "html"
] = "auto",
full_details: bool = False,
) -> Union[str, HtmlNotebookDisplay]
distance ¶
distance(
data1: InputDataset,
data2: Optional[InputDataset] = None,
) -> ndarray
Computes the pairwise distance between examples in "data1" and "data2".
If "data2" is not provided, computes the pairwise distance between examples in "data1".
Usage example:
import pandas as pd
import ydf
# Train model
train_ds = pd.read_csv("train.csv")
model = ydf.RandomForestLearner(label="label").train(train_ds)
test_ds = pd.read_csv("test.csv")
distances = model.distance(test_ds, train_ds)
# "distances[i,j]" is the distance between the i-th test example and the
# j-th train example.
Different models are free to implement different distances with different definitions. For this reason, unless indicated by the model, distances from different models cannot be compared.
The distance is not guaranteed to satisfy the triangle inequality property of metric distances.
Not all models can compute distances. If the model cannot, this function raises an exception.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data1 | InputDataset | Dataset. Can be a dictionary of list or numpy array of values, Pandas DataFrame, or a VerticalDataset. | required |
data2 | Optional[InputDataset] | Dataset. Can be a dictionary of list or numpy array of values, Pandas DataFrame, or a VerticalDataset. | None |
Returns:
Type | Description |
---|---|
ndarray | Pairwise distance |
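To make the shape and meaning of "distances[i,j]" concrete, here is a minimal pure-Python sketch of one common tree-based distance: the fraction of trees in which two examples land in different leaves. This is an illustrative assumption, not necessarily the definition YDF uses for any given model. The function name `leaf_distance` and the leaf indices below are hypothetical.

```python
from typing import List, Sequence


def leaf_distance(
    leaves1: Sequence[Sequence[int]],
    leaves2: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Fraction of trees in which two examples fall into different leaves.

    leaves1[i][t] is the active leaf of the i-th example of the first
    dataset in tree t (same layout as an active-leaf matrix).
    """
    num_trees = len(leaves1[0])
    return [
        [
            sum(a != b for a, b in zip(row1, row2)) / num_trees
            for row2 in leaves2
        ]
        for row1 in leaves1
    ]


# Hypothetical active-leaf indices for a model with 4 trees.
test_leaves = [[0, 2, 1, 3], [1, 2, 0, 3]]
train_leaves = [[0, 2, 1, 3], [1, 1, 1, 1], [0, 2, 0, 0]]

# distances[i][j] compares the i-th test example with the j-th train example.
distances = leaf_distance(test_leaves, train_leaves)
```

The result has one row per example of the first dataset and one column per example of the second, matching the "distances[i,j]" convention described above.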
evaluate ¶
evaluate(
data: InputDataset,
*,
weighted: Optional[bool] = None,
task: Optional[Task] = None,
label: Optional[str] = None,
group: Optional[str] = None,
bootstrapping: Union[bool, int] = False,
ndcg_truncation: int = 5,
mrr_truncation: int = 5,
evaluation_task: Optional[Task] = None,
use_slow_engine: bool = False,
num_threads: Optional[int] = None
) -> Evaluation
get_tree ¶
input_feature_names ¶
Returns the names of the input features.
The features are sorted in increasing order of column_idx.
input_features ¶
input_features() -> Sequence[InputFeature]
Returns the input features of the model.
The features are sorted in increasing order of column_idx.
label_classes ¶
Returns the label classes for a classification model; fails otherwise.
num_examples_per_tree ¶
num_examples_per_tree() -> int
Returns the number of examples used to grow each tree.
plot_tree ¶
plot_tree(
tree_idx: int = 0,
max_depth: Optional[int] = None,
options: Optional[PlotOptions] = None,
d3js_url: str = "https://d3js.org/d3.v6.min.js",
) -> TreePlot
Plots an interactive HTML rendering of the tree.
Usage example:
import pandas as pd
import ydf
# Create a dataset
train_ds = pd.DataFrame({
"c1": [1.0, 1.1, 2.0, 3.5, 4.2] + list(range(10)),
"label": ["a", "b", "b", "a", "a"] * 3,
})
# Train a CART model
model = ydf.CartLearner(label="label").train(train_ds)
# Make sure the model is a CART
assert isinstance(model, ydf.CARTModel)
# Plot the tree in Colab
model.plot_tree()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree_idx | int | Index of the tree. Should be in [0, self.num_trees()). | 0 |
max_depth | Optional[int] | Maximum tree depth of the plot. Set to None for full depth. | None |
options | Optional[PlotOptions] | Advanced options for plotting. Set to None for default style. | None |
d3js_url | str | URL to load the d3.js library from. | 'https://d3js.org/d3.v6.min.js' |
Returns:
Type | Description |
---|---|
TreePlot | In interactive environments, an interactive plot. The HTML source can also be exported to file. |
predict ¶
predict(
data: InputDataset,
*,
use_slow_engine: bool = False,
num_threads: Optional[int] = None
) -> ndarray
predict_class ¶
predict_class(
data: InputDataset,
*,
use_slow_engine: bool = False,
num_threads: Optional[int] = None
) -> ndarray
Returns the most likely predicted class for a classification model.
Usage example:
import pandas as pd
import ydf
# Train model
train_ds = pd.read_csv("train.csv")
model = ydf.RandomForestLearner(label="label").train(train_ds)
test_ds = pd.read_csv("test.csv")
predictions = model.predict_class(test_ds)
This method returns a numpy array of strings of shape [num_examples]. Each value represents the most likely class for the corresponding example. This method can only be used for classification models.
In case of ties, the first class in model.label_classes() is returned.
See model.predict to generate the full prediction probabilities.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | InputDataset | Dataset. Supported formats: VerticalDataset, (typed) path, list of (typed) paths, Pandas DataFrame, Xarray Dataset, TensorFlow Dataset, PyGrain DataLoader and Dataset (experimental, Linux only), dictionary of string to NumPy array or lists. If the dataset contains the label column, that column is ignored. | required |
use_slow_engine | bool | If true, uses the slow engine for making predictions. The slow engine of YDF is an order of magnitude slower than the other prediction engines. There exist very rare edge cases where predictions with the regular engines fail, e.g., models with a very large number of categorical conditions. It is only in these cases that users should use the slow engine and report the issue to the YDF developers. | False |
num_threads | Optional[int] | Number of threads used to run the model. | None |
Returns:
Type | Description |
---|---|
ndarray | The most likely predicted class for each example. |
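The tie-breaking rule described above can be sketched in plain Python. The sketch assumes a full per-class probability row for each example, which may not match the exact output shape of model.predict for every model. The helper name `most_likely_classes` and the sample values are hypothetical.

```python
from typing import List, Sequence


def most_likely_classes(
    probabilities: Sequence[Sequence[float]],
    label_classes: Sequence[str],
) -> List[str]:
    """Returns the most likely class per example; ties go to the first class."""
    result = []
    for row in probabilities:
        best_idx = 0
        for idx, prob in enumerate(row):
            # Strict comparison: on a tie, the earlier class is kept.
            if prob > row[best_idx]:
                best_idx = idx
        result.append(label_classes[best_idx])
    return result


# Hypothetical classes, in the order a model would list them.
classes = ["a", "b", "c"]
probs = [
    [0.2, 0.5, 0.3],  # clear winner: "b"
    [0.4, 0.4, 0.2],  # tie between "a" and "b": "a" comes first
    [0.1, 0.1, 0.8],  # clear winner: "c"
]
predicted = most_likely_classes(probs, classes)
```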
predict_leaves ¶
Gets the index of the active leaf in each tree.
The active leaf is the leaf that receives the example during inference.
The returned value "leaves[i,j]" is the index of the active leaf for the i-th example and the j-th tree. Leaves are indexed by depth first exploration with the negative child visited before the positive one.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | InputDataset | Dataset. | required |
Returns:
Type | Description |
---|---|
ndarray | Index of the active leaf for each tree in the model. |
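The leaf numbering described above (depth-first exploration, negative child before positive child) can be illustrated with a small self-contained sketch. The `Leaf` and `NonLeaf` classes below are hypothetical stand-ins written for this example, not YDF's own tree classes, and the "feature >= threshold" condition is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    name: str


@dataclass
class NonLeaf:
    feature: str
    threshold: float
    neg_child: Union["NonLeaf", Leaf]  # taken when the condition is false
    pos_child: Union["NonLeaf", Leaf]  # taken when the condition is true


def index_leaves(root: Union[NonLeaf, Leaf]) -> Dict[str, int]:
    """Numbers leaves depth first, visiting the negative child first."""
    indices: Dict[str, int] = {}

    def visit(node: Union[NonLeaf, Leaf]) -> None:
        if isinstance(node, Leaf):
            indices[node.name] = len(indices)
            return
        visit(node.neg_child)
        visit(node.pos_child)

    visit(root)
    return indices


def active_leaf(
    root: Union[NonLeaf, Leaf],
    example: Dict[str, float],
    indices: Dict[str, int],
) -> int:
    """Routes one example down the tree and returns its leaf index."""
    node = root
    while isinstance(node, NonLeaf):
        if example[node.feature] >= node.threshold:
            node = node.pos_child
        else:
            node = node.neg_child
    return indices[node.name]


# Hypothetical tree: split on "x", then on "y" in the positive branch.
tree = NonLeaf(
    "x", 2.0,
    neg_child=Leaf("A"),
    pos_child=NonLeaf("y", 0.5, neg_child=Leaf("B"), pos_child=Leaf("C")),
)
leaf_indices = index_leaves(tree)
leaf_for_example = active_leaf(tree, {"x": 3.0, "y": 0.1}, leaf_indices)
```

Repeating `active_leaf` for every example and every tree would produce a matrix laid out like "leaves[i,j]".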
print_tree ¶
Prints a tree in the terminal.
Usage example:
import pandas as pd
import ydf
# Create a dataset
train_ds = pd.DataFrame({
"c1": [1.0, 1.1, 2.0, 3.5, 4.2] + list(range(10)),
"label": ["a", "b", "b", "a", "a"] * 3,
})
# Train a CART model
model = ydf.CartLearner(label="label").train(train_ds)
# Make sure the model is a CART
assert isinstance(model, ydf.CARTModel)
# Print the tree
model.print_tree()
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree_idx | int | Index of the tree. Should be in [0, self.num_trees()). | 0 |
max_depth | Optional[int] | Maximum tree depth of the plot. Set to None for full depth. | 6 |
file | Any | Where to print the tree. By default, prints on the terminal standard output. | stdout |
remove_tree ¶
remove_tree(tree_idx: int) -> None
Removes a single tree from the model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree_idx | int | Index of the tree. Should be in [0, num_trees()). | required |
self_evaluation ¶
Returns the model's self-evaluation.
Different models use different methods for self-evaluation. Notably, Random Forests use OOB evaluation and Gradient Boosted Trees use evaluation on the validation dataset. Therefore, self-evaluations are not comparable between different model types.
Usage example:
set_feature_selection_logs ¶
set_feature_selection_logs(
value: Optional[FeatureSelectorLogs],
) -> None
set_node_format ¶
set_node_format(node_format: NodeFormat) -> None
Sets the serialization format for the nodes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node_format | NodeFormat | Node format to use when saving the model. | required |
set_tree ¶
to_docker ¶
Exports the model to a Docker endpoint deployable on Cloud.
This function creates a directory containing a Dockerfile, the model and support files.
Usage example:
import numpy as np
import ydf
# Train a model.
model = ydf.RandomForestLearner(label="l").train({
"f1": np.random.random(size=100),
"f2": np.random.random(size=100),
"l": np.random.randint(2, size=100),
})
# Export the model to a Docker endpoint.
model.to_docker(path="/tmp/my_model")
# Print instructions on how to use the model
!cat /tmp/my_model/readme.md
# Test the endpoint locally (the commands below are shell commands)
docker build --platform linux/amd64 -t ydf_predict_image /tmp/my_model
docker run --rm -p 8080:8080 -d ydf_predict_image
# Deploy the model on Google Cloud
gcloud run deploy ydf-predict --source /tmp/my_model
# Check the automatically created utility scripts "test_locally.sh" and
# "deploy_in_google_cloud.sh" for more examples.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Directory where to create the Docker endpoint. | required |
exist_ok | bool | If false (default), fails if the directory already exists. If true, overrides the directory content, if any. | False |
to_jax_function ¶
to_jax_function(
jit: bool = True,
apply_activation: bool = True,
leaves_as_params: bool = False,
compatibility: Union[str, Compatibility] = "XLA",
) -> JaxModel
to_tensorflow_function ¶
to_tensorflow_function(
temp_dir: Optional[str] = None,
can_be_saved: bool = True,
squeeze_binary_classification: bool = True,
force: bool = False,
) -> Module
to_tensorflow_saved_model ¶
to_tensorflow_saved_model(
path: str,
input_model_signature_fn: Any = None,
*,
mode: Literal["keras", "tf"] = "keras",
feature_dtypes: Dict[str, TFDType] = {},
servo_api: bool = False,
feed_example_proto: bool = False,
pre_processing: Optional[Callable] = None,
post_processing: Optional[Callable] = None,
temp_dir: Optional[str] = None,
tensor_specs: Optional[Dict[str, Any]] = None,
feature_specs: Optional[Dict[str, Any]] = None,
force: bool = False
) -> None