pip install ydf -U
import ydf
import pandas as pd
What is prediction understanding?
Prediction understanding aims to explain a single model prediction.
Prediction understanding differs from model understanding, which aims to explain a model as a whole. To explain models, see the model understanding tutorial instead.
In YDF, prediction understanding is simply done with model.analyze_prediction().
Information: Counterfactual analysis is a complementary and more complex way to understand predictions. See the counterfactual notebook for details.
Gathering dataset and training model
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
train_ds = pd.read_csv(f"{ds_path}/adult_train.csv")
test_ds = pd.read_csv(f"{ds_path}/adult_test.csv")
# Print the first 5 training examples
train_ds.head(5)
# Train a model
model = ydf.GradientBoostedTreesLearner(label="income").train(train_ds)
Train model on 22792 examples
Model trained in 0:00:01.120417
Prediction analysis
In contrast to model analysis (model.analyze), which examines a model as a whole, prediction analysis (model.analyze_prediction) explains a single prediction of the model. The next example explains the model's prediction on the first test example.
model.analyze_prediction(test_ds.iloc[:1])
Feature | >50K |
---|---|
capital_gain | -0.987081 |
marital_status | -0.703905 |
education | 0.497574 |
relationship | -0.457539 |
age | 0.361084 |
workclass | -0.308857 |
fnlwgt | -0.201710 |
sex | 0.180184 |
occupation | -0.136641 |
education_num | 0.118709 |
hours_per_week | -0.094915 |
native_country | 0.037439 |
capital_loss | -0.011167 |
race | 0.007206 |
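The values in the table above can also be inspected programmatically. The following is a minimal sketch in plain Python, not part of the YDF API: the dictionary `contributions` is copied by hand from the printed output, and `split_by_sign` is an illustrative helper. It assumes that a positive value moves the prediction toward the >50K class and a negative value moves it away. That is the usual reading of signed per-feature scores, but the YDF documentation should be checked for the exact definition.

```python
# Values copied by hand from the analyze_prediction output shown above
# (first test example). This dictionary is illustrative, not a YDF object.
contributions = {
    "capital_gain": -0.987081,
    "marital_status": -0.703905,
    "education": 0.497574,
    "relationship": -0.457539,
    "age": 0.361084,
    "workclass": -0.308857,
    "fnlwgt": -0.201710,
    "sex": 0.180184,
    "occupation": -0.136641,
    "education_num": 0.118709,
    "hours_per_week": -0.094915,
    "native_country": 0.037439,
    "capital_loss": -0.011167,
    "race": 0.007206,
}


def split_by_sign(values):
    """Return (positive, negative) lists of (feature, value) pairs.

    Both lists are ordered by decreasing absolute value.
    """
    ranked = sorted(values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    positive = [(name, v) for name, v in ranked if v > 0]
    negative = [(name, v) for name, v in ranked if v < 0]
    return positive, negative


positive, negative = split_by_sign(contributions)

# Assumed interpretation: positive values push toward ">50K".
print("Toward >50K:", [name for name, _ in positive[:3]])
print("Away from >50K:", [name for name, _ in negative[:3]])
```

For this example, the three largest negative values (capital_gain, marital_status, relationship) outweigh the three largest positive ones (education, age, sex) in magnitude.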
Counterfactual examples
Counterfactual examples are the training examples that are most similar to a prediction according to a model. Counterfactual examples can be used to explain the model's prediction by examining the similarities and differences between their features.
For more information, see the standalone counterfactual notebook.
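To make the idea of "most similar training examples" concrete, here is a minimal sketch in plain Python. It does not call YDF: the distances are made-up numbers standing in for whatever model-based distance the counterfactual notebook computes, and `nearest_examples` is a hypothetical helper introduced only for illustration.

```python
import heapq


def nearest_examples(distances, k):
    """Return the indices of the k smallest distances, closest first."""
    return heapq.nsmallest(k, range(len(distances)), key=lambda i: distances[i])


# Hypothetical distances between one test example and five training examples.
# In practice these would come from a model-based distance, not be typed by hand.
distances_to_train = [0.82, 0.10, 0.47, 0.05, 0.93]

closest = nearest_examples(distances_to_train, k=2)
print(closest)
```

The training rows at the returned indices would then be compared with the test example feature by feature to see which similarities and differences accompany the prediction.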