Monotonic
Monotonic constraints force a monotonic relationship between a feature and the model predictions. For instance, we might want the model output to never decrease when a specific feature value increases. Monotonic constraints are set per feature through the learner's features argument.
Note: Not all learners support monotonic constraints.
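To make the "all other features being constant" part of the definition concrete, here is a minimal sketch in plain Python. It does not use YDF: `toy_predict` is a made-up stand-in for a model, and `is_monotonic_increasing_for_example` is a hypothetical helper written for this illustration. The helper varies a single feature of one example and checks that the predictions never decrease.

```python
from typing import Callable, Dict, Sequence

Example = Dict[str, float]


def is_monotonic_increasing_for_example(
    predict_fn: Callable[[Example], float],
    example: Example,
    feature: str,
    values: Sequence[float],
) -> bool:
    """Checks that predictions never decrease as `feature` increases.

    All other features of `example` are held constant.
    """
    predictions = []
    for value in sorted(values):
        modified = dict(example)  # Copy so the caller's example is untouched.
        modified[feature] = value
        predictions.append(predict_fn(modified))
    return all(a <= b for a, b in zip(predictions, predictions[1:]))


def toy_predict(example: Example) -> float:
    # Made-up scoring function (not a trained model): increasing in "age",
    # but peaked around 45 in "hours_per_week".
    return 0.01 * example["age"] - 0.001 * (example["hours_per_week"] - 45) ** 2


example = {"age": 40.0, "hours_per_week": 40.0}
age_ok = is_monotonic_increasing_for_example(
    toy_predict, example, "age", range(20, 80, 5)
)
hours_ok = is_monotonic_increasing_for_example(
    toy_predict, example, "hours_per_week", range(10, 80, 5)
)
print(age_ok, hours_ok)  # True False
```

The toy function is monotonic in age but not in hours_per_week. A monotonic increasing constraint asks the learner to produce a model for which this check would pass for every example.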
Let's train a model on the Adult census dataset with monotonic increasing constraints on the features age and hours_per_week, i.e. let's force the model to predict increasing income with increasing age and increasing hours worked per week (all other features being constant).
import ydf
import pandas as pd
# Download a classification dataset and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
train_ds = pd.read_csv(f"{ds_path}/adult_train.csv")
test_ds = pd.read_csv(f"{ds_path}/adult_test.csv")
# Print the first 5 training examples
train_ds.head(5)
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 44 | Private | 228057 | 7th-8th | 4 | Married-civ-spouse | Machine-op-inspct | Wife | White | Female | 0 | 0 | 40 | Dominican-Republic | <=50K |
| 1 | 20 | Private | 299047 | Some-college | 10 | Never-married | Other-service | Not-in-family | White | Female | 0 | 0 | 20 | United-States | <=50K |
| 2 | 40 | Private | 342164 | HS-grad | 9 | Separated | Adm-clerical | Unmarried | White | Female | 0 | 0 | 37 | United-States | <=50K |
| 3 | 30 | Private | 361742 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 4 | 67 | Self-emp-inc | 171564 | HS-grad | 9 | Married-civ-spouse | Prof-specialty | Wife | White | Female | 20051 | 0 | 30 | England | >50K |
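model = ydf.GradientBoostedTreesLearner(
    label="income",
    features=[
        # monotonic=+1 requests a monotonic increasing constraint.
        ydf.Feature("age", monotonic=+1),
        ydf.Feature("hours_per_week", monotonic=+1),
    ],
    # Also use the remaining columns as unconstrained input features.
    include_all_columns=True,
    use_hessian_gain=True,
).train(train_ds)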
Train model on 22792 examples
Model trained in 0:00:02.326853
To verify that a model is monotonic with respect to a feature, we can examine the Partial Dependence Plot (PDP). In the PDP tab, the curves of age and hours_per_week are monotonically increasing.
model.analyze(test_ds, sampling=0.1)
Variable importances measure the importance of an input feature for a model.
1. "capital_gain" 0.051592 ################
2. "marital_status" 0.043403 #############
3. "education_num" 0.030505 #########
4. "age" 0.017197 #####
5. "capital_loss" 0.013512 ####
6. "occupation" 0.010953 ###
7. "hours_per_week" 0.006551 ##
8. "workclass" 0.003173 #
9. "race" 0.001024
10. "fnlwgt" 0.000921
11. "relationship" 0.000921
12. "native_country" 0.000921
13. "education" -0.001433
14. "sex" -0.001638
1. "capital_gain" 0.234571 ################
2. "marital_status" 0.126207 ########
3. "age" 0.062733 ####
4. "education_num" 0.061874 ####
5. "capital_loss" 0.045245 ###
6. "occupation" 0.025731 #
7. "hours_per_week" 0.017928 #
8. "relationship" 0.007549
9. "workclass" 0.005649
10. "sex" 0.002152
11. "fnlwgt" 0.001415
12. "race" 0.001161
13. "education" 0.000641
14. "native_country" 0.000496
1. "marital_status" 0.078686 ################
2. "capital_gain" 0.060498 ############
3. "age" 0.056163 ###########
4. "education_num" 0.030999 ######
5. "capital_loss" 0.014430 ##
6. "hours_per_week" 0.012948 ##
7. "occupation" 0.012506 ##
8. "relationship" 0.005170
9. "workclass" 0.002254
10. "sex" 0.001692
11. "race" 0.000560
12. "native_country" 0.000364
13. "education" 0.000331
14. "fnlwgt" 0.000270
1. "capital_gain" 0.234376 ################
2. "marital_status" 0.126181 ########
3. "age" 0.062723 ####
4. "education_num" 0.061858 ####
5. "capital_loss" 0.045227 ###
6. "occupation" 0.025721 #
7. "hours_per_week" 0.017925 #
8. "relationship" 0.007542
9. "workclass" 0.005649
10. "sex" 0.002149
11. "fnlwgt" 0.001409
12. "race" 0.001158
13. "education" 0.000638
14. "native_country" 0.000495
1. "capital_gain" 0.240106 ################
2. "fnlwgt" 0.228937 #############
3. "education_num" 0.221243 ###########
4. "occupation" 0.204376 #######
5. "age" 0.202566 ######
6. "marital_status" 0.201923 ######
7. "capital_loss" 0.199952 ######
8. "relationship" 0.191252 ###
9. "native_country" 0.189980 ###
10. "hours_per_week" 0.189106 ###
11. "workclass" 0.186076 ##
12. "race" 0.179428
13. "sex" 0.176670
14. "education" 0.175650
1. "capital_gain" 34.000000 ################
2. "marital_status" 27.000000 ############
3. "occupation" 20.000000 ########
4. "capital_loss" 19.000000 #######
5. "relationship" 15.000000 #####
6. "age" 14.000000 ####
7. "fnlwgt" 14.000000 ####
8. "education_num" 13.000000 ####
9. "native_country" 11.000000 ###
10. "workclass" 9.000000 ##
11. "hours_per_week" 6.000000
12. "race" 5.000000
1. "fnlwgt" 1283.000000 ################
2. "capital_gain" 705.000000 ########
3. "education_num" 618.000000 #######
4. "capital_loss" 426.000000 #####
5. "age" 384.000000 ####
6. "hours_per_week" 283.000000 ###
7. "occupation" 243.000000 ##
8. "workclass" 128.000000 #
9. "relationship" 114.000000 #
10. "marital_status" 106.000000 #
11. "native_country" 103.000000
12. "education" 61.000000
13. "sex" 37.000000
14. "race" 27.000000
1. "marital_status" 487568319.598255 ################
2. "education_num" 343622892.392935 ###########
3. "capital_gain" 320171507.632499 ##########
4. "hours_per_week" 153983141.770836 #####
5. "capital_loss" 146585694.670481 ####
6. "age" 98337013.334848 ###
7. "occupation" 52018124.395105 #
8. "fnlwgt" 14683666.270136
9. "sex" 7973337.944597
10. "workclass" 7065720.846374
11. "relationship" 6549757.592968
12. "native_country" 3384842.107984
13. "race" 1127991.045659
14. "education" 1047237.942327
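The monotonicity visible in the PDP tab can also be checked numerically. The following is a simplified sketch of a partial dependence computation, not YDF's implementation: `partial_dependence` and `is_non_decreasing` are hypothetical helpers, and `toy_predict` is a made-up scoring function standing in for a trained model. For each grid value, the feature is overwritten in every example and the predictions are averaged.

```python
from typing import Callable, Dict, List, Sequence

Example = Dict[str, float]


def partial_dependence(
    predict_fn: Callable[[Example], float],
    examples: Sequence[Example],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over `examples` with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for example in examples:
            modified = dict(example)  # Leave the original example unchanged.
            modified[feature] = value
            total += predict_fn(modified)
        curve.append(total / len(examples))
    return curve


def is_non_decreasing(curve: Sequence[float]) -> bool:
    """True if no value in the curve is smaller than the one before it."""
    return all(a <= b for a, b in zip(curve, curve[1:]))


def toy_predict(example: Example) -> float:
    # Made-up scoring function: increasing in age, shifted by education_num.
    return min(1.0, 0.01 * example["age"] + 0.02 * example["education_num"])


# A few (age, education_num) pairs taken from the training examples shown earlier.
examples = [
    {"age": 44.0, "education_num": 4.0},
    {"age": 20.0, "education_num": 10.0},
    {"age": 40.0, "education_num": 9.0},
]
age_grid = [20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
age_curve = partial_dependence(toy_predict, examples, "age", age_grid)
print(is_non_decreasing(age_curve))  # True
```

With a real model, `predict_fn` would wrap the model's prediction call. The exact wrapping depends on the prediction format and is left out here.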
For comparison, let's train the same model without the monotonic constraints and compare the partial dependence plots.
Here, the PDPs of age and hours_per_week are not monotonic. For example, for ages greater than 60, the model output decreases as age increases.
model = ydf.GradientBoostedTreesLearner(
    label="income",
    use_hessian_gain=True,
).train(train_ds)
model.analyze(test_ds, sampling=0.1)
Train model on 22792 examples
Model trained in 0:00:02.118608
Variable importances measure the importance of an input feature for a model.
1. "capital_gain" 0.055789 ################
2. "marital_status" 0.044733 ############
3. "education_num" 0.028150 ########
4. "occupation" 0.019040 #####
5. "age" 0.015867 ####
6. "capital_loss" 0.015457 ####
7. "hours_per_week" 0.003276 #
8. "workclass" 0.003173 #
9. "fnlwgt" 0.001331
10. "education" 0.000819
11. "relationship" 0.000512
12. "native_country" 0.000102
13. "sex" -0.000102
14. "race" -0.001126
1. "capital_gain" 0.250679 ################
2. "marital_status" 0.132571 ########
3. "age" 0.062733 ####
4. "education_num" 0.059362 ###
5. "capital_loss" 0.045224 ##
6. "occupation" 0.031333 ##
7. "hours_per_week" 0.016096 #
8. "relationship" 0.004431
9. "workclass" 0.004119
10. "sex" 0.002572
11. "fnlwgt" 0.001708
12. "education" 0.000919
13. "native_country" 0.000418
14. "race" -0.000084
1. "marital_status" 0.087207 ################
2. "capital_gain" 0.064457 ###########
3. "age" 0.058184 ##########
4. "education_num" 0.027934 #####
5. "occupation" 0.015009 ##
6. "capital_loss" 0.014695 ##
7. "hours_per_week" 0.010852 #
8. "relationship" 0.003060
9. "sex" 0.002069
10. "workclass" 0.001361
11. "fnlwgt" 0.000469
12. "education" 0.000332
13. "native_country" 0.000159
14. "race" 0.000068
1. "capital_gain" 0.250493 ################
2. "marital_status" 0.132549 ########
3. "age" 0.062720 ####
4. "education_num" 0.059350 ###
5. "capital_loss" 0.045208 ##
6. "occupation" 0.031326 ##
7. "hours_per_week" 0.016094 #
8. "relationship" 0.004429
9. "workclass" 0.004118
10. "sex" 0.002570
11. "fnlwgt" 0.001708
12. "education" 0.000918
13. "native_country" 0.000417
14. "race" -0.000085
1. "age" 0.262588 ################
2. "capital_gain" 0.226122 #########
3. "education_num" 0.217170 #######
4. "fnlwgt" 0.210966 ######
5. "hours_per_week" 0.208267 ######
6. "marital_status" 0.202346 #####
7. "occupation" 0.201522 #####
8. "capital_loss" 0.198465 ####
9. "native_country" 0.181994 #
10. "relationship" 0.181866 #
11. "workclass" 0.179705 #
12. "race" 0.176878
13. "sex" 0.173925
14. "education" 0.172549
1. "age" 38.000000 ################
2. "capital_gain" 23.000000 #########
3. "marital_status" 21.000000 ########
4. "capital_loss" 18.000000 #######
5. "occupation" 17.000000 ######
6. "education_num" 16.000000 ######
7. "hours_per_week" 11.000000 ####
8. "fnlwgt" 6.000000 #
9. "native_country" 6.000000 #
10. "relationship" 5.000000 #
11. "race" 3.000000
12. "workclass" 2.000000
1. "fnlwgt" 920.000000 ################
2. "age" 850.000000 ##############
3. "hours_per_week" 539.000000 #########
4. "capital_gain" 493.000000 ########
5. "education_num" 439.000000 #######
6. "capital_loss" 374.000000 ######
7. "occupation" 186.000000 ##
8. "marital_status" 129.000000 #
9. "workclass" 110.000000 #
10. "relationship" 107.000000 #
11. "native_country" 81.000000
12. "education" 34.000000
13. "sex" 34.000000
14. "race" 28.000000
1. "marital_status" 484346054.435797 ################
2. "education_num" 342841906.363874 ###########
3. "capital_gain" 314297945.114876 ##########
4. "hours_per_week" 156279266.529111 #####
5. "capital_loss" 136981976.118004 ####
6. "age" 124340810.030382 ####
7. "occupation" 47180431.518591 #
8. "fnlwgt" 11123014.673219
9. "workclass" 6312540.338696
10. "relationship" 6128030.124364
11. "sex" 5356973.172271
12. "native_country" 3569206.429949
13. "race" 1483551.225928
14. "education" 828077.075970
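To pinpoint where an unconstrained curve breaks monotonicity, such as the decrease for ages above 60 noted above, one can scan the curve for drops. This is a minimal sketch: `decreasing_segments` is a hypothetical helper, and `toy_curve` holds made-up values chosen only to mimic that shape, not the actual PDP values of the model.

```python
from typing import List, Sequence, Tuple


def decreasing_segments(
    grid: Sequence[float], curve: Sequence[float], tolerance: float = 0.0
) -> List[Tuple[float, float]]:
    """Returns the (start, end) grid intervals where `curve` drops by more than `tolerance`."""
    if len(grid) != len(curve):
        raise ValueError("grid and curve must have the same length")
    segments = []
    for i in range(len(curve) - 1):
        if curve[i + 1] < curve[i] - tolerance:
            segments.append((grid[i], grid[i + 1]))
    return segments


# Made-up curve for illustration only: it rises until age 60 and then falls.
age_grid = [20, 30, 40, 50, 60, 70, 80]
toy_curve = [0.05, 0.15, 0.25, 0.30, 0.32, 0.28, 0.22]
print(decreasing_segments(age_grid, toy_curve))  # [(60, 70), (70, 80)]
```

For a model trained with a monotonic increasing constraint on the feature, the returned list is expected to be empty. The `tolerance` argument is there to ignore negligible drops, for example when the curve is estimated on a small sample.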