Monotonic
Monotonic constraints force a monotonic relationship between features and model predictions. For instance, we might want the model output to never decrease when the value of a specific feature increases. Monotonicity is imposed with the features argument of the learner, by passing ydf.Feature objects with a monotonic value.
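Formally, let f be the model and let x and x' be two examples that differ only in feature j. A monotonically increasing constraint on feature j requires the following (for a decreasing constraint, the final inequality is reversed):

$$
x_j \le x'_j \quad\text{and}\quad x_k = x'_k \;\; \forall k \neq j \quad\Longrightarrow\quad f(x) \le f(x')
$$

The inequality on f is non-strict, so flat regions are allowed. This matters for tree ensembles, whose output is piecewise constant.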
Note: Not all learners support monotonic constraints.
Let's train a model on the Adult census dataset with monotonically increasing constraints on the features age and hours_per_week. In other words, let's force the model's predicted probability of an income above 50K to never decrease as age or hours worked per week increase (all other features being constant).
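You can check the "all other features being constant" condition directly on a single example. Sweep one feature over increasing values, keep every other feature fixed, and verify that the prediction never moves against the requested direction (+1 for increasing, -1 for decreasing). The sketch below is a hypothetical helper, not part of the YDF API. check_monotonic and the toy scoring function are made up for illustration. Any callable that maps a dict of feature values to a score can be plugged in.

```python
from typing import Callable, Dict, Sequence

Example = Dict[str, float]


def check_monotonic(
    predict: Callable[[Example], float],
    example: Example,
    feature: str,
    values: Sequence[float],
    direction: int = +1,
) -> bool:
    """Hypothetical helper: True if predict() never moves against `direction`
    while `feature` sweeps `values` in ascending order, all other features fixed.
    Ties are accepted (non-strict monotonicity)."""
    if direction not in (+1, -1):
        raise ValueError("direction must be +1 or -1")
    previous = None
    for value in sorted(values):
        # Copy the example so every other feature keeps its original value.
        modified = dict(example, **{feature: value})
        current = predict(modified)
        if previous is not None and direction * (current - previous) < 0:
            return False
        previous = current
    return True


# Hypothetical toy scorer: increasing in hours_per_week, but peaks at age 45.
def toy_predict(x: Example) -> float:
    return 0.01 * x["hours_per_week"] - 0.0005 * (x["age"] - 45) ** 2


example = {"age": 44.0, "hours_per_week": 40.0}
print(check_monotonic(toy_predict, example, "hours_per_week", range(10, 80, 5)))  # True
print(check_monotonic(toy_predict, example, "age", range(20, 90, 5)))  # False
```

With a real YDF model, predict would need a thin wrapper around model.predict on a one-row DataFrame. That wrapper is left out of this sketch.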
import ydf
import pandas as pd
# Download a classification dataset and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
train_ds = pd.read_csv(f"{ds_path}/adult_train.csv")
test_ds = pd.read_csv(f"{ds_path}/adult_test.csv")
# Print the first 5 training examples
train_ds.head(5)
 | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 44 | Private | 228057 | 7th-8th | 4 | Married-civ-spouse | Machine-op-inspct | Wife | White | Female | 0 | 0 | 40 | Dominican-Republic | <=50K |
1 | 20 | Private | 299047 | Some-college | 10 | Never-married | Other-service | Not-in-family | White | Female | 0 | 0 | 20 | United-States | <=50K |
2 | 40 | Private | 342164 | HS-grad | 9 | Separated | Adm-clerical | Unmarried | White | Female | 0 | 0 | 37 | United-States | <=50K |
3 | 30 | Private | 361742 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
4 | 67 | Self-emp-inc | 171564 | HS-grad | 9 | Married-civ-spouse | Prof-specialty | Wife | White | Female | 20051 | 0 | 30 | England | >50K |
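model = ydf.GradientBoostedTreesLearner(
    label="income",
    # Constrain the model output to be monotonically increasing in these features.
    features=[
        ydf.Feature("age", monotonic=+1),
        ydf.Feature("hours_per_week", monotonic=+1),
    ],
    # Also use all the other columns as (unconstrained) input features.
    include_all_columns=True,
    use_hessian_gain=True,
).train(train_ds)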
Train model on 22792 examples
Model trained in 0:00:02.326853
To verify that a model is monotonic with respect to a feature, we can examine its Partial Dependence Plot (PDP). In the PDP tab of the analysis report, the curves of age and hours_per_week are monotonically increasing.
model.analyze(test_ds, sampling=0.1)
[Model analysis report: partial dependence plots and variable importances. Variable importances measure the importance of an input feature for a model.]
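A partial dependence curve for one feature is built as follows. Force that feature to each value of a grid in every row of a dataset, then average the model's predictions. The sketch below computes such a curve by hand and checks that it is non-decreasing, which is what the PDP tab shows for age and hours_per_week. It is a minimal illustration under stated assumptions. partial_dependence, is_non_decreasing and the linear toy model are hypothetical. The three rows are borrowed from the training-data preview. YDF's own PDP computation (grid choice, sampling) may differ.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Hypothetical helper: average prediction over `rows` with `feature`
    forced to each grid value."""
    curve = []
    for value in grid:
        # Override only `feature`; every other column keeps its observed value.
        curve.append(fmean(predict(dict(row, **{feature: value})) for row in rows))
    return curve


def is_non_decreasing(curve: Sequence[float]) -> bool:
    """True if each point is >= the previous one (ties allowed)."""
    return all(b >= a for a, b in zip(curve, curve[1:]))


# Hypothetical toy model: the score rises with age and hours worked,
# plus a per-row offset from capital_gain.
def toy_predict(row: Row) -> float:
    return 0.02 * row["age"] + 0.01 * row["hours_per_week"] + 1e-5 * row["capital_gain"]


# Three rows taken from the training-data preview (only three columns kept).
rows = [
    {"age": 44, "hours_per_week": 40, "capital_gain": 0},
    {"age": 20, "hours_per_week": 20, "capital_gain": 0},
    {"age": 67, "hours_per_week": 30, "capital_gain": 20051},
]
age_grid = [20, 30, 40, 50, 60, 70]
pdp_age = partial_dependence(toy_predict, rows, "age", age_grid)
print(is_non_decreasing(pdp_age))  # True
```

A PDP is an average, so a non-decreasing curve is consistent with the constraint but does not by itself prove that every individual example's curve is non-decreasing. Sweeping one feature on each row separately is a stricter check.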
For comparison, let's train the same model without the monotonic constraints and compare the partial dependence plots. This time, the PDPs of age and hours_per_week are not monotonic. For example, for ages greater than 60, the model output decreases as age increases.
# Same learner without monotonic constraints: all columns except the label are used as features.
model = ydf.GradientBoostedTreesLearner(
    label="income",
    use_hessian_gain=True,
).train(train_ds)
model.analyze(test_ds, sampling=0.1)
Train model on 22792 examples
Model trained in 0:00:02.118608
[Model analysis report: partial dependence plots and variable importances. Variable importances measure the importance of an input feature for a model.]
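Statements such as "for ages greater than 60, the model output decreases as age increases" can be checked by scanning a partial dependence curve for the grid intervals where it goes down. decreasing_intervals below is a hypothetical helper. The curve values are made-up numbers shaped like that description, not outputs of either model trained here.

```python
from typing import List, Sequence, Tuple


def decreasing_intervals(
    grid: Sequence[float], curve: Sequence[float], tol: float = 0.0
) -> List[Tuple[float, float]]:
    """Hypothetical helper: (start, end) grid intervals on which `curve` drops
    by more than `tol`. Consecutive decreasing steps are merged."""
    if len(grid) != len(curve):
        raise ValueError("grid and curve must have the same length")
    intervals: List[Tuple[float, float]] = []
    for i in range(1, len(curve)):
        if curve[i] < curve[i - 1] - tol:
            if intervals and intervals[-1][1] == grid[i - 1]:
                # The drop continues: extend the previous interval.
                intervals[-1] = (intervals[-1][0], grid[i])
            else:
                intervals.append((grid[i - 1], grid[i]))
    return intervals


# Made-up curve (not model output): rises until 60, then falls.
ages = [20, 30, 40, 50, 60, 70, 80]
pdp = [0.05, 0.15, 0.30, 0.38, 0.40, 0.33, 0.25]
print(decreasing_intervals(ages, pdp))  # [(60, 80)]
```

The tol parameter ignores drops smaller than a chosen threshold. This may help when a curve is estimated on a sample of the data (as with sampling=0.1) and is slightly noisy.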