Yggdrasil’s Gradient Boosted Tree learners use early stopping to prevent overfitting. This page offers an overview of the configuration of early stopping and tradeoffs to consider.
At a high level, early stopping is a common ML technique to evaluate a model’s current performance against a valuation dataset during training. In the case of GBTs, the learner adds one (or multiple) trees with each iteration until the model contains the requested number of trees. With early stopping, the learner periodically (e.g., at the end of each iteration) computes a validation loss at the end of each iteration, which is the model’s loss on the validation dataset. Based on the validation loss, the learner may decide to include fewer trees than requested in the final model1.
Early stopping modes#
Yggdrasil’s early stopping is configured through the
Yggdrasil supports three modes for early stopping, configured through the
NONE: No early stopping is used.
If the dataset is very small or if overfitting is unlikely, disabling early stopping is a sensible choice.
VALIDATION_LOSS_INCREASE: Stops the training when the validation loss stops decreasing.
More precisely, and to account for potential training noise (the validation loss can decrease, increase and then decrease again), training stops if the smallest (best) validation loss was observed more than
early_stopping_num_trees_look_aheadtrees/ training iterations ago.
MIN_VALIDATION_LOSS_ON_FULL_MODEL: Trains all the “num_trees”, and then select the subset \([1,..,k]\) of trees that minimizes the validation loss.
This solution is can lead to better models than
VALIDATION_LOSS_INCREASE, but it is more expensive as all the trees are always trained
VALIDATION_LOSS_INCREASE, the learner stops training when it is
confident that additional trees will no longer decrease the validation loss.
early_stopping_num_trees_look_ahead controls how many trees the
learner will compute to see if they reduce the validation loss.
The default value is 30. Parameter
be increased if the learner is suspected to trigger early stopping prematurely,
but it mildly increases the running time of the algorithm and results in larger
Sometimes, the first few training iterations of training is noisy, and the
validation loss oscillates. To avoid triggering it by accident, early stopping
is disabled during the first
During the first few iterations of the learner, the model can still be very
noisy, with the validation loss making surprising jumps. Parameter
early_stopping_initial_iteration controls how many of the first iterations
will be skipped when computing the validation loss.
The default value is 1. Parameter
early_stopping_initial_iteration should be
increased if the learner is suspected to trigger early stopping prematurely, but
it mildly increases the running time of the algorithm and results in larger
The validation dataset is picked uniformly at random from the training
dataset. The proportion of the validation dataset is controlled with the parameter
validation_set_ratio. The default value is 0.1 that is, the learner will use
10% of the input dataset to form the validation dataset. For small or highly
imbalanced datasets, the validation dataset should be small or early stopping
The split between training and validation datasets can be controlled more
validation_set_group_feature. The value of this parameter
should be the name of a feature or a regular expression resolving to one or more
feature names. Two examples with the same feature value(s) as specified by
validation_set_group_feature cannot be on different sides of the
training/validation split. This is useful to ensure that highly correlated
examples always land on the same side of the split.
The learner evaluates the model on the validation set at every
validation_interval_in_trees trees. Increasing this value will improve the
algorithm’s speed and might lead to slightly worse / larger models.
Note that, despite its name, early stopping may still allow the learner compute all requested trees and only prune the model afterward.