
Improve training speed

Some hyperparameters have a significant impact on both model quality and training speed.

Approximated splits

By default, training uses exact splits. The alternative, approximated splits, is much faster to learn (2x to 5x speed-up depending on the dataset) but can sometimes lead to a drop in quality.

Enable approximated splits with force_numerical_discretization=True. The max_unique_values_for_discretized_numerical parameter (defaults to 16000) controls the accuracy of the approximated splits. A smaller value makes the algorithm faster, but it may also result in less accurate splits.

If training time is limited, using approximated splits while optimizing other hyperparameters can result in both faster training and improved accuracy.
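As a minimal sketch, assuming the YDF Python API (`import ydf`; the label name and training dataset are placeholders):

```python
import ydf  # pip install ydf

# Enable approximated splits. A smaller
# max_unique_values_for_discretized_numerical trades split accuracy for speed.
learner = ydf.GradientBoostedTreesLearner(
    label="my_label",
    force_numerical_discretization=True,
    max_unique_values_for_discretized_numerical=1000,
)
# model = learner.train(train_dataset)
```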

About other libraries

In XGBoost, approximated splits can be enabled with tree_method="hist".

LightGBM always uses approximated splits.
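For comparison, a sketch using XGBoost's scikit-learn wrapper (the model type and training data are placeholders):

```python
from xgboost import XGBClassifier

# Histogram-based (approximated) split finding in XGBoost.
model = XGBClassifier(tree_method="hist")
# model.fit(X_train, y_train)
```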

Distributed training

Distributed training divides the computation cost of training a model over multiple computers. In other words, instead of training a model on a single machine, the model is trained on multiple machines in parallel. This can significantly speed up the training process, as well as allow for larger datasets to be used. On small datasets, distributed training does not help.

Number of trees

The training time is directly proportional to the number of trees. Decreasing the number of trees will reduce the training time.
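For example, assuming the YDF Python API, halving the number of trees roughly halves the training time:

```python
import ydf

# Train fewer trees to reduce training time (at a possible cost in quality).
learner = ydf.GradientBoostedTreesLearner(
    label="my_label",
    num_trees=150,  # the default is 300
)
```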

Candidate attribute ratio

Training time is directly proportional to num_candidate_attributes_ratio. Decreasing num_candidate_attributes_ratio will reduce the training time.
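A sketch, again assuming the YDF Python API:

```python
import ydf

# Test only half of the attributes as split candidates at each node.
learner = ydf.RandomForestLearner(
    label="my_label",
    num_candidate_attributes_ratio=0.5,
)
```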

Disable OOB performances [RF only]

When compute_oob_performances=True (default), the Out-of-bag evaluation is computed during training. OOB evaluation is a great way to measure the quality of a model, but it does not affect the trained model itself. Disabling compute_oob_performances will speed up Random Forest model training.
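A sketch, assuming the YDF Python API:

```python
import ydf

# Skip the Out-of-bag evaluation to speed up Random Forest training.
# The trained model is unchanged; only the free quality estimate is lost.
learner = ydf.RandomForestLearner(
    label="my_label",
    compute_oob_performances=False,
)
```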

Set a maximum training time

maximum_training_duration_seconds controls the maximum training time of a model.
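A sketch, assuming the YDF Python API (the time budget is illustrative):

```python
import ydf

# Interrupt training after at most 5 minutes.
learner = ydf.GradientBoostedTreesLearner(
    label="my_label",
    maximum_training_duration_seconds=300,
)
```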

Reduce tested oblique projections

When training a sparse oblique model (split_axis=SPARSE_OBLIQUE), the number of tested projections is defined by num_features^num_projections_exponent. Reducing num_projections_exponent will speed up training.
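To see why this helps, the number of tested projections grows as a power of the feature count. A quick illustration in plain Python (the feature count is illustrative):

```python
# Number of tested oblique projections: num_features ** num_projections_exponent.
num_features = 100

for exponent in (2.0, 1.5, 1.0):
    num_projections = round(num_features ** exponent)
    print(f"exponent={exponent}: {num_projections} projections tested per split")
# exponent=2.0: 10000, exponent=1.5: 1000, exponent=1.0: 100
```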

Increase number of training threads

The default number of training threads is the number of cores on the machine, capped at 32. If the machine has more than 32 cores, manually setting the num_threads parameter to a larger number can speed up training.
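A sketch, assuming the YDF Python API and a machine with more than 32 cores:

```python
import ydf

# Use all 64 cores instead of the default cap of 32 threads.
learner = ydf.GradientBoostedTreesLearner(
    label="my_label",
    num_threads=64,
)
```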

Increase shrinkage [GBT only]

The "shrinkage", sometimes referred to as the "learning rate", determines how quickly a GBT model learns. Learning too quickly typically results in inferior results but produces smaller, faster-to-train, and faster-to-run models. shrinkage defaults to 0.1. You can try 0.15 or even 0.2.
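A sketch, assuming the YDF Python API:

```python
import ydf

# A larger shrinkage learns faster; it often also needs fewer trees.
learner = ydf.GradientBoostedTreesLearner(
    label="my_label",
    shrinkage=0.15,  # default is 0.1
)
```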