CLI Commands
This page lists the commands of the Yggdrasil Decision Forests CLI (command-line interface).
train: Train a ML model and export it to disk.
Flags from cli/train.cc:
--config (Path to the training configuration i.e. a
model::proto::TrainingConfig text proto.); default: "";
--dataset (Typed path to training dataset i.e. [type]:[path] format. Supports
globs, shards and comma-separated paths. Example: csv:/my/dataset.csv); default: "";
--dataspec (Path to the dataset specification (dataspec). Note: The dataspec
is often created with :infer_dataspec and inspected with :show_dataspec.);
default: "";
--deployment (Path to the deployment configuration for the training i.e.
what computing resources to use to train the model. Text proto buffer of
type model::proto::DeploymentConfig. If not specified, the training is
done locally with a number of threads chosen by the training algorithm.);
default: "";
--output (Output model directory.); default: "";
--valid_dataset (Optional validation dataset specified with [type]:[path]
format. If not specified and if the learning algorithm uses a validation
dataset, the effective validation dataset is extracted from the training
dataset.); default: "";
Try --helpfull to get a list of all flags, or --help=substring to show help for
flags which include the specified substring in the name, description or
path.
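The flags above can be assembled into a command line programmatically. The following is a minimal sketch, not an official wrapper: the binary location `./train` and all file paths are hypothetical, and only flags listed above are used.

```python
import shlex


def build_command(binary, flags):
    """Return an argv list: the binary followed by --name=value flags."""
    return [binary] + [f"--{name}={value}" for name, value in flags.items()]


# Hypothetical binary location and paths; adapt them to your setup.
train_cmd = build_command("./train", {
    "dataset": "csv:/my/train.csv",      # typed path: [type]:[path]
    "dataspec": "/my/dataspec.pbtxt",    # e.g. created with infer_dataspec
    "config": "/my/train_config.pbtxt",  # model::proto::TrainingConfig text proto
    "output": "/my/model",               # output model directory
})

# Print a copy-pastable shell command.
print(shlex.join(train_cmd))
```

The list can be passed as-is to `subprocess.run(train_cmd, check=True)` once the binary and files exist.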
show_model: Display the statistics and structure of a model.
Flags from cli/show_model.cc:
--dataspec (Show the dataspec contained in the model. This is similar to
running :show_dataspec on the data_spec.pb file in the model directory.);
default: false;
--engines (List and test the fast engines compatible with the model. Note:
Engines need to be linked to the binary. Some engines depend on the
platform e.g. if you don't have AVX2, AVX2 engines won't be listed.);
default: false;
--explain_engine_incompatibility (If true, and if --engines=true, print an
explanation of why each of the available serving engines is not compatible
with the model.); default: false;
--full_definition (Show the full details of the model. For decision forest
models, show the tree structure.); default: false;
--model (Model directory.); default: "";
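Since --explain_engine_incompatibility only has an effect together with --engines=true, a small helper can enforce that dependency before the command is run. This is an illustrative sketch: the binary location `./show_model` and the model path are hypothetical, and raising an error on the inconsistent combination is a choice of this sketch, not a behavior of the tool.

```python
def show_model_args(model, engines=False, explain=False, full_definition=False):
    """Build the argv for show_model using only the flags documented above."""
    if explain and not engines:
        # Per the flag description, the explanation is printed only if --engines=true.
        raise ValueError("--explain_engine_incompatibility requires --engines=true")
    args = ["./show_model", f"--model={model}"]
    if engines:
        args.append("--engines=true")
    if explain:
        args.append("--explain_engine_incompatibility=true")
    if full_definition:
        args.append("--full_definition=true")
    return args


# Hypothetical model directory.
show_cmd = show_model_args("/my/model", engines=True, explain=True)
print(" ".join(show_cmd))
```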
show_dataspec: Print a human readable representation of a dataspec.
Flags from cli/show_dataspec.cc:
--dataspec (Path to dataset specification (dataspec).); default: "";
--is_text_proto (If true, the dataspec is read as a text proto. If false, the
dataspec is read as a binary proto.); default: true;
--sort_by_column_names (If true, sort the columns by names. If false, sort
the columns by column index.); default: true;
predict: Apply a model on a dataset and export the predictions to disk.
Flags from cli/predict.cc:
--dataset (Typed path to dataset i.e. [type]:[path] format.); default: "";
--key (If set, copies the column "key" in the output prediction file. This
key column cannot be an input feature of the model.); default: "";
--model (Model directory.); default: "";
--num_records_by_shard_in_output (Number of records per output shard. Only
valid if the output path is sharded (e.g. contains @10).); default: -1;
--output (Output prediction specified with [type]:[path] format. e.g.
"csv:/path/to/dataset.csv".); default: "";
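The [type]:[path] convention and the sharding constraint on --num_records_by_shard_in_output can be sketched as follows. The binary location `./predict`, the paths and the `@10` output name are hypothetical; checking for `@` in the output path is a simplification of "the output path is sharded", not the tool's actual validation.

```python
def typed_path(fmt, path):
    """Format a dataset location as [type]:[path], e.g. csv:/my/dataset.csv."""
    return f"{fmt}:{path}"


def predict_args(model, dataset, output, records_per_shard=None):
    """Build the argv for predict using only the flags documented above."""
    args = ["./predict", f"--model={model}", f"--dataset={dataset}", f"--output={output}"]
    if records_per_shard is not None:
        # The flag is documented as valid only for sharded output paths.
        if "@" not in output:
            raise ValueError("records_per_shard requires a sharded output path (e.g. ending in @10)")
        args.append(f"--num_records_by_shard_in_output={records_per_shard}")
    return args


# Hypothetical paths: predictions written to 10 shards of 1000 records each.
predict_cmd = predict_args(
    model="/my/model",
    dataset=typed_path("csv", "/my/test.csv"),
    output=typed_path("csv", "/my/predictions@10"),
    records_per_shard=1000,
)
print(" ".join(predict_cmd))
```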
infer_dataspec: Infers the dataspec of a dataset i.e. the name, type and meta-data of the dataset columns.
Flags from cli/infer_dataspec.cc:
--dataset (Typed path to training dataset i.e. [type]:[path] format.);
default: "";
--guide (Path to an optional dataset specification guide
(DataSpecificationGuide Text proto). Used to override the automatic type
detection of the columns.); default: "";
--output (Output dataspec path.); default: "";
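A dataspec is typically inferred first and then inspected with show_dataspec. The two-step sequence might look like the sketch below. Binary locations and paths are hypothetical; --is_text_proto=true is passed explicitly because, per the convert_dataset flag descriptions, infer_dataspec writes a text proto.

```python
# Hypothetical locations of the dataset and of the dataspec to create.
dataset = "csv:/my/train.csv"
dataspec = "/my/dataspec.pbtxt"

# Step 1: infer the column names, types and meta-data.
# Step 2: print a human-readable view of the result.
dataspec_cmds = [
    ["./infer_dataspec", f"--dataset={dataset}", f"--output={dataspec}"],
    ["./show_dataspec", f"--dataspec={dataspec}", "--is_text_proto=true"],
]

for cmd in dataspec_cmds:
    print(" ".join(cmd))
```

To actually run the steps in order, each list can be passed to `subprocess.run(cmd, check=True)`, so that a failed inference stops the sequence.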
evaluate: Evaluates a model.
Flags from cli/evaluate.cc:
--dataset (Typed path to dataset i.e. [type]:[path] format.); default: "";
--model (Model directory.); default: "";
--options (Path to optional evaluation configuration.
proto::EvaluationOptions Text proto.); default: "";
convert_dataset: Converts a dataset from one format to another. The dataspec of the dataset should be available.
Flags from cli/convert_dataset.cc:
--dataspec (Input data specification path. This file is generally created
with :infer_dataspec and inspected with :show_dataspec.); default: "";
--dataspec_is_binary (If true, the dataspec is a binary proto. If false
(default), the dataspec is a text proto. The :infer_dataspec cli generates
text proto dataspec, while the dataspec contained in a model is encoded as
a binary proto.); default: false;
--ignore_missing_columns (If false (default), fails if one of the columns in
the dataspec is missing. If true, fill missing columns with "missing
values".); default: false;
--input (Input dataset specified with [type]:[path] format.); default: "";
--output (Output dataset specified with [type]:[path] format.); default: "";
--shard_size (Number of records per output shard. Only valid if the output
path is sharded (e.g. contains @10). This flag is required as this
conversion is greedy. If shard_size is too low, all the
remaining examples will be put in the last shard.); default: -1;
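The effect of a too-small --shard_size can be illustrated with a small simulation. This sketch encodes one reading of the flag description (shards are filled in order and the last shard receives everything left over); it has not been checked against the tool's implementation, and the function names are hypothetical.

```python
import math


def greedy_shard_sizes(num_examples, num_shards, shard_size):
    """Simulate greedy sharding: fill shards in order, leftovers go to the last one."""
    sizes = []
    remaining = num_examples
    for index in range(num_shards):
        is_last = index == num_shards - 1
        take = remaining if is_last else min(shard_size, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def balanced_shard_size(num_examples, num_shards):
    """Smallest shard size that spreads the examples evenly over the shards."""
    return math.ceil(num_examples / num_shards)


# A shard size that is too low overloads the last shard.
print(greedy_shard_sizes(1000, 10, 50))
# A balanced shard size avoids the imbalance.
print(greedy_shard_sizes(1000, 10, balanced_shard_size(1000, 10)))
```

Under this reading, choosing the shard size as the number of examples divided by the number of shards, rounded up, keeps the shards even.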
benchmark_inference: Benchmarks the inference time of a model with the available inference engines.
Flags from cli/benchmark_inference.cc:
--batch_size (Number of examples per batch. Note that some engines are not
impacted by the batch size.); default: 100;
--dataset (Typed path to dataset i.e. [type]:[path] format.); default: "";
--generic (Evaluates the slow engine i.e. model->predict(). The generic
engine is slow and mostly a reference. Disable it if the benchmark runs
for too long.); default: true;
--model (Path to model.); default: "";
--num_runs (Number of times the dataset is run. Higher values increase the
precision of the timings, but increase the duration of the benchmark.);
default: 20;
--warmup_runs (Number of runs through the dataset before the benchmark.);
default: 1;
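The cost of a benchmark grows with --num_runs and --warmup_runs. The sketch below builds the command and estimates how many passes over the dataset it implies. The binary location and paths are hypothetical, and treating the total as warmup_runs + num_runs passes for each engine is an assumption drawn from the flag descriptions.

```python
def benchmark_args(model, dataset, num_runs=20, warmup_runs=1, batch_size=100, generic=True):
    """Build the argv for benchmark_inference using only the flags documented above."""
    return [
        "./benchmark_inference",
        f"--model={model}",
        f"--dataset={dataset}",
        f"--num_runs={num_runs}",
        f"--warmup_runs={warmup_runs}",
        f"--batch_size={batch_size}",
        f"--generic={'true' if generic else 'false'}",
    ]


def dataset_passes(num_runs=20, warmup_runs=1):
    """Assumed number of passes over the dataset for one engine."""
    return num_runs + warmup_runs


# Hypothetical paths; the slow generic engine is disabled to shorten the run.
bench_cmd = benchmark_args("/my/model", "csv:/my/test.csv", generic=False)
print(" ".join(bench_cmd))
print("passes per engine:", dataset_passes())
```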
edit_model: Edits a trained model.
Flags from cli/edit_model.cc:
--input (Input model directory.); default: "__NO__SET__";
--new_file_prefix (New prefix in the filenames.); default: "__NO__SET__";
--new_label_name (New label name.); default: "__NO__SET__";
--new_weights_name (New weights name.); default: "__NO__SET__";
--output (Output model directory.); default: "__NO__SET__";
--pure_serving (Clear the model from any information that is not required
for model serving. This includes debugging, model interpretation and other
meta-data. Can significantly reduce the size of the model.);
default: "__NO__SET__";
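Because every edit_model flag defaults to the "__NO__SET__" sentinel, a wrapper only needs to pass the edits it actually wants. The following sketch does that. The binary location and paths are hypothetical, and passing --pure_serving as the string true or false is an assumption about the expected value format.

```python
def edit_model_args(input_dir, output_dir, new_label_name=None,
                    new_weights_name=None, new_file_prefix=None, pure_serving=None):
    """Build the argv for edit_model, omitting every edit that is not requested."""
    args = ["./edit_model", f"--input={input_dir}", f"--output={output_dir}"]
    optional = {
        "new_label_name": new_label_name,
        "new_weights_name": new_weights_name,
        "new_file_prefix": new_file_prefix,
    }
    if pure_serving is not None:
        # Assumed value format: "true" / "false".
        optional["pure_serving"] = "true" if pure_serving else "false"
    # Flags left as None keep the tool's "__NO__SET__" default.
    args += [f"--{name}={value}" for name, value in optional.items() if value is not None]
    return args


# Hypothetical directories: write a serving-only copy of the model.
edit_cmd = edit_model_args("/my/model", "/my/model_serving", pure_serving=True)
print(" ".join(edit_cmd))
```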
synthetic_dataset: Create a synthetic dataset.
Flags from cli/utils/synthetic_dataset.cc:
--options (Optional path to text serialized
proto::SyntheticDatasetOptions.); default: "";
--ratio_test (Fraction of the dataset (whose size is defined in "options")
that is sent to the test dataset. The "test" flag can be empty if and only if
ratio_test=0.); default: 0.3;
--ratio_valid (Fraction of the dataset (whose size is defined in "options")
that is sent to the validation dataset. The "valid" flag can be empty if and
only if ratio_valid=0.); default: 0;
--test (Optional [type]:[path] path to the output test dataset.);
default: "";
--train ([type]:[path] path to the output training dataset.); default: "";
--valid (Optional [type]:[path] path to the output validation dataset.);
default: "";
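The relationship between the ratio flags and the output paths can be checked before calling the tool. This is a minimal sketch: the binary location and paths are hypothetical, and requiring the two ratios to sum to less than 1 (so that some examples remain for training) is an assumption of this sketch.

```python
def synthetic_dataset_args(train, test="", valid="", ratio_test=0.3, ratio_valid=0.0):
    """Build the argv for synthetic_dataset after checking ratio/path consistency."""
    if ratio_test > 0 and not test:
        raise ValueError("--test must be set when ratio_test > 0")
    if ratio_valid > 0 and not valid:
        raise ValueError("--valid must be set when ratio_valid > 0")
    if ratio_test + ratio_valid >= 1:
        # Assumption: the remaining examples form the training dataset.
        raise ValueError("ratio_test + ratio_valid must leave examples for training")
    args = ["./synthetic_dataset", f"--train={train}",
            f"--ratio_test={ratio_test}", f"--ratio_valid={ratio_valid}"]
    if test:
        args.append(f"--test={test}")
    if valid:
        args.append(f"--valid={valid}")
    return args


# Hypothetical output paths, using the default 0.3 test ratio.
synth_cmd = synthetic_dataset_args("csv:/my/synth_train.csv", test="csv:/my/synth_test.csv")
print(" ".join(synth_cmd))
```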