pip install ydf -U
About this tutorial
This tutorial shows how to train a YDF model, export it to the TensorFlow SavedModel format, and run this model in Vertex AI. Additionally, the tutorial shows how to manually run the TensorFlow Serving binary to make inferences with the model.
What is TF Serving?
TensorFlow Serving is a production environment for running machine learning models. TensorFlow Serving can run YDF models.
What is Vertex AI?
Vertex AI is a Google Cloud solution to manage and serve ML models. Vertex AI relies on TensorFlow Serving to run TensorFlow models (stored in the TensorFlow SavedModel format). YDF models can be exported to TensorFlow SavedModel and run on TensorFlow Serving and Vertex AI with the model.to_tensorflow_saved_model method.
Important remark about TensorFlow Saved Model inputs
A TensorFlow SavedModel can be seen as a generic function that takes data as input and produces predictions as output. TensorFlow Serving and Vertex AI define three input formats for feeding input features and three output formats for retrieving predictions. Using the incorrect format can result in cryptic error messages. Understanding these formats is not necessary for using TensorFlow Serving and Vertex AI, but it can be helpful for debugging your pipeline. This section provides an overview of the formats.
YDF allows you to select the format of your model using the servo_api: bool and feed_example_proto: bool arguments of the to_tensorflow_saved_model function.
Input format #1: input instances
In this format, the data is grouped by examples: a list where each example is a dictionary of features. The format is straightforward but not very efficient. It is easily usable with all APIs (REST, Python, C++). This format is used by Vertex AI Online predictions and by Vertex AI Batch predictions on JSONL files.
Here is a list of two examples, each with three features "f1", "f2", and "f3":
[ {"f1": 1, "f2": 5.9, "f3": "red" }, {"f1": 3, "f2": 2.1, "f3": "blue" } ]
This is the default input format of the to_tensorflow_saved_model function, i.e., feed_example_proto=False.
Input format #2: input features
In this format, the data is grouped by features: a dictionary where each feature is a list of values. The format is relatively straightforward and the most efficient. It is easily usable with all APIs (REST, Python, C++). When possible, this is the format to use.
Here are the same two examples in this format:
{"f1": [1, 3], "f2": [5.9, 2.1], "f3": ["red", "blue"] }
This is also the default input format of the to_tensorflow_saved_model function, i.e., feed_example_proto=False.
Input format #3: serialized TensorFlow Examples
In this last format, the data is encoded as serialized TensorFlow Example protos, which is also the format used to train large TensorFlow pipelines. This format is not efficient for serving and is relatively complex to use. When possible, avoid using it for inference. It is required by Vertex AI Batch predictions on TFRecord files.
This format is enabled with feed_example_proto=True in the to_tensorflow_saved_model function.
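For reference, here is a minimal Python sketch showing how the first example above ({"f1": 1, "f2": 5.9, "f3": "red"}) could be encoded as a serialized TensorFlow Example proto:

import tensorflow as tf

# Build a tf.train.Example holding the three features.
example = tf.train.Example(
    features=tf.train.Features(
        feature={
            "f1": tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
            "f2": tf.train.Feature(float_list=tf.train.FloatList(value=[5.9])),
            "f3": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"red"])),
        }
    )
)

# The serialized bytes are what the model consumes in this format.
serialized_example = example.SerializeToString()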
Output format #1: predict
This format is the simplest and most efficient one. The predicted values are directly output by the model. The meaning of the predictions is determined by the model: a classification model, for instance, outputs probabilities, whereas a regression model outputs values. This is the default output format of the to_tensorflow_saved_model function, i.e., servo_api=False.
Here is an example of predictions for a binary classification model (two examples):
{ "prediction": [0.2, 0.7] }
Here is an example of predictions for a multi-class classification model (two examples, three classes):
{ "prediction": [[0.2, 0.1, 0.7],
[0.8, 0.1, 0.1]] }
This format is available for Vertex AI Online predictions.
Output format #2 & 3: classify and regress
In these formats, the model outputs a dictionary of values. The values depend on the model type. For instance, a classification model will output "scores" and "classes" values. This output format is enabled with servo_api=True in the to_tensorflow_saved_model function.
This format is available for Vertex AI Online predictions, and required for Vertex AI Batch predictions.
Here is an example of predictions for a binary classification model (two examples):
{ "scores": [[0.2, 0.8],
[0.1, 0.9]],
"classes": [["v1", "v2"],
["v1", "v2"]]}
Train a model
We train a binary classification YDF model similarly to the classification tutorial.
We load a dataset:
# Load libraries
import ydf # Yggdrasil Decision Forests
import pandas as pd # We use Pandas to load small datasets
# Download a classification dataset and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
train_ds = pd.read_csv(f"{ds_path}/adult_train.csv")
test_ds = pd.read_csv(f"{ds_path}/adult_test.csv")
# Print the first 5 training examples
train_ds.head(5)
| | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 44 | Private | 228057 | 7th-8th | 4 | Married-civ-spouse | Machine-op-inspct | Wife | White | Female | 0 | 0 | 40 | Dominican-Republic | <=50K |
| 1 | 20 | Private | 299047 | Some-college | 10 | Never-married | Other-service | Not-in-family | White | Female | 0 | 0 | 20 | United-States | <=50K |
| 2 | 40 | Private | 342164 | HS-grad | 9 | Separated | Adm-clerical | Unmarried | White | Female | 0 | 0 | 37 | United-States | <=50K |
| 3 | 30 | Private | 361742 | Some-college | 10 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K |
| 4 | 67 | Self-emp-inc | 171564 | HS-grad | 9 | Married-civ-spouse | Prof-specialty | Wife | White | Female | 20051 | 0 | 30 | England | >50K |
We train the model:
model = ydf.GradientBoostedTreesLearner(label="income").train(train_ds)
Train model on 22792 examples
Model trained in 0:00:02.335818
Note: While not demonstrated here, it is recommended to inspect and evaluate a model before putting it into use. Use the model.describe() method to examine the model's structure and characteristics, and use the model.evaluate(...) method to assess its performance and accuracy. See the Getting started tutorial for details.
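For instance:

# Display the model's structure, input features, and training logs.
model.describe()

# Evaluate the model on the held-out test dataset.
evaluation = model.evaluate(test_ds)
print(evaluation)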
Export model to TF Saved Model format
TensorFlow Serving can only read models in the TensorFlow SavedModel format. Therefore, we export the YDF model to the TensorFlow SavedModel format.
This step requires the TensorFlow Decision Forests library to be installed.
Info: TensorFlow Decision Forests is a Keras 2 wrapper built on top of YDF and developed by the YDF team. For most use cases, using YDF directly is preferable as it is faster, easier to use, and compatible both with Keras 2 and Keras 3. Learn more.
!pip install tensorflow_decision_forests -qq
# Export the model to a TensorFlow Saved Model.
# For Vertex AI Online inference.
# The model consumes raw feature values and outputs raw model predictions.
model.to_tensorflow_saved_model("/tmp/ydf/tf_model", mode="tf")
# For Vertex AI Batch inference.
# The model consumes serialized TensorFlow Example protos and returns a dictionary of values.
# model.to_tensorflow_saved_model("/tmp/ydf/tf_model", mode="tf", servo_api=True, feed_example_proto=True)
[INFO 24-04-12 14:12:24.7037 CEST kernel.cc:1233] Loading model from path /tmp/tmp5siwullh/tmppvtwaq00/ with prefix 18bdf8d5_
[INFO 24-04-12 14:12:24.7308 CEST quick_scorer_extended.cc:911] The binary was compiled without AVX2 support, but your CPU supports it. Enable it for faster model inference.
[INFO 24-04-12 14:12:24.7316 CEST abstract_model.cc:1344] Engine "GradientBoostedTreesQuickScorerExtended" built
[INFO 24-04-12 14:12:24.7316 CEST kernel.cc:1061] Use fast generic engine
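Optionally, you can check the export locally by reloading the model with TensorFlow before uploading it. This is only a sketch: it assumes the serving_default signature and that the feature dtypes match the training data (int64 for the integer columns, string for the categorical ones).

import tensorflow as tf

# Reload the exported SavedModel.
tf_model = tf.saved_model.load("/tmp/ydf/tf_model")

# One example, expressed feature-by-feature (input format #2).
example = {
    "age": tf.constant([39], dtype=tf.int64),
    "workclass": tf.constant(["State-gov"]),
    "fnlwgt": tf.constant([77516], dtype=tf.int64),
    "education": tf.constant(["Bachelors"]),
    "education_num": tf.constant([13], dtype=tf.int64),
    "marital_status": tf.constant(["Never-married"]),
    "occupation": tf.constant(["Adm-clerical"]),
    "relationship": tf.constant(["Not-in-family"]),
    "race": tf.constant(["White"]),
    "sex": tf.constant(["Male"]),
    "capital_gain": tf.constant([2174], dtype=tf.int64),
    "capital_loss": tf.constant([0], dtype=tf.int64),
    "hours_per_week": tf.constant([40], dtype=tf.int64),
    "native_country": tf.constant(["United-States"]),
}

# Raw probability of the positive class.
print(tf_model.signatures["serving_default"](**example))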
Import the model in Vertex AI
To import the model in Vertex AI, follow these steps:
1. Import the TensorFlow Saved Model into a Google Cloud Bucket
- Open the Cloud Storage page.
- Create a new bucket or select an existing one.
- Click on "Upload folder" and select the model exported previously, which is /tmp/ydf/tf_model in this example.
The model bucket should contain a file called saved_model.pb. For example, if you upload the model to the gs://my_bucket bucket, the file gs://my_bucket/tf_model/saved_model.pb should be present.
2. Register model in Vertex AI
- Open the Vertex AI Model Registry page.
- Click "Import" and select the model from the cloud bucket.
- In the "Import Model" dialog, configure the following options:
- Name: Enter a name for the model.
- Model framework: Select TensorFlow.
- Model framework version: Select the most recent version, which is 2.13 at the time of this writing.
- Accelerator: Select None.
- Model artifact location: Specify the Google Cloud Storage (GCS) path to the model artifacts, e.g., gs://my_bucket/tf_model/.
- Use optimized TensorFlow runtime: Disable this field. Decision forests do not work with this neural-network-specific optimization.
- Leave the other options with their default values and click "Continue."
- When prompted about explainability, select "No explainability" and click "Import."
The model will be imported in a few minutes. You can monitor the progress at the top-right corner of the Model Registry page. Once imported, the model will appear in the list of registered models.
The model is now ready for inference.
Online predictions
The model can be deployed to an "endpoint" and queried remotely via the Cloud REST API.
- Open the Cloud Model Registry page.
- Select the model, open the tab "Deploy and test" and click on "Deploy to endpoint".
- Configure the endpoint as follows:
- Endpoint name: Enter a name for the endpoint.
- Machine type: Select the smallest possible machine, e.g., n1-standard-2.
- Click on "Deploy"
The endpoint is now being deployed.
In the "Test your model" section, query the model with the following JSON request:
{
"instances":[
{
"age":39,
"workclass":"State-gov",
"fnlwgt":77516,
"education":"Bachelors",
"education_num":13,
"marital_status":"Never-married",
"occupation":"Adm-clerical",
"relationship":"Not-in-family",
"race":"White",
"sex":"Male",
"capital_gain":2174,
"capital_loss":0,
"hours_per_week":40,
"native_country":"United-States"
}
]
}
The result will be:
{
"predictions": [
0.0186043456
],
"deployedModelId": "2255069761266253824",
"model": "projects/734980258708/locations/us-central1/models/8572427068350922752",
"modelDisplayName": "tf_model_servoF_protoF",
"modelVersionId": "1"
}
This prediction indicates that the positive class has a 1.86% chance of being true. In other words, the model predicts the negative class.
print("The positive and negative classes are:", model.label_classes())
The positive and negative classes are: ['<=50K', '>50K']
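The endpoint can also be queried programmatically. Here is a sketch using the google-cloud-aiplatform Python client; the project, location, and endpoint ID below are placeholders to replace with your own values.

from google.cloud import aiplatform

# Placeholders: use your own project, region, and endpoint ID.
aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint("1234567890")

# Same payload as the JSON request above.
response = endpoint.predict(instances=[{
    "age": 39,
    "workclass": "State-gov",
    "fnlwgt": 77516,
    "education": "Bachelors",
    "education_num": 13,
    "marital_status": "Never-married",
    "occupation": "Adm-clerical",
    "relationship": "Not-in-family",
    "race": "White",
    "sex": "Male",
    "capital_gain": 2174,
    "capital_loss": 0,
    "hours_per_week": 40,
    "native_country": "United-States",
}])
print(response.predictions)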
Batch predictions
Now, let's perform batch predictions with the model. This involves uploading a file containing instances, generating predictions, and retrieving the results in a JSON file.
We loaded the training dataset from a CSV file. However, using CSV files with TensorFlow SavedModel can lead to errors. To avoid issues, we'll use TensorFlow's official format for datasets of examples, a TFRecord file of tf.train.Example protobufs. Fortunately, the test dataset is readily available in this format here.
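If you prefer to generate this file yourself instead of downloading it, here is a minimal sketch that re-encodes the Pandas test dataset as a gzip-compressed TFRecord of tf.train.Example protos (the output path is an arbitrary choice):

import tensorflow as tf

def to_tf_example(row):
    """Encodes one Pandas row as a tf.train.Example proto."""
    feature = {}
    for name, value in row.items():
        if isinstance(value, str):
            feature[name] = tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[value.encode("utf-8")]))
        elif isinstance(value, float):
            feature[name] = tf.train.Feature(
                float_list=tf.train.FloatList(value=[value]))
        else:
            feature[name] = tf.train.Feature(
                int64_list=tf.train.Int64List(value=[int(value)]))
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Write the test examples (without the label) to a gzip-compressed TFRecord.
options = tf.io.TFRecordOptions(compression_type="GZIP")
with tf.io.TFRecordWriter("/tmp/ydf/adult_test.tfrecord.gz", options) as writer:
    for _, row in test_ds.drop(columns=["income"]).iterrows():
        writer.write(to_tf_example(row).SerializeToString())

Otherwise, use the pre-generated file as follows: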
- Download the adult_test.recordio.gz file to your computer.
- Open the Cloud Storage page.
- In the bucket you already created, click on "Upload file" and select the file adult_test.recordio.gz.
- In the Cloud Storage page, select the file and rename it adult_test.tfrecord.gz. Vertex AI detects the file format by its extension.
- Open the Cloud Model Registry page.
- Select the model, open the tab "Batch predict" and click on "Create batch prediction".
- Configure the batch prediction as follows:
- Batch prediction name: Enter a name for the predictions.
- File on Cloud Storage (JSONL, CSV, TFRecord, and TFRecord (GZIP)): Select the adult_test.tfrecord.gz file.
- Destination path: Select the bucket containing the model and the dataset.
- Number of compute nodes: Enter "1". This is a small dataset that does not require much power.
- Machine type: Select the smallest possible machine, e.g., n1-standard-2.
- Click on "Create"
The predictions are currently being calculated. When the computation is complete, you will find them in a newly created JSONL file located in your bucket.
Here are the first five lines of the generated file:
{"prediction": {"scores": [0.981395662, 0.0186043456], "classes": ["<=50K", ">50K"]}}
{"prediction": {"scores": [0.638690472, 0.361309558], "classes": ["<=50K", ">50K"]}}
{"prediction": {"scores": [0.161411345, 0.838588655], "classes": ["<=50K", ">50K"]}}
{"prediction": {"scores": [0.956144333, 0.0438556746], "classes": ["<=50K", ">50K"]}}
{"prediction": {"scores": [0.970823526, 0.0291764941], "classes": ["<=50K", ">50K"]}}
You can see the predicted probability for each example and each class. For instance, on the first example, the model predicts class "<=50K" with a probability of 98.14%.
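Here is a short sketch for parsing these results in Python; the file name below is a placeholder, check your bucket for the actual name of the generated file.

import json

# Parse the JSONL prediction file generated by Vertex AI.
with open("prediction.results-00000-of-00001.jsonl") as f:
    for line in f:
        prediction = json.loads(line)["prediction"]
        scores, classes = prediction["scores"], prediction["classes"]
        # Report the class with the highest probability.
        best = max(range(len(scores)), key=scores.__getitem__)
        print(classes[best], scores[best])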
Run the model in local TensorFlow Serving [Advanced]
It is possible to start the TensorFlow Serving binary locally or on a remote machine and send requests with the TensorFlow Serving REST API. Let's show how it is done.
To test our model, we start a local version of TF Serving following the TF Serving setup instructions.
In a separate terminal, type:
cd /tmp/ydf
docker run -t --rm -p 8501:8501 \
    -v /tmp/ydf/tf_model:/models/my_saved_model/1 \
    -e MODEL_NAME=my_saved_model \
    tensorflow/serving
Note: TensorFlow Serving expects the model path to follow the structure: models/<MODEL_NAME>/<VERSION>
Once the TensorFlow Serving server is up and running, you can send prediction requests. Here is an example for each input format:
Predictions with the input instances format:
!curl http://localhost:8501/v1/models/my_saved_model:predict -X POST \
-d '{"instances": [{"age":39,"workclass":"State-gov","fnlwgt":77516,"education":"Bachelors","education_num":13,"marital_status":"Never-married","occupation":"Adm-clerical","relationship":"Not-in-family","race":"White","sex":"Male","capital_gain":2174,"capital_loss":0,"hours_per_week":40,"native_country":"United-States"}]}'
Predictions with the input features format:
!curl http://localhost:8501/v1/models/my_saved_model:predict -X POST \
-d '{"inputs": {"age":[39],"workclass":["State-gov"],"fnlwgt":[77516],"education":["Bachelors"],"education_num":[13],"marital_status":["Never-married"],"occupation":["Adm-clerical"],"relationship":["Not-in-family"],"race":["White"],"sex":["Male"],"capital_gain":[2174],"capital_loss":[0],"hours_per_week":[40],"native_country":["United-States"]} }'