Regression¶

Setup¶

In [ ]:

Copied!

pip install ydf -U
pip install ydf -U

What is regression?¶

Regression, is the task of predicting a numerical value, such as a tally, a measure, or a quantity. For instance, predicting the age of an animal or the cost of a product are regression problems. By default, the output of a regression model is the expected value, that is, the value that minimizes the squared error. Regression labels can be integers or float values.

Training a regression model¶

The task of a model (e.g., classification, regression, ranking, uplifting) is determined by the learner argument task.

In [ ]:

Copied!





# Load libraries
import ydf  # Yggdrasil Decision Forests
import pandas as pd  # We use Pandas to load small datasets

# Download a classification dataset and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
all_ds = pd.read_csv(f"{ds_path}/abalone.csv")

# Randomly split the dataset into a training (70%) and testing (30%) dataset
all_ds = all_ds.sample(frac=1)
split_idx = len(all_ds) * 7 // 10
train_ds = all_ds.iloc[:split_idx]
test_ds = all_ds.iloc[split_idx:]

# Print the first 5 training examples
train_ds.head(5)
# Load libraries
import ydf  # Yggdrasil Decision Forests
import pandas as pd  # We use Pandas to load small datasets

# Download a classification dataset and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
all_ds = pd.read_csv(f"{ds_path}/abalone.csv")

# Randomly split the dataset into a training (70%) and testing (30%) dataset
all_ds = all_ds.sample(frac=1)
split_idx = len(all_ds) * 7 // 10
train_ds = all_ds.iloc[:split_idx]
test_ds = all_ds.iloc[split_idx:]

# Print the first 5 training examples
train_ds.head(5)

The label column is:

In [ ]:

Copied!

train_ds["Rings"]
train_ds["Rings"]

We can train a regression model:

In [ ]:

Copied!

model = ydf.GradientBoostedTreesLearner(label="Rings",
                                task=ydf.Task.REGRESSION).train(train_ds)
model = ydf.GradientBoostedTreesLearner(label="Rings",
                                task=ydf.Task.REGRESSION).train(train_ds)

Regression models are evaluated using RMSE (root mean square error).

In [ ]:

Copied!

evaluation = model.evaluate(test_ds)

print(evaluation)
evaluation = model.evaluate(test_ds)

print(evaluation)

You can plot a rich evaluation with more plots.

In [ ]:

Copied!

evaluation
evaluation