# Regression

## Setup


```
pip install ydf -U
```


## What is regression?

**Regression** is the task of predicting a numerical value, such as a tally, a measure, or a quantity. For instance, predicting the age of an animal or the cost of a product are regression problems. By default, the output of a regression model is the expected value, that is, the value that minimizes the expected squared error. Regression labels can be integers or float values.
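To illustrate why the expected value is the squared-error minimizer, here is a minimal sketch in plain Python. The label values and the search grid are made up for illustration; nothing here uses YDF itself. It scans constant predictions and checks which one gives the lowest mean squared error.

```python
# Toy labels (hypothetical values, chosen only for illustration).
labels = [8.0, 10.0, 10.0, 11.0, 13.0]


def mean_squared_error(constant, values):
    """Average squared difference between one constant prediction and the values."""
    return sum((v - constant) ** 2 for v in values) / len(values)


mean_label = sum(labels) / len(labels)

# Try constant predictions on a grid from 5.0 to 15.0 in steps of 0.1.
candidates = [5.0 + 0.1 * i for i in range(101)]
best = min(candidates, key=lambda c: mean_squared_error(c, labels))

print(f"mean of labels: {mean_label:.1f}")
print(f"best constant on the grid: {best:.1f}")
```

On this toy data the best grid point coincides with the mean of the labels. A regression model applies the same idea conditionally on the input features.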

## Training a regression model

The task of a model (e.g., classification, regression, ranking, uplifting) is determined by the learner argument `task`.


```
# Load libraries
import ydf # Yggdrasil Decision Forests
import pandas as pd # We use Pandas to load small datasets
# Download a regression dataset (abalone) and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
all_ds = pd.read_csv(f"{ds_path}/abalone.csv")
# Randomly split the dataset into a training (70%) and testing (30%) dataset
all_ds = all_ds.sample(frac=1)
split_idx = len(all_ds) * 7 // 10
train_ds = all_ds.iloc[:split_idx]
test_ds = all_ds.iloc[split_idx:]
# Print the first 5 training examples
train_ds.head(5)
```



| | Type | LongestShell | Diameter | Height | WholeWeight | ShuckedWeight | VisceraWeight | ShellWeight | Rings |
|---|---|---|---|---|---|---|---|---|---|
| 3191 | M | 0.650 | 0.515 | 0.180 | 1.3315 | 0.5665 | 0.3470 | 0.405 | 13 |
| 1752 | M | 0.710 | 0.560 | 0.220 | 2.0150 | 0.9215 | 0.4540 | 0.566 | 11 |
| 2238 | I | 0.460 | 0.335 | 0.110 | 0.4440 | 0.2250 | 0.0745 | 0.110 | 8 |
| 3 | M | 0.440 | 0.365 | 0.125 | 0.5160 | 0.2155 | 0.1140 | 0.155 | 10 |
| 2685 | M | 0.625 | 0.480 | 0.145 | 1.0850 | 0.4645 | 0.2445 | 0.327 | 10 |
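The split above shuffles with `sample(frac=1)` and no seed, so the rows that land in each part change from run to run. As a minimal sketch of a reproducible variant, the plain-Python example below shuffles with a fixed seed and reuses the same `* 7 // 10` arithmetic. The helper name `split_70_30` and the seed value are hypothetical, and the row count of 4177 is inferred from the 2923 training and 1254 test examples reported on this page.

```python
import random


def split_70_30(rows, seed):
    """Shuffle a copy of `rows` with a fixed seed, then split it 70% / 30%."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    split_idx = len(shuffled) * 7 // 10  # same integer arithmetic as the notebook
    return shuffled[:split_idx], shuffled[split_idx:]


rows = list(range(4177))  # stand-in for the dataset rows (assumed count)
train_rows, test_rows = split_70_30(rows, seed=42)

print(len(train_rows), len(test_rows))
```

With pandas, passing a `random_state` to `DataFrame.sample` has the same effect of fixing the shuffle.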

The label column is:


```
train_ds["Rings"]
```



| | Rings |
|---|---|
| 3191 | 13 |
| 1752 | 11 |
| 2238 | 8 |
| 3 | 10 |
| 2685 | 10 |
| .. | |
| 1845 | 8 |
| 603 | 11 |
| 3264 | 12 |
| 1268 | 11 |
| 2104 | 11 |

Name: Rings, Length: 2923, dtype: int64

We can train a regression model:


```
model = ydf.GradientBoostedTreesLearner(
    label="Rings",
    task=ydf.Task.REGRESSION,
).train(train_ds)
```


Train model on 2923 examples

Model trained in 0:00:00.590275

Regression models are evaluated using RMSE (root mean square error).


```
evaluation = model.evaluate(test_ds)
print(evaluation)
```


RMSE: 2.13062

num examples: 1254

num examples (weighted): 1254
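For reference, RMSE is the square root of the average squared difference between labels and predictions. The sketch below computes it by hand in plain Python. The labels are the first five `Rings` values shown earlier, while the predictions are hypothetical numbers rather than outputs of the trained model, and the helper `rmse` is not part of YDF.

```python
import math


def rmse(labels, predictions):
    """Root mean square error between two equally long sequences."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


labels = [13, 11, 8, 10, 10]                 # first five Rings values from the table
predictions = [11.0, 11.0, 9.0, 10.0, 12.0]  # hypothetical model outputs

print(f"RMSE: {rmse(labels, predictions):.4f}")
```

Because the errors are squared before averaging, large mistakes weigh more than small ones, and the result is expressed in the same unit as the label (here, rings).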

Displaying the evaluation object in a notebook renders a richer report that includes plots.


```
evaluation
```
