Numpy arrays

Setup

In [ ]:

pip install ydf -U

In [1]:

import ydf
import numpy as np

Numpy

Numpy arrays are a convenient way to train and use YDF models. YDF does not accept a bare Numpy array directly; instead, it takes a dictionary of Numpy arrays. The dictionary keys give names to your features.

Let's define a dataset:

In [2]:

number_of_examples = 10
dataset = {
    "f1": np.random.uniform(size=number_of_examples),
    "f2": np.random.uniform(size=number_of_examples),
    "l": np.random.randint(0, 2, size=number_of_examples),
}

dataset

Out[2]:

{'f1': array([0.8408175 , 0.23268677, 0.97215838, 0.06059025, 0.43041995,
        0.2838354 , 0.54476241, 0.68916471, 0.15604299, 0.38484593]),
 'f2': array([0.53119829, 0.07066887, 0.367039  , 0.88090998, 0.76215773,
        0.11381487, 0.84171988, 0.34631154, 0.04948825, 0.56829104]),
 'l': array([0, 1, 1, 1, 1, 1, 0, 0, 1, 0])}

Then, let's train a model and generate predictions.

In [3]:

model = ydf.RandomForestLearner(label="l").train(dataset)

Train model on 10 examples
Model trained in 0:00:00.006883

In [4]:

model.predict(dataset)

Out[4]:

array([0.37999973, 0.8599993 , 0.4866663 , 0.6733328 , 0.48999962,
       0.836666  , 0.3866664 , 0.48999962, 0.8633326 , 0.5699996 ],
      dtype=float32)

If your input data is a single Numpy array, simply wrap it in a dictionary.
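As a minimal sketch of that wrapping step, assume the features and the labels are held in two separate arrays. The names `features`, `labels`, `"f"`, and `"l"` are illustrative choices made for this example, not names required by YDF.

```python
import numpy as np

number_of_examples = 10
rng = np.random.default_rng(seed=0)

# Hypothetical raw inputs: one feature array and one label array.
features = rng.uniform(size=number_of_examples)
labels = rng.integers(0, 2, size=number_of_examples)

# Wrap the arrays in a dictionary; each key becomes a column name.
dataset = {"f": features, "l": labels}

# Sanity check: every array should describe the same number of examples.
assert all(len(values) == number_of_examples for values in dataset.values())
```

The resulting dictionary can then be passed to `ydf.RandomForestLearner(label="l").train(dataset)`, as in the cells above.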

Training examples can be one-dimensional or two-dimensional Numpy arrays. In a two-dimensional array, the first dimension indexes the examples and the second dimension indexes the dimensions of the feature. This is similar to feeding each column as a separate feature.

In [5]:

number_of_examples = 10

# "f1" is an array of size [num_examples, 3]. YDF sees it as a feature with 3 dimensions.
# "f2" is still a single dimensional feature.
dataset = {
    "f1": np.random.uniform(size=(number_of_examples, 3)),
    "f2": np.random.uniform(size=number_of_examples),
    "l": np.random.randint(0, 2, size=number_of_examples),
}
dataset

Out[5]:

{'f1': array([[0.77831876, 0.44491803, 0.06950368],
        [0.51402546, 0.35996753, 0.75910236],
        [0.35404616, 0.30025651, 0.50369477],
        [0.83403873, 0.61047313, 0.07814819],
        [0.38385037, 0.40671211, 0.47912743],
        [0.99550808, 0.93747089, 0.74900908],
        [0.13106712, 0.48648687, 0.77925262],
        [0.25118286, 0.34226331, 0.03312203],
        [0.5772139 , 0.03045939, 0.81802417],
        [0.27276707, 0.24643098, 0.62696742]]),
 'f2': array([0.65184742, 0.14970149, 0.16338311, 0.01975033, 0.43429271,
        0.1691804 , 0.14664926, 0.90239627, 0.35412598, 0.31156112]),
 'l': array([0, 1, 1, 0, 1, 0, 0, 0, 1, 0])}

In [6]:

model = ydf.RandomForestLearner(label="l").train(dataset)
model.predict(dataset)

Train model on 10 examples
Model trained in 0:00:00.003045

Out[6]:

array([0.27333316, 0.59999955, 0.5633329 , 0.25333318, 0.46999964,
       0.31333312, 0.34999976, 0.38999972, 0.6199995 , 0.47999963],
      dtype=float32)
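
To illustrate why a two-dimensional feature is similar to feeding each dimension separately, the following sketch splits the columns by hand using only Numpy. The `split_columns` helper and the `f1_0`-style names are hypothetical and exist only for this illustration; the names YDF assigns internally to the individual dimensions may differ.

```python
import numpy as np


def split_columns(dataset):
    """Returns a new dict where each 2D array is replaced by one 1D array per column."""
    flat = {}
    for name, values in dataset.items():
        values = np.asarray(values)
        if values.ndim == 2:
            # One entry per column of the two-dimensional feature.
            for i in range(values.shape[1]):
                flat[f"{name}_{i}"] = values[:, i]
        else:
            # One-dimensional arrays are kept as they are.
            flat[name] = values
    return flat


number_of_examples = 10
dataset = {
    "f1": np.random.uniform(size=(number_of_examples, 3)),
    "f2": np.random.uniform(size=number_of_examples),
    "l": np.random.randint(0, 2, size=number_of_examples),
}

flat_dataset = split_columns(dataset)
print(list(flat_dataset))  # ['f1_0', 'f1_1', 'f1_2', 'f2', 'l']
```

Both `dataset` and `flat_dataset` carry the same values; only the way the columns of "f1" are grouped and named changes.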