NumPy arrays
Setup
In [ ]:
pip install ydf -U
In [1]:
import ydf
import numpy as np
In [2]:
number_of_examples = 10
dataset = {
"f1": np.random.uniform(size=number_of_examples),
"f2": np.random.uniform(size=number_of_examples),
"l": np.random.randint(0, 2, size=number_of_examples),
}
dataset
Out[2]:
{'f1': array([0.8408175 , 0.23268677, 0.97215838, 0.06059025, 0.43041995, 0.2838354 , 0.54476241, 0.68916471, 0.15604299, 0.38484593]), 'f2': array([0.53119829, 0.07066887, 0.367039 , 0.88090998, 0.76215773, 0.11381487, 0.84171988, 0.34631154, 0.04948825, 0.56829104]), 'l': array([0, 1, 1, 1, 1, 1, 0, 0, 1, 0])}
Then, let's train a model and generate predictions.
In [3]:
model = ydf.RandomForestLearner(label="l").train(dataset)
Train model on 10 examples
Model trained in 0:00:00.006883
In [4]:
model.predict(dataset)
Out[4]:
array([0.37999973, 0.8599993 , 0.4866663 , 0.6733328 , 0.48999962, 0.836666 , 0.3866664 , 0.48999962, 0.8633326 , 0.5699996 ], dtype=float32)
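The predictions above are floating-point scores rather than class labels. As a minimal sketch, assuming each value is the predicted probability of label 1 (an assumption about this binary classification setup, not something the output itself states), the scores can be turned into hard labels with a 0.5 threshold. The variable names `predictions` and `predicted_labels` are illustrative.

```python
import numpy as np

# Prediction values copied from the output above.
predictions = np.array(
    [0.37999973, 0.8599993, 0.4866663, 0.6733328, 0.48999962,
     0.836666, 0.3866664, 0.48999962, 0.8633326, 0.5699996],
    dtype=np.float32,
)

# Assumption: each value is the probability of label 1.
# Values at or above 0.5 map to label 1, the others to label 0.
predicted_labels = (predictions >= 0.5).astype(int)
print(predicted_labels)  # [0 1 0 1 0 1 0 0 1 1]
```

The 0.5 threshold is a common default; a different cut-off may suit problems where the two kinds of error have different costs.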
If your input data is a single NumPy array, wrap it in a dictionary.
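As a rough illustration of the wrapping step, the sketch below assumes a hypothetical layout in which one array holds the feature columns followed by the label in the last column. The array `data` and the key name `"features"` are made up for this example; only the label key `"l"` matches the cells above.

```python
import numpy as np

# Hypothetical input: one 2D array whose last column holds the label.
rng = np.random.default_rng(0)
data = rng.uniform(size=(10, 3))
data[:, -1] = rng.integers(0, 2, size=10)

# Wrap the array into a dictionary: one key for the feature columns,
# one key for the label column.
dataset = {
    "features": data[:, :-1],            # shape (10, 2)
    "l": data[:, -1].astype(np.int64),   # shape (10,)
}
print({key: value.shape for key, value in dataset.items()})
```

A dictionary built this way has the same form as the `dataset` passed to `ydf.RandomForestLearner(label="l").train(dataset)` earlier.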
Training examples can be one-dimensional or two-dimensional NumPy arrays. In a two-dimensional array, the second dimension indexes the features, so each column is treated as a separate feature. This is similar to feeding each column as its own one-dimensional array.
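To make the "each column separately" reading concrete, here is a small NumPy-only sketch that splits a two-dimensional array into one one-dimensional array per column. The key names `f1_0`, `f1_1`, and `f1_2` are illustrative only and are not the names YDF assigns internally.

```python
import numpy as np

number_of_examples = 10
f1 = np.random.uniform(size=(number_of_examples, 3))

# Multi-dimensional form: a single key holding a [num_examples, 3] array.
multi_dim = {"f1": f1}

# Per-column form: one key per column, each a 1D array of length num_examples.
per_column = {f"f1_{i}": f1[:, i] for i in range(f1.shape[1])}

print(sorted(per_column))  # ['f1_0', 'f1_1', 'f1_2']
```

Both dictionaries carry the same values; the multi-dimensional form is simply more compact when a feature has many columns.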
In [5]:
number_of_examples = 10
# "f1" is an array of shape [num_examples, 3]. YDF sees it as a feature with 3 dimensions.
# "f2" is still a single-dimensional feature.
dataset = {
"f1": np.random.uniform(size=(number_of_examples, 3)),
"f2": np.random.uniform(size=number_of_examples),
"l": np.random.randint(0, 2, size=number_of_examples),
}
dataset
Out[5]:
{'f1': array([[0.77831876, 0.44491803, 0.06950368], [0.51402546, 0.35996753, 0.75910236], [0.35404616, 0.30025651, 0.50369477], [0.83403873, 0.61047313, 0.07814819], [0.38385037, 0.40671211, 0.47912743], [0.99550808, 0.93747089, 0.74900908], [0.13106712, 0.48648687, 0.77925262], [0.25118286, 0.34226331, 0.03312203], [0.5772139 , 0.03045939, 0.81802417], [0.27276707, 0.24643098, 0.62696742]]), 'f2': array([0.65184742, 0.14970149, 0.16338311, 0.01975033, 0.43429271, 0.1691804 , 0.14664926, 0.90239627, 0.35412598, 0.31156112]), 'l': array([0, 1, 1, 0, 1, 0, 0, 0, 1, 0])}
In [6]:
model = ydf.RandomForestLearner(label="l").train(dataset)
model.predict(dataset)
Train model on 10 examples
Model trained in 0:00:00.003045
Out[6]:
array([0.27333316, 0.59999955, 0.5633329 , 0.25333318, 0.46999964, 0.31333312, 0.34999976, 0.38999972, 0.6199995 , 0.47999963], dtype=float32)