How do I get x and y as numpy arrays from a prefetched TensorFlow tf.data.Dataset?



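I need to access my X features and Y labels from a prefetched training dataset. I know that if I loop over the dataset, I can print x and y. For example: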

for item in train_dataset:
    print(item[0])  # access the array with X
    print(item[1])  # access the array with Y

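However, I actually need to separate X from Y and store them in separate numpy variables, as we do with X_train and Y_train when using sklearn's train_test_split() function. They will be passed as arguments to another function that does not accept a prefetched dataset, only a numpy array of Xs and a numpy array of Ys. Does anyone know how to do this?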

You can call tfds.as_numpy on the prefetched dataset, apply map and list to the result, and then convert it to a numpy.array, as shown below:

from sklearn.model_selection import train_test_split
import tensorflow_datasets as tfds
import tensorflow as tf
import numpy as np
# Generate random data for Dataset
X = np.random.rand(100,3)
y = np.random.randint(0,2, (100))
# Create tf.data.Dataset from random data
train_dataset = tf.data.Dataset.from_tensor_slices((X,y))
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)
# Extract numpy.array X & y from tf.data.Dataset
X_numpy = np.asarray(list(map(lambda x: x[0], tfds.as_numpy(train_dataset))))
y_numpy = np.asarray(list(map(lambda x: x[1], tfds.as_numpy(train_dataset))))
print(X_numpy.shape)
# (100, 3)
print(y_numpy.shape)
# (100,)
X_train, X_test, y_train, y_test = train_test_split(
    X_numpy, y_numpy, test_size=0.2, random_state=42
)
print(X_train.shape)
# (80, 3)
print(X_test.shape)
# (20, 3)
print(y_train.shape)
# (80,)
print(y_test.shape)
# (20,)
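
The snippet above iterates over the dataset twice, once for X and once for y. If the dataset is shuffled and reshuffles on each iteration, the two passes can return elements in different orders, so X and y may no longer line up. A minimal sketch of a single-pass alternative follows. The helper name split_features_labels is hypothetical (it is not part of TensorFlow or tfds), and the zip of two random arrays only stands in for the dataset so that the example runs with numpy alone.

```python
import numpy as np


def split_features_labels(pairs):
    """Collect an iterable of (x, y) pairs into two NumPy arrays in one pass.

    Hypothetical helper for illustration; not a TensorFlow or tfds API.
    """
    xs, ys = [], []
    for x, y in pairs:
        xs.append(x)
        ys.append(y)
    return np.asarray(xs), np.asarray(ys)


# Stand-in for the dataset: 100 (features, label) pairs.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = rng.integers(0, 2, size=100)
pairs = zip(X, y)

# Each pair is read exactly once, so X and y stay aligned.
X_numpy, y_numpy = split_features_labels(pairs)
print(X_numpy.shape)  # (100, 3)
print(y_numpy.shape)  # (100,)
```

With the real dataset, the same helper should accept tfds.as_numpy(train_dataset) or train_dataset.as_numpy_iterator() in place of pairs, assuming the dataset is unbatched and yields (x, y) tuples as in the answer above.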
