I am trying to access the EMNIST data from here:
https://www.tensorflow.org/datasets/splits
using this code:
train_ds, test_ds = tfds.load('emnist', split=['train', 'test'], shuffle_files=True)
I tried doing this:
x_train = train_ds['image']
y_train = train_ds['label']
x_test = test_ds['image']
y_test = test_ds['label']
But I get the error TypeError: 'PrefetchDataset' object is not subscriptable
When I try to print train_ds, it prints
<PrefetchDataset element_spec={'image': TensorSpec(shape=(28, 28, 1), dtype=tf.uint8, name=None), 'label': TensorSpec(shape=(), dtype=tf.int64, name=None)}>
I want to separate the images and labels into x_train, y_train, x_test, y_test, just like when you load the mnist data from keras.
I saw here: https://www.tensorflow.org/datasets/catalog/emnist that the feature structure is
FeaturesDict({
'image': Image(shape=(28, 28, 1), dtype=uint8),
'label': ClassLabel(shape=(), dtype=int64, num_classes=47),
})
But I am not sure how to extract it :C
If you just want to split your datasets but keep them as tf.data.Datasets, you can run (recommended):
import tensorflow as tf
import tensorflow_datasets as tfds
train_ds, test_ds = tfds.load('emnist', split=['train', 'test'], shuffle_files=True)
x_train = train_ds.map(lambda i: i['image'])
y_train = train_ds.map(lambda l: l['label'])
x_test = test_ds.map(lambda x: x['image'])
y_test = test_ds.map(lambda y: y['label'])
You can also convert the datasets to numpy arrays, but this may take a while (about 6 minutes on Colab):
import numpy as np
x_train = np.array(list(train_ds.map(lambda i: i['image'])))
y_train = np.array(list(train_ds.map(lambda l: l['label'])))
x_test = np.array(list(test_ds.map(lambda x: x['image'])))
y_test = np.array(list(test_ds.map(lambda y: y['label'])))
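One thing to watch with the snippets above: images and labels are extracted in two separate passes over the dataset. With shuffle_files=True, the two passes may not return examples in the same order, so an image can end up paired with the wrong label. A minimal sketch of a single-pass alternative follows. The helper name split_features and the tiny fake examples are my own illustration, not part of TFDS; the only assumption about the library is that tfds.as_numpy(train_ds) yields one dict per example with 'image' and 'label' keys.

```python
def split_features(examples, image_key="image", label_key="label"):
    """Collect images and labels from an iterable of feature dicts in one pass.

    Reading each example once keeps every image paired with its own label,
    even if the underlying dataset yields examples in a different order on
    each iteration.
    """
    images, labels = [], []
    for example in examples:
        images.append(example[image_key])
        labels.append(example[label_key])
    return images, labels


# Tiny stand-in for the real data (assumption: three fake 2x2 "images"
# instead of EMNIST, so the sketch runs without downloading anything).
fake_examples = [
    {"image": [[0, 1], [2, 3]], "label": 7},
    {"image": [[4, 5], [6, 7]], "label": 2},
    {"image": [[8, 9], [1, 0]], "label": 5},
]

images, labels = split_features(fake_examples)
print(labels)  # [7, 2, 5]

# With the datasets from the answer above, the same helper could be used as:
#   x_list, y_list = split_features(tfds.as_numpy(train_ds))
#   x_train, y_train = np.stack(x_list), np.asarray(y_list)
```

If you only need (image, label) pairs, tfds.load also accepts as_supervised=True, which makes the dataset yield tuples instead of dicts, so no manual splitting is needed.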
You will find a good example here.
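import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 5 samples with 2 features each, plus 5 labels
X, y = np.arange(10).reshape((5, 2)), range(5)
print(X)
print(list(y))

# Hold out 33% of the samples for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
print(X_train)
print(y_train)
print(X_test)
print(y_test)

# shuffle=False splits the data in its original order
print(train_test_split(y, shuffle=False))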
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
https://realpython.com/train-test-split-python-data/