假设我有这样的目录。
full_dataset
|---horse <= 40 images of horse
|---donkey <= 50 images of donkey
|---cow <= 80 images of cow
|---zebra <= <= 30 images of zebra
然后我用tensorflow 写这个
image_generator = ImageDataGenerator(rescale=1./255)
my_dataset = image_generator.flow_from_directory(batch_size=32,
directory='full_dataset',
shuffle=True,
target_size=(280, 280),
class_mode='categorical')
但我想自动拆分该文件,而无需手动将目录更改为训练文件夹和测试文件夹。我不想像这样手动拆分https://www.tensorflow.org/tutorials/images/classification)
我所做的和失败的
(x_train, y_train),(x_test, y_test) = my_dataset.load_data()
您不必使用tensorflow或keras来划分数据集。如果你安装了sklearn软件包,那么你可以简单地使用它:
from sklearn.model_selection import train_test_split
X = ...
Y = ...
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)
您也可以将numpy用于相同的目的:
import numpy
X = ...
Y = ...
test_size = 0.2
train_nsamples = (1-test_size) * len(Y)
x_train, x_test, y_train, y_test = X[:train_nsamples,:], X[train_nsamples:, :], Y[:train_nsamples, ], Y[train_nsamples:,]
在喀拉拉邦:
from keras.datasets import mnist
import numpy as np
from sklearn.model_selection import train_test_split
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x = np.concatenate((x_train, x_test))
y = np.concatenate((y_train, y_test))
train_size = 0.7
x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=train_size)
经过一天的反复尝试和挣扎,我找到了解决方案。
第一路
import glob
horse = glob.glob('full_dataset/horse/*.*')
donkey = glob.glob('full_dataset/donkey/*.*')
cow = glob.glob('full_dataset/cow/*.*')
zebra = glob.glob('full_dataset/zebra/*.*')
data = []
labels = []
for i in horse:
image=tf.keras.preprocessing.image.load_img(i, color_mode='RGB',
target_size= (280,280))
image=np.array(image)
data.append(image)
labels.append(0)
for i in donkey:
image=tf.keras.preprocessing.image.load_img(i, color_mode='RGB',
target_size= (280,280))
image=np.array(image)
data.append(image)
labels.append(1)
for i in cow:
image=tf.keras.preprocessing.image.load_img(i, color_mode='RGB',
target_size= (280,280))
image=np.array(image)
data.append(image)
labels.append(2)
for i in zebra:
image=tf.keras.preprocessing.image.load_img(i, color_mode='RGB',
target_size= (280,280))
image=np.array(image)
data.append(image)
labels.append(3)
data = np.array(data)
labels = np.array(labels)
from sklearn.model_selection import train_test_split
X_train, X_test, ytrain, ytest = train_test_split(data, labels, test_size=0.2,
random_state=42)
第二路
image_generator = ImageDataGenerator(rescale=1/255, validation_split=0.2)
train_dataset = image_generator.flow_from_directory(batch_size=32,
directory='full_dataset',
shuffle=True,
target_size=(280, 280),
subset="training",
class_mode='categorical')
validation_dataset = image_generator.flow_from_directory(batch_size=32,
directory='full_dataset',
shuffle=True,
target_size=(280, 280),
subset="validation",
class_mode='categorical')
第二种方式的主要缺点是,你不能用来显示图片。如果您写入validation_dataset[1]
,它将出错。但如果我使用第一种方式:X_test[1]
,它就起作用了