我想对BreakHis数据集的乳腺癌症组织病理学图像进行二元分类(https://www.kaggle.com/ambarish/breakhis)使用迁移学习和Inception Resnet v2。目标是冻结所有层,并通过在模型中添加两个神经元来训练完全连接的层。特别是,最初我想考虑与放大倍数40X相关的图像(良性:625,恶性:1370(。以下是我所做工作的总结:
- 我读取了图像并将其调整为150x150
- 我将数据集划分为训练集、验证集和测试集
- 我加载预先训练的网络Inception Resnet v2
- 我冻结了所有的层,我添加了两个神经元作为二进制分类(1="良性",0="恶性"(
- 我使用Adam方法作为激活函数编译模型
- 我进行培训
- 我做了预测
- 我计算准确度
这是代码:
data = dataset[dataset["Magnificant"]=="40X"]
def preprocessing(dataset, img_size):
# images
X = []
# labels
y = []
i = 0
for image in list(dataset["Path"]):
# Ridimensiono e leggo le immagini
X.append(cv2.resize(cv2.imread(image, cv2.IMREAD_COLOR),
(img_size, img_size), interpolation=cv2.INTER_CUBIC))
basename = os.path.basename(image)
# Get labels
if dataset.loc[i][2] == "benign":
y.append(1)
else:
y.append(0)
i = i+1
return X, y
X, y = preprocessing(data, 150)
X = np.array(X)
y = np.array(y)
# Splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify = y_40, shuffle=True, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)
conv_base = InceptionResNetV2(weights='imagenet', include_top=False, input_shape=[150, 150, 3])
# Freezing
for layer in conv_base.layers:
layer.trainable = False
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(1, activation='sigmoid'))
opt = tf.keras.optimizers.Adam(learning_rate=0.0002)
loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
model.compile(loss=loss, optimizer=opt, metrics = ["accuracy", tf.metrics.AUC()])
batch_size = 32
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow(X_train, y_train, batch_size=batch_size)
val_generator = val_datagen.flow(X_val, y_val, batch_size=batch_size)
ntrain =len(X_train)
nval = len(X_val)
len(y_train)
epochs = 70
history = model.fit_generator(train_generator,
steps_per_epoch=ntrain // batch_size,
epochs=epochs,
validation_data=val_generator,
validation_steps=nval // batch_size)
这是最后一个时期的训练输出:
Epoch 70/70
32/32 [==============================] - 3s 84ms/step - loss: 0.0499 - accuracy: 0.9903 - auc_5: 0.9996 - val_loss: 0.5661 - val_accuracy: 0.8250 - val_auc_5: 0.8521
我做了预测:
test_datagen = ImageDataGenerator(rescale=1./255)
x = X_test
y_pred = model.predict(test_datagen.flow(x))
y_p = []
for i in range(len(y_pred)):
if y_pred[i] > 0.5:
y_p.append(1)
else:
y_p.append(0)
我计算准确度:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_p)
print(accuracy)
这是我得到的准确度值:0.5459098497495827
为什么我的准确率如此之低,我做了几次测试,但总是得到相似的结果?
更新
我做了以下更改,但我总是得到相同的结果(只放置代码的修改部分(:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify = y, shuffle=True, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, stratify = y_train, shuffle=True, random_state=1)
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=3)
ntrain =len(X_train)
nval = len(X_val)
len(y_train)
epochs = 70
history = model.fit_generator(train_generator,
steps_per_epoch=ntrain // batch_size,
epochs=epochs,
validation_data=val_generator,
validation_steps=nval // batch_size, callbacks=[callback])
更新2
我还将_logits从True更改为False,但当然这还不是问题所在。我总是能得到57%的准确率。
这是30个时期的模型.fit输出:
Epoch 1/30
32/32 [==============================] - 23s 202ms/step - loss: 0.7994 - accuracy: 0.6010 - auc: 0.5272 - val_loss: 0.5338 - val_accuracy: 0.7688 - val_auc: 0.7943
Epoch 2/30
32/32 [==============================] - 3s 87ms/step - loss: 0.5778 - accuracy: 0.7206 - auc: 0.7521 - val_loss: 0.4763 - val_accuracy: 0.7781 - val_auc: 0.8155
Epoch 3/30
32/32 [==============================] - 3s 85ms/step - loss: 0.5311 - accuracy: 0.7581 - auc: 0.7710 - val_loss: 0.4740 - val_accuracy: 0.7719 - val_auc: 0.8212
Epoch 4/30
32/32 [==============================] - 3s 85ms/step - loss: 0.4684 - accuracy: 0.7718 - auc: 0.8219 - val_loss: 0.4270 - val_accuracy: 0.8031 - val_auc: 0.8611
Epoch 5/30
32/32 [==============================] - 3s 83ms/step - loss: 0.4280 - accuracy: 0.7943 - auc: 0.8617 - val_loss: 0.4496 - val_accuracy: 0.7969 - val_auc: 0.8468
Epoch 6/30
32/32 [==============================] - 3s 88ms/step - loss: 0.4237 - accuracy: 0.8250 - auc: 0.8673 - val_loss: 0.3993 - val_accuracy: 0.7937 - val_auc: 0.8840
Epoch 7/30
32/32 [==============================] - 3s 85ms/step - loss: 0.4130 - accuracy: 0.8513 - auc: 0.8767 - val_loss: 0.4207 - val_accuracy: 0.7781 - val_auc: 0.8692
Epoch 8/30
32/32 [==============================] - 3s 85ms/step - loss: 0.3446 - accuracy: 0.8485 - auc: 0.9077 - val_loss: 0.4229 - val_accuracy: 0.7937 - val_auc: 0.8730
Epoch 9/30
32/32 [==============================] - 3s 85ms/step - loss: 0.3690 - accuracy: 0.8514 - auc: 0.9003 - val_loss: 0.4300 - val_accuracy: 0.8062 - val_auc: 0.8696
Epoch 10/30
32/32 [==============================] - 3s 100ms/step - loss: 0.3204 - accuracy: 0.8533 - auc: 0.9270 - val_loss: 0.4235 - val_accuracy: 0.7969 - val_auc: 0.8731
Epoch 11/30
32/32 [==============================] - 3s 86ms/step - loss: 0.3555 - accuracy: 0.8508 - auc: 0.9124 - val_loss: 0.4124 - val_accuracy: 0.8000 - val_auc: 0.8797
Epoch 12/30
32/32 [==============================] - 3s 85ms/step - loss: 0.3243 - accuracy: 0.8481 - auc: 0.9308 - val_loss: 0.3979 - val_accuracy: 0.7969 - val_auc: 0.8908
Epoch 13/30
32/32 [==============================] - 3s 85ms/step - loss: 0.3017 - accuracy: 0.8744 - auc: 0.9348 - val_loss: 0.4239 - val_accuracy: 0.8094 - val_auc: 0.8758
Epoch 14/30
32/32 [==============================] - 3s 89ms/step - loss: 0.3317 - accuracy: 0.8521 - auc: 0.9221 - val_loss: 0.4238 - val_accuracy: 0.8094 - val_auc: 0.8704
Epoch 15/30
32/32 [==============================] - 3s 85ms/step - loss: 0.2840 - accuracy: 0.8908 - auc: 0.9490 - val_loss: 0.4131 - val_accuracy: 0.8281 - val_auc: 0.8858
Epoch 16/30
32/32 [==============================] - 3s 85ms/step - loss: 0.2583 - accuracy: 0.8905 - auc: 0.9511 - val_loss: 0.3841 - val_accuracy: 0.8375 - val_auc: 0.9007
Epoch 17/30
32/32 [==============================] - 3s 87ms/step - loss: 0.2810 - accuracy: 0.8648 - auc: 0.9470 - val_loss: 0.3928 - val_accuracy: 0.8438 - val_auc: 0.8972
Epoch 18/30
32/32 [==============================] - 3s 89ms/step - loss: 0.2622 - accuracy: 0.8923 - auc: 0.9550 - val_loss: 0.3732 - val_accuracy: 0.8438 - val_auc: 0.9089
Epoch 19/30
32/32 [==============================] - 3s 84ms/step - loss: 0.2486 - accuracy: 0.8990 - auc: 0.9579 - val_loss: 0.4077 - val_accuracy: 0.8250 - val_auc: 0.8924
Epoch 20/30
32/32 [==============================] - 3s 85ms/step - loss: 0.2412 - accuracy: 0.9074 - auc: 0.9635 - val_loss: 0.4249 - val_accuracy: 0.8219 - val_auc: 0.8787
Epoch 21/30
32/32 [==============================] - 3s 84ms/step - loss: 0.2386 - accuracy: 0.9095 - auc: 0.9657 - val_loss: 0.4177 - val_accuracy: 0.8094 - val_auc: 0.8904
Epoch 22/30
32/32 [==============================] - 3s 99ms/step - loss: 0.2313 - accuracy: 0.8996 - auc: 0.9668 - val_loss: 0.4089 - val_accuracy: 0.8406 - val_auc: 0.8890
Epoch 23/30
32/32 [==============================] - 3s 86ms/step - loss: 0.2424 - accuracy: 0.9067 - auc: 0.9654 - val_loss: 0.4033 - val_accuracy: 0.8500 - val_auc: 0.8953
Epoch 24/30
32/32 [==============================] - 3s 85ms/step - loss: 0.2315 - accuracy: 0.9045 - auc: 0.9626 - val_loss: 0.3903 - val_accuracy: 0.8250 - val_auc: 0.9030
Epoch 25/30
32/32 [==============================] - 3s 85ms/step - loss: 0.2001 - accuracy: 0.9321 - auc: 0.9788 - val_loss: 0.4276 - val_accuracy: 0.8000 - val_auc: 0.8855
Epoch 26/30
32/32 [==============================] - 3s 87ms/step - loss: 0.2118 - accuracy: 0.9212 - auc: 0.9695 - val_loss: 0.4335 - val_accuracy: 0.8125 - val_auc: 0.8897
Epoch 27/30
32/32 [==============================] - 3s 85ms/step - loss: 0.2463 - accuracy: 0.8941 - auc: 0.9665 - val_loss: 0.4112 - val_accuracy: 0.8438 - val_auc: 0.8882
Epoch 28/30
32/32 [==============================] - 3s 85ms/step - loss: 0.2130 - accuracy: 0.9033 - auc: 0.9771 - val_loss: 0.3834 - val_accuracy: 0.8406 - val_auc: 0.9021
Epoch 29/30
32/32 [==============================] - 3s 86ms/step - loss: 0.2021 - accuracy: 0.9229 - auc: 0.9754 - val_loss: 0.3855 - val_accuracy: 0.8469 - val_auc: 0.9008
Epoch 30/30
32/32 [==============================] - 3s 88ms/step - loss: 0.1859 - accuracy: 0.9314 - auc: 0.9824 - val_loss: 0.4018 - val_accuracy: 0.8375 - val_auc: 0.8928
您必须在损失函数中将from_logits=True
更改为from_logits=False
。再次致谢-@Frightera。
您的模型似乎有些地方太合适了。最好你能查一下。
- 进行10次K-Fold测试。它将显示真实的结果
- 在你的指标中,一定要加上F1的分数。F1值将使您真正了解FP和FN方面的TP指标
- 添加一些增强(除了重新缩放之外(,使模型对数据集的变化具有鲁棒性
- 调整训练参数(如果你觉得(
如果这些更改失败,那么模型可能无法学习图像的伪影。你应该换一个型号!
一旦你的验证准确性停止进步,这意味着你的base_model的顶层(来自迁移学习的顶层(应该被解冻(这样他们就可以专门化你的数据集(,然后以低学习率再次开始训练(这样他们不会受到太大损害(,然后你就会看到改进。