ResNet50 base model for multi-label classification predicts only one label



I am using ResNet50 as the base model to predict multiple labels in an image and to sum up the values associated with those labels.

Reading the data:

import os
import numpy as np
from PIL import Image
from tensorflow.keras.utils import to_categorical

# read the data
data_path = '/content/drive/MyDrive/Notifyer-dataset/dataset'

def load_dataset(folder):
    X = []  # create an empty list to store the images
    y = []  # create an empty list to store the labels
    # iterate over all the files in the folder
    for filename in os.listdir(folder):
        # get the label from the filename (the part before the first underscore)
        label = filename.split('_')[0]
        # open the image file, resize it to 200x200 and convert it to RGB
        image = Image.open(os.path.join(folder, filename))
        image = image.resize((200, 200))
        image = image.convert('RGB')
        # normalize the pixel values to [0, 1] (once only; the original divided by 255 twice)
        image = np.array(image) / 255.0
        # append the image and label to the lists
        X.append(image)
        y.append(label)

    # convert the lists to NumPy arrays
    X = np.array(X)  # shape: (num_images, 200, 200, 3), i.e. 3 RGB channels
    y = np.array(y)
    # one-hot encoding: map each distinct label string to an integer index first
    classes, y_idx = np.unique(y, return_inverse=True)
    num_classes = len(classes)
    y = to_categorical(y_idx, num_classes)

    return X, y, num_classes

X, y, num_classes = load_dataset(data_path)
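
For contrast with the one-hot targets produced above: `to_categorical` puts exactly one 1 in each row, so every training example says "this image has one class". A multi-label setup would typically use multi-hot rows instead, where several positions can be 1 at once. A minimal sketch, assuming a hypothetical helper `multi_hot` and made-up per-image label lists that are not part of the original dataset:

```python
import numpy as np

def multi_hot(label_lists, classes):
    """Encode lists of labels as multi-hot rows (hypothetical helper)."""
    # Map each class name to its column index.
    index = {c: i for i, c in enumerate(classes)}
    y = np.zeros((len(label_lists), len(classes)), dtype=np.float32)
    for row, labels in enumerate(label_lists):
        for label in labels:
            y[row, index[label]] = 1.0
    return y

# Made-up example: the third image contains objects of both classes.
example_labels = [["1"], ["2"], ["1", "2"]]
y_multi = multi_hot(example_labels, classes=["1", "2"])
print(y_multi)
```

The last row has two 1s, which is something a one-hot target cannot express.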

Building the model:

import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Model

def build_r_cnn_model(num_classes):
    """
    Build a classifier on top of a frozen ResNet50 base
    (called an R-CNN model in this post).

    Parameters:
    num_classes (int): number of classes to classify

    Returns:
    Model: the Keras model
    """
    # load the ResNet50 model pre-trained on ImageNet
    base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(200, 200, 3))

    # freeze the base model layers
    for layer in base_model.layers:
        layer.trainable = False

    # add a global average pooling layer
    x = base_model.output
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    # add a fully-connected layer
    x = tf.keras.layers.Dense(1024, activation='relu')(x)
    # add a dropout layer
    x = tf.keras.layers.Dropout(0.5)(x)
    # add a classification layer
    predictions = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
    # build the model
    model = Model(inputs=base_model.input, outputs=predictions)
    return model

Compiling the model:

# build and compile the model
model = build_r_cnn_model(num_classes)
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

Training the model:

# train
history = model.fit(X_train, y_train, epochs=10, batch_size=128, validation_data=(X_val, y_val))
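
The `fit` call above uses `X_train`, `y_train`, `X_val` and `y_val`, which the post never shows being created. One plausible way to obtain them is a shuffled hold-out split. The sketch below assumes a hypothetical helper `train_val_split`, an 80/20 ratio and a fixed seed, none of which come from the original post, and it uses dummy arrays in place of the real data:

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.2, seed=0):
    """Shuffle and split arrays into train and validation parts (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # shuffled sample indices
    n_val = int(round(len(X) * val_fraction))  # size of the validation part
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Dummy stand-ins with the same layout as the loaded data: 10 images, 2 classes.
X_demo = np.zeros((10, 200, 200, 3), dtype=np.float32)
y_demo = np.eye(2, dtype=np.float32)[np.arange(10) % 2]

X_train, y_train, X_val, y_val = train_val_split(X_demo, y_demo)
print(X_train.shape, X_val.shape)
```

With a dataset as small as the one described, the validation part will contain only a handful of images, so its accuracy figure is likely to be noisy.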

Function that sums the values of all labels in an image:

# function to calculate the total sum of the values of the predicted labels
def predict_total_sum(model, image):
    y_pred = model.predict(image)  # classify the image

    # define a lookup table to map class indices to values
    value_lookup = {
        0: 1,  # class 0 corresponds to value 1
        1: 2,  # class 1 corresponds to value 2
    }

    total_sum = 0
    for prediction in y_pred:
        # get the class index with the highest predicted probability
        class_index = np.argmax(prediction)
        print(class_index)
        # add the value of the detected denomination to the total sum
        total_sum += value_lookup[class_index]

    return total_sum

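For every image, with every model I have compiled, it returns the value 1 or 2, which means it predicts only one label even when the image contains multiple objects from both labels.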

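My dataset is small, and every image in it contains objects of only one of the labels. Do I need to diversify my dataset so that the model recognizes both labels in an image, or is something wrong with the model architecture? I also tried building a CNN model from scratch, but it gave the same result…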

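I think the output of model.predict has shape [1, num_of_classes] (you can verify this by printing its shape once). So when you loop over y_pred, you are essentially iterating only once and adding the value for a single class index to total_sum. Even if the shape were [num_of_classes], I don't think this is the way you should attempt multi-class classification. I'd suggest reading more about how multi-class classification is done. You can get help from this link: https://www.kaggle.com/code/prateek0x/multiclass-image-classification-using-keras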
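
To make the point above concrete: `np.argmax` returns exactly one index per row, and a softmax output sums to 1, so the function in the question can only ever add one value per image. Multi-label setups usually pair independent sigmoid outputs with a threshold instead. The sketch below is an illustration under assumptions, not the poster's model: the logits are made up, `sum_label_values` and the 0.5 threshold are hypothetical, and in Keras the corresponding change would be `activation='sigmoid'` on the last `Dense` layer, which would also need training targets with more than one 1 per row.

```python
import numpy as np

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sum_label_values(probs, value_lookup, threshold=0.5):
    """Sum the values of every class whose probability reaches the threshold."""
    total = 0
    for row in probs:
        for class_index in np.flatnonzero(row >= threshold):
            total += value_lookup[int(class_index)]
    return total

value_lookup = {0: 1, 1: 2}       # same mapping as in the question
logits = np.array([[3.0, 2.5]])   # made-up scores: both classes look present

soft = softmax(logits)            # each row sums to 1, so the classes compete
sig = sigmoid(logits)             # each class is scored independently

argmax_total = value_lookup[int(np.argmax(soft[0]))]
multi_total = sum_label_values(sig, value_lookup)
print(argmax_total, multi_total)
```

With these made-up scores the argmax route yields 1, while the thresholded sigmoid route counts both classes and yields 3.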
