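I have a dataset of images and two text files in which each line contains the ids of three images. The first file is for training and tells me that the first image is more similar to the second than to the third. The second file is for testing: for each line, I have to predict whether the first image is more similar to the second or to the third image. To do this, I trained a Siamese network with a triplet loss, using this article as a guide: https://keras.io/examples/vision/siamese_network/
After training the network, I don't know how to proceed to evaluate my test dataset. To prepare the data I have done: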
with open('test_triplets.txt') as f:
    lines2 = f.readlines()
# Drop the trailing newline from each line
lines2 = [line.split('\n', 1)[0] for line in lines2]
anchor2 = [line.split()[0] for line in lines2]
pic1 = [line.split()[1] for line in lines2]
pic2 = [line.split()[2] for line in lines2]
anchor2 = ['food/' + item + '.jpg' for item in anchor2]
pic1 = ['food/' + item + '.jpg' for item in pic1]
pic2 = ['food/' + item + '.jpg' for item in pic2]
anchor2_dataset = tf.data.Dataset.from_tensor_slices(anchor2)
pic1_dataset = tf.data.Dataset.from_tensor_slices(pic1)
pic2_dataset = tf.data.Dataset.from_tensor_slices(pic2)
test_dataset = tf.data.Dataset.zip((anchor2_dataset, pic1_dataset, pic2_dataset))
test_dataset = test_dataset.map(preprocess_triplets)
test_dataset = test_dataset.batch(32, drop_remainder=False)
test_dataset = test_dataset.prefetch(8)
Then I tried using a for loop like the one below, but the running time is too high, since I have around 50000 lines in the txt file.
n_images = len(anchor2)
results = np.zeros((n_images,2))
for i in range(n_images):
    sample = next(iter(test_dataset))
    anchor, positive, negative = sample
    anchor_embedding, positive_embedding, negative_embedding = (
        embedding(resnet.preprocess_input(anchor)),
        embedding(resnet.preprocess_input(positive)),
        embedding(resnet.preprocess_input(negative)),
    )
    cosine_similarity = metrics.CosineSimilarity()
    positive_similarity = cosine_similarity(anchor_embedding, positive_embedding)
    results[i,0] = positive_similarity.numpy()
    negative_similarity = cosine_similarity(anchor_embedding, negative_embedding)
    results[i,1] = negative_similarity.numpy()
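How can I make predictions on my test triplets? My goal is to have a vector [n_testing_triplets x 1] where each row is 1 if the first picture is most similar to the anchor, and 0 otherwise.
You can first stack your images, then compute all the embeddings in parallel like this: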
import numpy as np
stack = np.stack([anchor0, positive0, negative0, ..., anchor999, positive999, negative999])
# then calculate all the embeddings at the same time, like this
embeddings = list(embedding(resnet.preprocess_input(stack)).numpy())
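As a minimal sketch of the indexing this layout implies, assuming the stack interleaves each triplet as anchor, positive, negative: the anchor of triplet i sits at row 3*i, its two candidates at rows 3*i + 1 and 3*i + 2. The small random arrays and `fake_embedding` below are hypothetical stand-ins for the preprocessed images and for `embedding(resnet.preprocess_input(...))`; they are not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for 4 triplets of images (tiny 8x8 RGB arrays).
n_triplets = 4
anchors = rng.random((n_triplets, 8, 8, 3))
positives = rng.random((n_triplets, 8, 8, 3))
negatives = rng.random((n_triplets, 8, 8, 3))

# Interleave as anchor0, positive0, negative0, anchor1, ... to match the stack above.
stack = np.stack([anchors, positives, negatives], axis=1).reshape(-1, 8, 8, 3)

def fake_embedding(images):
    # Placeholder for the trained model: flattens each image into a vector.
    return images.reshape(len(images), -1)

embeddings = fake_embedding(stack)

# Undo the interleaving with strided slices.
anchor_emb = embeddings[0::3]
positive_emb = embeddings[1::3]
negative_emb = embeddings[2::3]
```

With around 50000 triplets, a single stack may not fit in memory, so in practice the same slicing would probably be applied chunk by chunk.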
Then compare the embeddings you want in a loop:
cosine_similarity = metrics.CosineSimilarity()
positive_similarity = cosine_similarity(embeddings[0], embeddings[1])
whatever_storage = positive_similarity.numpy()
# Keras metrics accumulate state across calls, so reset before the next comparison
cosine_similarity.reset_state()
negative_similarity = cosine_similarity(embeddings[0], embeddings[2])
whatever_storage = negative_similarity.numpy()
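To get the [n_testing_triplets x 1] vector asked for, one option is to skip the per-triplet loop and compute the cosine similarity row by row for a whole batch. Two details of the loop in the question matter here: `next(iter(test_dataset))` rebuilds the iterator on every pass and so returns the first batch each time, and `metrics.CosineSimilarity()` applied to a batch returns one mean value rather than one value per triplet. The sketch below uses plain NumPy; `cosine_rows`, `predict_triplets`, `embed_fn` and the toy data are hypothetical names introduced for illustration.

```python
import numpy as np

def cosine_rows(a, b, eps=1e-12):
    """Cosine similarity between matching rows of a and b."""
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return np.sum(a * b, axis=1)

def predict_triplets(batches, embed_fn):
    """Return 1 where the first candidate is closer to the anchor, else 0."""
    predictions = []
    for anchor, pic1, pic2 in batches:  # a single pass over the data
        a, p1, p2 = embed_fn(anchor), embed_fn(pic1), embed_fn(pic2)
        predictions.append((cosine_rows(a, p1) > cosine_rows(a, p2)).astype(int))
    return np.concatenate(predictions).reshape(-1, 1)

# Toy check with 2-D "embeddings" and an identity embed_fn (both hypothetical).
toy_batches = [
    (np.array([[1.0, 0.0], [0.0, 1.0]]),
     np.array([[1.0, 0.1], [1.0, 0.0]]),
     np.array([[0.0, 1.0], [0.1, 1.0]])),
    (np.array([[1.0, 1.0]]),
     np.array([[2.0, 2.0]]),
     np.array([[1.0, -1.0]])),
]
toy_predictions = predict_triplets(toy_batches, lambda x: x)
```

With the trained model, `batches` would presumably be `test_dataset` itself and `embed_fn` something like `lambda x: embedding(resnet.preprocess_input(x)).numpy()`. That wiring is an assumption and has not been run against the original model.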