Named entity recognition model always predicts the same class but reports 99% accuracy

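I want to build a Keras NER model that tags profanity/swear words in text. My dataset has more than 50,000 rows/sentences, but only 2,000 of those rows contain profanity. I trained my model on the full dataset and also on only the rows that contain profanity, and I got the same result both times. The loss is below 0.1 and the accuracy is above 99%, but when I try to predict, it labels every word the same way (as if none of the words were profanity).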

I have enumerated all the words and labels in each row:

max_len = 50
X = [[word2idx.get(w[0], 0) for w in s] for s in list_of_sentances]
X = pad_sequences(maxlen=max_len, sequences=X, padding="post", value=vocab_len-1)
y = [[label2idx[w[1]] for w in s] for s in list_of_sentances]
y = pad_sequences(maxlen=max_len, sequences=y, padding="post", value=label2idx["O"])
y = [to_categorical(i, num_classes=num_labels) for i in y]

This is my model:

input_word = Input(shape=(max_len, ))
model = Embedding(input_dim = vocab_len+1, output_dim = 75, input_length = max_len)(input_word)
model = SpatialDropout1D(0.25)(model)
model = Bidirectional(LSTM(units = 50, return_sequences=True, recurrent_dropout = 0.2))(model)
out = TimeDistributed(Dense(num_labels, activation = "softmax"))(model)
model = Model(input_word, out)
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_3 (InputLayer)         [(None, 50)]              0         
_________________________________________________________________
embedding_2 (Embedding)      (None, 50, 75)            1506000   
_________________________________________________________________
spatial_dropout1d_2 (Spatial (None, 50, 75)            0         
_________________________________________________________________
bidirectional_2 (Bidirection (None, 50, 100)           50400     
_________________________________________________________________
time_distributed_2 (TimeDist (None, 50, 3)             303       
=================================================================
Total params: 1,556,703
Trainable params: 1,556,703
Non-trainable params: 0
opt = Adam(lr = 0.000075)
model.compile(optimizer = opt, loss="categorical_crossentropy", metrics=["accuracy"])
es = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2, verbose=0, mode='auto')
history = model.fit(x_train, 
y_train, 
validation_data=(x_test, y_test),
epochs=100, 
batch_size=64,
callbacks = [es], 
verbose=2)
score = model.evaluate(x_test, y_test, batch_size=64)
print("\nSCORE:", score)

Model training results:

...
...
Epoch 55/100
4846/4846 - 8s - loss: 0.0193 - acc: 0.9940 - val_loss: 0.0307 - val_acc: 0.9933
1212/1212 [==============================] - 0s 254us/sample - loss: 0.0307 - acc: 0.9933

Prediction (sorry for the profanity):

max_len = 50
list_of_sentances = ["Fucking fuck fuck you asshole bullshit fuck you bitch"]
word_num = list_of_sentances[0].split(" ")
word_num = len(word_num)
test = [[word2idx.get(w[0], 0) for w in s] for s in list_of_sentances]
test = pad_sequences(maxlen=max_len, sequences=test, padding="post", value=vocab_len-1)
pred = model.predict(test)
pred = pred.argmax(axis=-1)[0][:word_num]
labels = {v: k for k, v in label2idx.items()}
prediction = [labels[word] for word in pred]
print(labels)
print(prediction)
{0: 'O', 1: 'profanity'}
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']

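Can you tell me what I am doing wrong? I have tried the same approach for an NER model that finds organization names, person names, etc., and I got good results (this is the tutorial I followed: https://djajafer.medium.com/named-entity-recognition-and-classification-with-keras-4db04e22503d).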

I can't use class_weights because I have sequences. Examples of my "classes" are shown below:

No shit .           O profanity O
Ya bitch !          profanity profanity O
Shut the fuck up!   profanity profanity profanity profanity

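As other members have said, the metric gives you 99% accuracy because it counts both profane and non-profane words: labeling every word as non-profane already yields very high accuracy, because the dataset is imbalanced.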
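To make the imbalance argument concrete, here is a minimal sketch with made-up counts (990 non-profane tokens and 10 profane ones; these numbers are an assumption for illustration, not taken from the dataset in the question). A "model" that tags everything as O still scores 99% accuracy while finding no profanity at all. Note also that the question pads `y` with the "O" label up to `max_len`, so padded positions probably inflate the reported accuracy further.

```python
# Hypothetical token-level labels: 990 non-profane ("O") tokens and 10 profane ones.
# These counts are made up for illustration; they are not from the dataset above.
y_true = ["O"] * 990 + ["profanity"] * 10

# A degenerate "model" that tags every token as "O".
y_pred = ["O"] * len(y_true)

# Accuracy counts every correctly labeled token, including all the easy "O" tokens.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Number of profane tokens that were actually detected.
found = sum(t == "profanity" and p == "profanity" for t, p in zip(y_true, y_pred))

print(f"accuracy: {accuracy:.3f}")
print(f"profane tokens found: {found} of {y_true.count('profanity')}")
```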

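You should probably use the F-score (a combination of precision and recall), which is widely used in ML/NLP, because it focuses on a specific class. In short, it ignores true negatives (non-profane words correctly left untagged) and focuses on true positives (profane words correctly identified), using false positives (non-profane words tagged as profanity) for precision and false negatives (profane words that were missed) for recall.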
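As a rough sketch of how this could be computed for the "profanity" class, the helper below (a hypothetical `prf_for_class`, not part of Keras) works on flat lists of token labels. It assumes padding positions have already been dropped and that predictions were converted back to label strings, as in the prediction snippet of the question.

```python
def prf_for_class(y_true, y_pred, target="profanity"):
    """Precision, recall and F1 for one class; true negatives are never counted."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == target and t == target:
            tp += 1          # profane token correctly tagged
        elif p == target:
            fp += 1          # non-profane token wrongly tagged as profane
        elif t == target:
            fn += 1          # profane token that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up example labels, for illustration only.
y_true = ["O", "profanity", "O", "profanity", "profanity", "O"]
y_pred = ["O", "profanity", "profanity", "O", "profanity", "O"]
partial = prf_for_class(y_true, y_pred)

# The "always O" model from the question gets an F1 of zero.
all_o = prf_for_class(y_true, ["O"] * len(y_true))

print("partial model:", partial)
print("always-O model:", all_o)
```

With this metric, the model that labels everything as O scores 0 instead of 99%, which matches what the predictions actually show. Libraries such as scikit-learn offer equivalent functions (for example `sklearn.metrics.f1_score`), so a hand-written helper like this is only needed for a quick check.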
