How to define a prediction function for an NER system in Keras?



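I am building an NER system with Keras. After training and an initial prediction on the test set, I want to use it to recognize named entities (NEs) in a single string, or in a list of strings, of unseen data.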

I can't find a way to pass such a string or list of strings to model.predict() and get proper predictions back.

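This is the part of my code that predicts on the test data. I'm trying to adapt it so it accepts unseen strings and prints each token with its prediction: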

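import numpy as np

# x_test, y_test, model, words and tags come from the training script
i = np.random.randint(0, x_test.shape[0])  # pick a random test sentence
print("This is sentence:", i)
# Predict on a batch of one sentence; one tag distribution per token
p = model.predict(np.array([x_test[i]]))
# Keep the highest-probability tag index for each token
p = np.argmax(p, axis=-1)
print("{:15}{:5}\t {}\n".format("Word", "True", "Pred"))
print("-" * 30)
for w, true, pred in zip(x_test[i], y_test[i], p[0]):
    print("{:15}{}\t{}".format(words[w - 1], tags[true], tags[pred]))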

This code predicts and prints each token together with its NE tag, but I don't really understand how it works.
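The key step in the snippet above is np.argmax(p, axis=-1): for a per-token softmax output, model.predict() returns one probability per tag for every token, and argmax over the last axis keeps the index of the most likely tag, which is then looked up in tags. A minimal NumPy-only sketch of that decoding, using made-up probabilities and a hypothetical three-tag list instead of a trained model:

```python
import numpy as np

# Hypothetical tag list standing in for the `tags` list of the training script
tags = ["O", "B-gpe", "B-org"]

# Fake model.predict() output for 1 sentence of 3 tokens:
# shape (batch_size, seq_len, num_tags), one probability per tag per token
p = np.array([[[0.90, 0.05, 0.05],    # token 0: "O" is most likely
               [0.10, 0.80, 0.10],    # token 1: "B-gpe" is most likely
               [0.20, 0.10, 0.70]]])  # token 2: "B-org" is most likely

# argmax over the last axis gives one tag index per token, shape (1, 3)
tag_ids = np.argmax(p, axis=-1)

# Map the indices of the first (only) sentence back to tag strings
pred_tags = [tags[i] for i in tag_ids[0]]
print(pred_tags)  # ['O', 'B-gpe', 'B-org']
```

The same lookup is what tags[pred] does inside the loop; tags[true] does it for the gold labels in y_test.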

The code above produces output like this:

Word           True      Pred
------------------------------
The            O        O
British        B-gpe    B-gpe
pharmaceutical O        O
company        O        O
GlaxoSmithKlineB-org    O

I would like to pass in, for example:

sentence = "President Obama became the first sitting American president to visit Hiroshima"

and see the recognized named entities. Any suggestions?

A copy of the full code is here, and the dataset is here.

You can make predictions for a list of sentences like this:

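import numpy as np

# model, word2idx, tags, max_len, num_words and pad_sequences come from the
# training script; every word must already be a key in word2idx
my_sentences = ["President Obama became the first sitting American president to visit Hiroshima",
                "Jack is a good person and living in Iran"]
# Map each space-separated word to its vocabulary index
my_sentences_idx = [[word2idx[w] for w in s.split(" ")] for s in my_sentences]
# Pad at the end with index num_words - 1; truncating="post" keeps the first
# max_len words aligned with their predictions if a sentence is too long
my_sentences_padded = pad_sequences(maxlen=max_len, sequences=my_sentences_idx,
                                    padding="post", truncating="post", value=num_words - 1)
preds = np.argmax(model.predict(np.array(my_sentences_padded)), axis=-1)
for sentence, p in zip(my_sentences, preds):
    print("-" * 30)
    print(sentence)
    print("-" * 30)
    # zip stops at the last real word, so padding positions are skipped
    for w, pred in zip(sentence.split(" "), p):
        if tags[pred] != "O":
            print("{:15} {} ".format(w, tags[pred]))
    print()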
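Note that word2idx[w] raises a KeyError for any word that was not in the training vocabulary, so the code above only works for sentences made entirely of known words. One way around this is to fall back to a reserved "unknown word" index. This is a sketch under the assumption that the vocabulary has such an index (if the original training code does not reserve one, the model would have to be retrained with it); the helper encode_sentences and the toy vocabulary are hypothetical, and the padding mimics pad_sequences(..., padding="post", truncating="post") in plain NumPy:

```python
import numpy as np

def encode_sentences(sentences, word2idx, max_len, pad_value, unk_value):
    """Turn raw strings into a (len(sentences), max_len) int array.

    Splits on whitespace, maps each word through word2idx (using unk_value
    for words never seen in training), then truncates and pads at the end
    so that position k always holds word k.
    """
    rows = []
    for s in sentences:
        ids = [word2idx.get(w, unk_value) for w in s.split()]
        ids = ids[:max_len]                        # like truncating="post"
        ids += [pad_value] * (max_len - len(ids))  # like padding="post"
        rows.append(ids)
    return np.array(rows, dtype=np.int32)

# Hypothetical toy vocabulary; index 0 is reserved here for unknown words
word2idx = {"Jack": 1, "lives": 2, "in": 3, "Iran": 4}
X = encode_sentences(["Jack lives in Tehran"], word2idx,
                     max_len=6, pad_value=5, unk_value=0)
print(X)  # [[1 2 3 0 5 5]]
```

An array built this way could be passed to model.predict() in place of my_sentences_padded.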

Output:

------------------------------
President Obama became the first sitting American president to visit Hiroshima
------------------------------
President       B-per 
Obama           I-per 
American        B-gpe 
Hiroshima       B-geo 
------------------------------
Jack is a good person and living in Iran
------------------------------
Jack            B-per 
Iran            B-geo 
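The predicted tags follow the BIO scheme visible in this output: B- marks the first token of an entity and I- a following token of the same entity, so President/B-per and Obama/I-per together form one person. A minimal sketch that merges such tokens into whole entities, assuming exactly this B-/I-/O tag format (bio_to_entities is a hypothetical helper, not part of Keras). The tags below are copied from the first output; the words that were not printed must have been tagged "O":

```python
def bio_to_entities(words, labels):
    """Group (word, BIO label) pairs into (entity_text, entity_type) tuples.

    "B-x" starts a new entity of type x; "I-x" extends the current entity
    if it has the same type, otherwise it starts a new one; "O" closes
    any open entity.
    """
    entities = []
    current_words, current_type = [], None
    for word, label in zip(words, labels):
        if label.startswith("B-") or (label.startswith("I-") and label[2:] != current_type):
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [word], label[2:]
        elif label.startswith("I-"):
            current_words.append(word)
        else:  # "O"
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities

sentence_words = "President Obama became the first sitting American president to visit Hiroshima".split()
word_tags = ["B-per", "I-per", "O", "O", "O", "O", "B-gpe", "O", "O", "O", "B-geo"]
print(bio_to_entities(sentence_words, word_tags))
# [('President Obama', 'per'), ('American', 'gpe'), ('Hiroshima', 'geo')]
```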
