Speeding up a Python lemmatizer that uses a 'for' loop and 'if' statements



I use this code to apply a lemmatizer to each word according to its POS tag.

from nltk import pos_tag, word_tokenize
from nltk.stem import WordNetLemmatizer

def lemmatize_all(sentence):
    wnl = WordNetLemmatizer()
    lem = []
    for word, tag in pos_tag(word_tokenize(sentence)):
        if tag.startswith("NN"):
            lem.append(wnl.lemmatize(word, pos='n'))
        elif tag.startswith('VB'):
            lem.append(wnl.lemmatize(word, pos='v'))
        elif tag.startswith('JJ'):
            lem.append(wnl.lemmatize(word, pos='a'))
        else:
            lem.append(word)
    return lem

The problem is that the more data I have, the longer it takes. Can you help me speed up the code?

I'm not sure whether this will work for you, but it does reproduce the behavior of your code and is easy to extend.

def lemmatize_all(sentence):
    wnl = WordNetLemmatizer()
    lem = []
    # Map the first two characters of a tag to a WordNet POS code.
    tags = {
        'NN': 'n',
        'VB': 'v',
        'JJ': 'a',
    }
    for word, tag in pos_tag(word_tokenize(sentence)):
        tag_start = tag[:2]
        if tag_start in tags:
            lem.append(wnl.lemmatize(word, pos=tags[tag_start]))
        else:
            lem.append(word)
    return lem

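This way, you create a dictionary that translates tags to POS values. Alternatively, if there are more tags than POS values, this might come in handy: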

def lemmatize_all(sentence):
    wnl = WordNetLemmatizer()
    lem = []
    # Map each WordNet POS code to the tag prefixes it covers.
    tags = {
        'n': ['NN', 'NA'],
        'v': ['VB', 'VA'],
        'a': ['JJ', 'JA'],
    }
    for word, tag in pos_tag(word_tokenize(sentence)):
        tag_start = tag[:2]
        if tag_start in tags['n']:
            lem.append(wnl.lemmatize(word, pos='n'))
        elif tag_start in tags['v']:
            lem.append(wnl.lemmatize(word, pos='v'))
        elif tag_start in tags['a']:
            lem.append(wnl.lemmatize(word, pos='a'))
        else:
            lem.append(word)
    return lem

I added tags starting with NA, VA and JA to illustrate how the code can be extended.
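Since the question is about run time, here is a rough sketch that goes one step beyond the answer. It is not taken from the original code, and whether it helps depends on your data. It inverts the POS-to-prefixes dictionary once, so each tag needs a single lookup instead of an if/elif chain, and it wraps the lemmatizer in `functools.lru_cache`, so a repeated (word, pos) pair is only lemmatized once. The names `POS_TO_PREFIXES`, `PREFIX_TO_POS`, `make_cached_lemmatizer` and `lemmatize_tagged` are hypothetical, and the lemmatizer is passed in as a plain callable so the sketch runs without NLTK data.

```python
from functools import lru_cache

# Maps each WordNet POS code to the tag prefixes it covers.
# 'NA', 'VA' and 'JA' are the illustrative extra prefixes from the answer above.
POS_TO_PREFIXES = {
    'n': ['NN', 'NA'],
    'v': ['VB', 'VA'],
    'a': ['JJ', 'JA'],
}

# Invert once at import time: tag prefix -> POS code.
PREFIX_TO_POS = {
    prefix: pos
    for pos, prefixes in POS_TO_PREFIXES.items()
    for prefix in prefixes
}


def make_cached_lemmatizer(lemmatize, maxsize=100_000):
    """Wrap a lemmatize(word, pos) callable in an LRU cache (hypothetical helper)."""
    @lru_cache(maxsize=maxsize)
    def cached(word, pos):
        return lemmatize(word, pos)
    return cached


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (word, tag) pairs; words whose tag is not mapped pass through."""
    result = []
    for word, tag in tagged_tokens:
        pos = PREFIX_TO_POS.get(tag[:2])
        result.append(lemmatize(word, pos) if pos else word)
    return result
```

With NLTK, this could be wired up by creating the lemmatizer once, outside any per-sentence function: `cached = make_cached_lemmatizer(WordNetLemmatizer().lemmatize)`, then calling `lemmatize_tagged(pos_tag(word_tokenize(sentence)), cached)` for each sentence. The cache only pays off when words repeat across the data, and tokenizing and tagging may still account for a large part of the run time, so it is worth profiling before and after.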
