Any way to remove symbols from a set of lemmatized words using Python?



I get lemmatized output from the code below, but the output words contain symbols such as ':', '!', ',' and '(('.

output_H3 = [lemmatizer.lemmatize(w.lower(), pos=wordnet.VERB) for w in processed_H3_tag]

Output:

  • ['hide((', 'show((', 'methods:', 'jquery', 'slide', 'elements:', 'launchedw3schools', 'today!']

Expected output:

  • ['hide', 'show', 'methods', 'jquery', 'slide', 'elements', 'launchedw3schools', 'today']
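One option, offered here as a suggestion rather than part of the original question, is to strip punctuation before lemmatizing, so the lemmatizer sees "hide" instead of "hide((". The sketch below assumes the lemmatizer can be passed in as any callable (clean_then_lemmatize is a hypothetical helper name); an identity function stands in for NLTK so the example runs without extra downloads. It also drops tokens that consisted only of punctuation.

```python
import string
from typing import Callable, Iterable, List

# Translation table that deletes every character in string.punctuation
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)

def clean_then_lemmatize(words: Iterable[str], lemmatize: Callable[[str], str]) -> List[str]:
    """Hypothetical helper: lowercase, delete ASCII punctuation, skip empties, then lemmatize."""
    result = []
    for w in words:
        cleaned = w.lower().translate(_STRIP_PUNCT)
        if cleaned:  # skip tokens that were nothing but punctuation, e.g. '>!>'
            result.append(lemmatize(cleaned))
    return result

# Identity function used as a stand-in lemmatizer for this demo only.
# With NLTK you might pass: lambda w: lemmatizer.lemmatize(w, pos=wordnet.VERB)
tokens = ['Hide((', 'show((', 'methods:', '>!>', 'today!']
print(clean_then_lemmatize(tokens, lambda w: w))
# ['hide', 'show', 'methods', 'today']
```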

You can also use translate() with string.punctuation (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~):

import string

trans = str.maketrans('', '', string.punctuation)  # third argument: characters to delete
output_wo_punc = [s.translate(trans) for s in output_H3]

which returns:

> ['hide', 'show', 'methods', 'jquery', 'slide', 'elements', 'launchedw3schools', 'today']
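Note that string.punctuation covers ASCII only, so full-width or typographic punctuation such as '：' or '“' would survive translate(). If that matters for your data, a possible alternative (a sketch, not part of the answer above) is to drop every character whose Unicode category starts with 'P'. Be aware that characters such as '<', '>', '$' and '+' are Unicode symbols (category S), not punctuation, so this is not a drop-in superset of string.punctuation. The helper name strip_unicode_punct is hypothetical.

```python
import unicodedata

def strip_unicode_punct(word: str) -> str:
    """Remove every character whose Unicode category starts with 'P' (punctuation)."""
    return ''.join(ch for ch in word if not unicodedata.category(ch).startswith('P'))

# '：' is a full-width colon; '“' and '”' are typographic quotes
words = ['hide((', 'methods：', '“today”!']
print([strip_unicode_punct(w) for w in words])
# ['hide', 'methods', 'today']
```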

A regular expression can also help:

import re 
output = [
"hide()",
"show()",
"methods:",
"jquery",
"slide",
"elements:",
"launchedw3schools",
"today!",
]

>>> import pprint
>>> expected = [re.sub(r'[:,?!()]', '', e) for e in output]
>>> pprint.pprint(expected)
['hide',
'show',
'methods',
'jquery',
'slide',
'elements',
'launchedw3schools',
'today']

This replaces every character in the class of unwanted characters with an empty string, i.e. deletes it.
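The character class above lists only :,?!() explicitly. To cover all ASCII punctuation without typing it by hand, one option is to build the class from string.punctuation with re.escape, which escapes characters such as ']', '\' and '^' that would otherwise break the class. The sketch below also drops tokens that end up empty; the sample list is illustrative.

```python
import re
import string

# Character class containing every ASCII punctuation character, safely escaped
PUNCT_RE = re.compile('[' + re.escape(string.punctuation) + ']')

output = ['hide()', 'show()', 'methods:', '>!>', 'today!']
cleaned = [PUNCT_RE.sub('', w) for w in output]
cleaned = [w for w in cleaned if w]  # drop tokens that were only punctuation
print(cleaned)
# ['hide', 'show', 'methods', 'today']
```

Compiling the pattern once is a small convenience when the same cleanup runs over many tokens.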