Why does my nltk 'for' loop repeat results instead of moving to the next sentence?

Let's imagine I have these 5 sentences in df2['CleanDescr'] after removing stop words and lemmatizing:

garcia cash drawer reconciliation report distribution hill specialty
jiang report not delivered oic surgical minute
rosario requesting case log - chadwycke r. smith
villalta clarity report - "solid organ transplant"
wallace need assistance with monthly clarity report

I tried running nltk.tag.pos_tag on each sentence in two different ways, but both keep repeating the first sentence. Here are the two ways I've done it:

include_tags = {'NN', 'VB', 'PRP', 'VBZ', 'VBP', 'VPB', 'VBD', 'NNS', 'NNPS'}

def remove_tag(tagset):
    for word in df2['CleanDescr']:
        tagged_sent = nltk.tag.pos_tag(word.split())
        #print(tagged_sent)
        edited_sent = ' '.join([words for words,tag in tagged_sent if tag in include_tags])
        #print(edited_sent)
        return edited_sent
df2['CleanDescr'] = df2['CleanDescr'].apply(remove_tag)
df2['CleanDescr']

def remove_tag(tagset):
    for word in df2['CleanDescr']:
        tagged_sent = nltk.tag.pos_tag(word.split())
        #print(tagged_sent)
        for tag in tagged_sent:
            if tag in include_tags:
                edited_sent = ' '.join()
        return edited_sent
df2['CleanDescr'] = df2['CleanDescr'].apply(remove_tag)
df2['CleanDescr']

The results should cover all 5 sentences. Instead, the first sentence is repeated on every row. This is my result:

0        garcia cash drawer distribution hill specialty...
1        garcia cash drawer distribution hill specialty...
2        garcia cash drawer distribution hill specialty...
3        garcia cash drawer distribution hill specialty...
4        garcia cash drawer distribution hill specialty...

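apply() runs the function on each row separately and passes that row's value in as tagset, so the function should work on tagset. Instead, the function ignores it and runs a for-loop over the whole df2['CleanDescr'] column, so every call starts again at the first sentence and returns as soon as it has processed it. That is why every row gets the same result.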
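To make the mechanism concrete, here is a minimal sketch that runs without pandas or NLTK data. The names apply_each, KEEP, buggy and fixed are hypothetical and not from the original code: apply_each is only a stand-in for Series.apply (one call per element), and the KEEP word set is a toy replacement for the pos_tag filter.

```python
# Hypothetical stand-in for pandas Series.apply: calls func once per element.
def apply_each(values, func):
    return [func(value) for value in values]

sentences = [
    "garcia cash drawer reconciliation report",
    "jiang report not delivered",
    "wallace need assistance",
]

# Toy filter used in place of nltk.tag.pos_tag so the sketch needs no NLTK data.
KEEP = {"garcia", "cash", "drawer", "jiang", "report", "wallace", "assistance"}

def buggy(tagset):
    # Ignores its argument and loops over the whole column,
    # so it returns during the first iteration on every call.
    for sent in sentences:
        return " ".join(w for w in sent.split() if w in KEEP)

def fixed(sent):
    # Works only on the single value that apply_each passes in.
    return " ".join(w for w in sent.split() if w in KEEP)

buggy_result = apply_each(sentences, buggy)
fixed_result = apply_each(sentences, fixed)
print(buggy_result)  # the first sentence's result, repeated for every row
print(fixed_result)  # one result per sentence
```

The only difference between the two functions is whether they use the argument they receive; the same change applies to remove_tag.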

Frankly, the parameter should be named sentence or sent rather than tagset.

def remove_tag(sent):
    tagged_sent = nltk.tag.pos_tag(sent.split())
    edited_sent = ' '.join([words for words, tag in tagged_sent if tag in include_tags])
    return edited_sent

import pandas as pd
import nltk

df = pd.DataFrame({
    'CleanDescr': [
        'garcia cash drawer reconciliation report distribution hill specialty',
        'jiang report not delivered oic surgical minute',
        'rosario requesting case log - chadwycke r. smith',
        'villalta clarity report - "solid organ transplant"',
        'wallace need assistance with monthly clarity report',
    ]
})

# POS tags to keep; words with any other tag are dropped
include_tags = {'NN', 'VB', 'PRP', 'VBZ', 'VBP', 'VPB', 'VBD', 'NNS', 'NNPS'}

def remove_tag(sent):
    # apply() passes in one sentence at a time
    tagged_sent = nltk.tag.pos_tag(sent.split())
    edited_sent = ' '.join([words for words, tag in tagged_sent if tag in include_tags])
    return edited_sent

df['CleanDescr'] = df['CleanDescr'].apply(remove_tag)
print(df['CleanDescr'])

Result:

0    garcia cash drawer reconciliation report distr...
1                                  jiang report minute
2                      rosario case chadwycke r. smith
3                           clarity report transplant"
4                    wallace assistance clarity report
Name: CleanDescr, dtype: object
