Removing two-word phrases from text in Python 3



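I am trying to remove words and phrases from about 20K comments. The phrases to remove are stored in a DataFrame, and there are a little over 2,000 of them. The comments are in a separate DataFrame, roughly 20K rows.

Here is an example: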

import pandas as pd
data = pd.DataFrame()
data = ['Hull Damage happened and its insured by maritime hull insurance company','Non Cash Entry and claims are blocked']
stopwords = ['Hull insurance', 'Non Cash Entry']

Expected output:

output = ['Hull Damage happened and its insured by maritime company','and claims are blocked']  

You can use pd.Series.str.replace, which also accepts regular expressions. See the example below.

import re
import pandas as pd
data = ['Hull Damage happened and its insured by maritime hull insurance company','Non Cash Entry and claims are blocked']
replace_data = ['hull insurance', 'Non Cash Entry']    # phrases to remove
df = pd.DataFrame({'column1': data})    # put the comments in a DataFrame column
pattern = '|'.join(re.escape(s) for s in replace_data)    # one alternation pattern; re.escape guards regex metacharacters
df['column1'] = df['column1'].str.replace(pattern, '', regex=True)    # regex=True is needed in pandas 2.0+, where matching is literal by default
print(df['column1'])
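
A plain replacement leaves a double space (or a leading space) where the phrase used to be, and it only matches the exact casing given in the list. The following is a minimal sketch of one way to get the expected output from the question, assuming the comments and phrases live in two DataFrames. The names `comments`, `phrases`, and the columns `comment` and `phrase` are made up for illustration; substitute your own. It matches case-insensitively, anchors on word boundaries so that a phrase is not removed from the middle of a longer word, and then tidies the whitespace.

```python
import re
import pandas as pd

# Hypothetical frames standing in for the real 20K comments and ~2,000 phrases.
comments = pd.DataFrame({'comment': [
    'Hull Damage happened and its insured by maritime hull insurance company',
    'Non Cash Entry and claims are blocked',
]})
phrases = pd.DataFrame({'phrase': ['Hull insurance', 'Non Cash Entry']})

# Put longer phrases first so they win over shorter phrases sharing a prefix.
ordered = sorted(phrases['phrase'], key=len, reverse=True)

# Build a single pattern: escaped alternatives wrapped in word boundaries.
pattern = r'\b(?:' + '|'.join(re.escape(p) for p in ordered) + r')\b'

cleaned = (
    comments['comment']
    .str.replace(pattern, '', case=False, regex=True)  # drop the phrases
    .str.replace(r'\s+', ' ', regex=True)              # collapse leftover runs of spaces
    .str.strip()                                       # trim leading/trailing space
)

result = cleaned.tolist()
print(result)
```

Compiling everything into one alternation means each comment is scanned once rather than once per phrase, which tends to matter with a couple of thousand phrases. Note that `\b` only behaves as intended when the phrases begin and end with word characters.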
