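I am trying to remove words from about 20K comments. The words to remove are stored in a DataFrame, and there are a little over 2,000 of them. The comments are in a different DataFrame, which has about 20K rows.
Here is an example: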
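import pandas as pd
# Sample comments (the real data is a DataFrame column with about 20K rows)
data = ['Hull Damage happened and its insured by maritime hull insurance company','Non Cash Entry and claims are blocked']
# Phrases to remove (the real list has 2000+ entries)
stopwords = ['Hull insurance', 'Non Cash Entry']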
Expected output:
output = ['Hull Damage happened and its insured by maritime company','and claims are blocked']
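You can use pd.Series.str.replace, which also accepts regular expressions (in pandas 2.0 and later you need to pass regex=True, because the default is a literal match). See the example below.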
import pandas as pd
data = ['Hull Damage happened and its insured by maritime hull insurance company','Non Cash Entry and claims are blocked']
replace_data = ['hull insurance', 'Non Cash Entry']  # strings that need to be removed
df = pd.DataFrame({'column1': data})  # put the comments in a DataFrame as column1
pattern = '|'.join(replace_data)  # build one alternation pattern from the strings
# regex=True is required in pandas >= 2.0, where str.replace matches literally by default
df['column1'] = df['column1'].str.replace(pattern, '', regex=True)  # replace each match with an empty string
print(df['column1'])  # note: the whitespace around each removed phrase is left in place
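
The snippet above leaves extra whitespace behind and is case-sensitive, so it would not match the question's 'Hull insurance' against 'hull insurance' in the comment. A minimal sketch of one way to get the exact expected output follows. The helper name remove_phrases is hypothetical, and the case-insensitive, whole-word matching is an assumption about what the question wants. It also assumes every phrase starts and ends with a word character, since \b would not anchor correctly otherwise.

```python
import re
import pandas as pd

def remove_phrases(comments: pd.Series, phrases) -> pd.Series:
    """Remove each phrase from every comment (hypothetical helper)."""
    # Longest phrases first, so a long phrase wins over a shorter one it starts with
    ordered = sorted(phrases, key=len, reverse=True)
    # re.escape guards against phrases that contain regex metacharacters
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b"
    # Assumption: matching should ignore case
    cleaned = comments.str.replace(pattern, "", case=False, regex=True)
    # Collapse the whitespace left behind by removed phrases and trim the ends
    return cleaned.str.replace(r"\s+", " ", regex=True).str.strip()

comments = pd.Series([
    'Hull Damage happened and its insured by maritime hull insurance company',
    'Non Cash Entry and claims are blocked',
])
stopwords = ['Hull insurance', 'Non Cash Entry']

result = remove_phrases(comments, stopwords).tolist()
print(result)
```

With the real data, the phrases would come from the stopword DataFrame column (for example via .tolist()) and the comments from the comment DataFrame column. A single alternation of 2000+ phrases is one pass per comment, but I have not measured how it performs at that size.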