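I want to remove specific words from a column, but only when they appear as whole words, not when they are part of another word. Here is an example: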
data1
name
here is a this
company
there is no food
data2
words count
is 56
com 17
no 22
I wrote the function below and it runs, but the problem is that it also removes a word when it is part of another word:
def drop(y):
    for x in data2.words.values:
        y['name'] = y['name'].str.replace(x, '')
    return y
Output:
name
here a th
pany
there food
What I expect:
name
here a this
company
there food
To avoid multiple spaces, you can split each value on whitespace, filter out the matching words, and join the rest back together:
s = set(data2['words'])
data1['name'] = [' '.join(y for y in x.split() if y not in s) for x in data1['name']]
print(data1)
name
0 here a this
1 company
2 there food
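For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same split/filter/join idea. The frames are rebuilt from the question's example, and `drop_words` is a hypothetical helper name chosen for illustration, not a pandas function. It works on a copy so the original frame is left untouched.

```python
import pandas as pd

# Example frames rebuilt from the question
data1 = pd.DataFrame({"name": ["here is a this", "company", "there is no food"]})
data2 = pd.DataFrame({"words": ["is", "com", "no"], "count": [56, 17, 22]})


def drop_words(frame, column, words):
    """Return a copy of `frame` with whole-word matches of `words` removed.

    Hypothetical helper for illustration; not part of pandas.
    """
    stop = set(words)  # a set gives fast membership tests
    out = frame.copy()
    out[column] = [
        " ".join(w for w in text.split() if w not in stop)
        for text in out[column]
    ]
    return out


result = drop_words(data1, "name", data2["words"])
print(result)
```

Because each value is compared token by token, "com" never matches inside "company", and joining with a single space means no extra whitespace is left behind.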
A solution with replace is also possible if you use the regex word boundary \b, but it leaves multiple spaces behind:
pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# regex=True is needed because the default is False since pandas 2.0
data1['name'] = data1['name'].str.replace('(' + pat + ')', '', regex=True)
print(data1)
name
0 here  a this
1 company
2 there   food
So a final step is needed to collapse them:
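pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# regex=True is needed for both calls because the default is False since pandas 2.0
data1['name'] = data1['name'].str.replace('(' + pat + ')', '', regex=True).str.replace(' +', ' ', regex=True)
print(data1)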
name
0 here a this
1 company
2 there food
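One caveat with building the pattern by plain string formatting is that a word containing regex metacharacters (for example "." or "+") would change the meaning of the pattern. A hedged sketch of a more defensive variant, using only the standard `re` module: `remove_words` is a hypothetical name, and the word list is copied from the question's data2.

```python
import re

words = ["is", "com", "no"]  # values of data2['words'] from the question

# re.escape protects against words that contain regex metacharacters
pattern = re.compile(r"\b(?:{})\b".format("|".join(re.escape(w) for w in words)))


def remove_words(text):
    """Remove whole-word matches, then collapse the leftover whitespace."""
    stripped = pattern.sub("", text)
    # split() + join also trims leading and trailing spaces
    return " ".join(stripped.split())


names = ["here is a this", "company", "there is no food"]
cleaned = [remove_words(n) for n in names]
print(cleaned)
```

With pandas this could be applied as data1['name'].map(remove_words). Note that \b only marks a boundary next to a letter, digit, or underscore, so words that begin or end with punctuation may need a different boundary test.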
The problem is that you are not splitting your sentences into words, so fragments of words get replaced as well. This should work:
def drop(y):
    words = set(data2.words.values)
    # Split each name into words and keep only those not listed in data2.words
    y['name'] = y['name'].apply(lambda s: " ".join(w for w in s.split() if w not in words))
    return y
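To see why the splitting matters, here is a small sketch on a plain string, without pandas. The sample sentence is made up from the question's values and is only for illustration.

```python
words = ["is", "com", "no"]
text = "here is a this company"

# Substring replacement: every occurrence is removed, even inside other words
by_substring = text
for w in words:
    by_substring = by_substring.replace(w, "")

# Word-level filtering: only tokens equal to a listed word are dropped
by_word = " ".join(t for t in text.split() if t not in words)

print(repr(by_substring))
print(repr(by_word))
```

The first result loses the "is" in "this" and the "com" in "company" and keeps a double space, while the second removes only the standalone "is".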
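Here is a solution to your problem. You need to split the sentence into words before removing values; otherwise the whole sentence is treated as one string and the values are also replaced inside other words.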
import pandas as pd

data1 = pd.DataFrame(data={"name": ["here is a this company there is no food"]})
data2 = pd.DataFrame(data={"words": ["is", "com", "no"]})

def drop(data1, data2):
    words = set(data2["words"])
    # Split each name into words and drop the ones listed in data2["words"]
    data1["name"] = data1["name"].apply(lambda s: " ".join(j for j in s.split() if j not in words))
    return data1