Remove a word unless it is part of another word



I want to remove specific words, except where they occur as part of another word. Here is an example:

data1 
name    
here is a this       
company 
there is no food      
data2
words   count
is       56
com     17
no      22

I wrote the function below. It runs, but the problem is that it also removes a word when it is part of another word:


def drop(y):
    for x in data2.words.values:
        y['name'] = y['name'].str.replace(x, '')
    return y

Output:

name
here a th       
pany    
there food 

What I expected:

name    
here a this       
company 
there food   

To avoid multiple spaces, you can split each value on whitespace, filter out the matching words, and join the rest back together:

s = set(data2['words'])
data1['name'] = [' '.join(y for y in x.split() if y not in s) for x in data1['name']]
print(data1)
name
0  here a this
1      company
2   there food
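The split, filter and join steps can also be sketched as a small helper that works on plain strings. The name drop_words is hypothetical and not part of the answer above; on a DataFrame it could be applied with something like data1['name'].apply(drop_words, words=s).

```python
def drop_words(text, words):
    """Remove whole words found in `words` from `text` (hypothetical helper)."""
    # split() with no argument splits on any run of whitespace,
    # so joining with a single space never produces double spaces.
    return " ".join(token for token in text.split() if token not in words)


stop = {"is", "com", "no"}
for sentence in ["here is a this", "company", "there is no food"]:
    print(drop_words(sentence, stop))
```

Because whole tokens are compared against the set, "this" and "company" are left alone even though they contain "is" and "com".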

If you use the regex word boundary \b, a solution based on replace also works, but it leaves multiple spaces behind:

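pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
data1['name'] = data1['name'].str.replace('(' + pat + ')', '', regex=True)
print(data1)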
name
0  here  a this
1       company
2  there   food

So as a last step you need to collapse them:

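pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
data1['name'] = data1['name'].str.replace('(' + pat + ')', '', regex=True).str.replace(' +', ' ', regex=True)
print(data1)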
name
0  here a this
1      company
2   there food
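If the word list could contain regex metacharacters, escaping each word before building the pattern may be safer. The following is a minimal sketch with the standard re module on plain strings; build_pattern and remove_words are hypothetical names, and the helper collapses whitespace by splitting and rejoining rather than with a second replace.

```python
import re


def build_pattern(words):
    """Compile one pattern matching any of `words` as a whole word."""
    escaped = (re.escape(w) for w in words)
    # (?:...) groups the alternatives so that \b applies to every word.
    return re.compile(r"\b(?:{})\b".format("|".join(escaped)))


def remove_words(text, pattern):
    """Delete matches, then collapse the leftover runs of spaces."""
    cleaned = pattern.sub("", text)
    return " ".join(cleaned.split())


pattern = build_pattern(["is", "com", "no"])
for sentence in ["here is a this", "company", "there is no food"]:
    print(remove_words(sentence, pattern))
```

Note that \b only marks a boundary next to a letter, digit or underscore, so this sketch assumes the words to remove start and end with such characters.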

The problem is that you did not split your sentences into words, so fragments of words get replaced as well. This should work:

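def drop(y):
    words = set(data2.words.values)
    # Split each sentence into words and keep only the words that are not in the list
    y['name'] = y['name'].apply(lambda s: " ".join(entry for entry in s.split() if entry not in words))
    return y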

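Here is a solution to your problem. You need to split the sentence into words before removing anything; otherwise the whole sentence is treated as a single string and the values are replaced inside other words.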

import pandas as pd

data1 = pd.DataFrame(data={"name": ["here is a this company there is no food"]})
data2 = pd.DataFrame(data={"words": ["is", "com", "no"]})

def drop(data1, data2):
    words = set(data2["words"])
    # Split each sentence and keep only the words that are not in the list
    data1["name"] = data1["name"].apply(lambda s: " ".join(j for j in s.split() if j not in words))
    return data1
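Splitting alone is not enough if each token is still passed through a substring replace. As a rough illustration in plain Python (the variable names are made up for this sketch), compare replacing inside each token with dropping tokens that match exactly:

```python
words = ["is", "com", "no"]
sentence = "here is a this company there is no food"

# Substring replacement per token: fragments inside longer words are still hit.
fragment_result = sentence
for w in words:
    fragment_result = " ".join(tok.replace(w, "") for tok in fragment_result.split())

# Whole-token comparison: a token is dropped only if it equals a listed word.
word_set = set(words)
whole_result = " ".join(tok for tok in sentence.split() if tok not in word_set)

print(fragment_result.split())
print(whole_result)
```

The first result still contains the damaged tokens "th" and "pany", while the second keeps "this" and "company" intact.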
