Remove a word unless it is part of another word

I want to remove specific words, except where they are part of another word. Here is an example:

data1

name
here is a this
company
there is no food

data2

words   count
is      56
com     17
no      22

I wrote the function below and it runs, but it also removes a word when it is only part of another word:


def drop(y):
    for x in data2.words.values:
        y['name'] = y['name'].str.replace(x, '')
    return y

Output:

name
here a th       
pany    
there food

What I expect:

name    
here a this       
company 
there food

To avoid multiple spaces, you can split each value on whitespace, filter out the matching words, and join the rest back together:

s = set(data2['words'])
data1['name'] = [' '.join(y for y in x.split() if y not in s) for x in data1['name']]
print(data1)
          name
0  here a this
1      company
2   there food
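The split, filter, and join idea above can also be sketched as a small standalone helper. This is only an illustration: the function name remove_words and the plain list of sample strings are assumptions made for the example, not part of the original answer.

```python
def remove_words(text, stop_words):
    """Drop whole whitespace-separated tokens that appear in stop_words.

    remove_words is a hypothetical helper name used for illustration.
    """
    # split() with no argument also collapses runs of whitespace,
    # so the joined result never contains double spaces.
    return " ".join(tok for tok in text.split() if tok not in stop_words)


# Sample values copied from the question.
names = ["here is a this", "company", "there is no food"]
stop_words = {"is", "com", "no"}

cleaned = [remove_words(n, stop_words) for n in names]
print(cleaned)
```

With pandas, the same helper could presumably be applied per row, for example data1['name'].map(lambda t: remove_words(t, s)), where s is the set built above.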

If you wrap each word in the regex word boundary \b, a solution based on replace also works, but it leaves multiple spaces behind:

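# \b on both sides makes each word match only as a whole word
pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# regex=True is required in recent pandas versions, where it is no longer the default
data1['name'] = data1['name'].str.replace('(' + pat + ')', '', regex=True)
print(data1)
           name
0  here  a this
1       company
2  there   food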

So in the end you also need to collapse the extra spaces:

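pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# first remove the whole words, then squeeze runs of spaces into a single space
data1['name'] = (data1['name'].str.replace('(' + pat + ')', '', regex=True)
                              .str.replace(' +', ' ', regex=True))
print(data1)
          name
0  here a this
1      company
2   there food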
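If the word list may contain regex metacharacters, escaping each entry is probably worthwhile. Below is a minimal sketch of the same word-boundary approach using only the standard re module; the helper names build_pattern and remove_words_regex are made up for this example.

```python
import re


def build_pattern(words):
    """Compile one alternation that matches any of the words as a whole word.

    build_pattern is a hypothetical helper name used for illustration.
    """
    escaped = (re.escape(w) for w in words)  # neutralise characters such as "." or "+"
    return re.compile(r"\b(?:{})\b".format("|".join(escaped)))


def remove_words_regex(text, pattern):
    """Remove whole-word matches, then collapse the whitespace left behind."""
    without = pattern.sub("", text)
    return re.sub(r"\s+", " ", without).strip()


pattern = build_pattern(["is", "com", "no"])
samples = ["here is a this", "company", "there is no food"]
results = [remove_words_regex(t, pattern) for t in samples]
print(results)
```

Note that \b only matches between a word character and a non-word character, so entries that begin or end with punctuation may need different handling.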

The problem is that you never split the sentences into words, so fragments of words get replaced as well. This should work:

def drop(y):
    words = set(data2.words.values)
    # split each sentence into words and keep only the words that are not in the list
    y['name'] = y['name'].apply(lambda s: " ".join(w for w in s.split() if w not in words))
    return y
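To see why working on whole words matters, here is a small plain-Python comparison, using one sample sentence from the question, between substring replacement and comparing whole tokens. The variable names are chosen just for this sketch.

```python
sentence = "here is a this"

# Substring replacement: every occurrence of "is" is removed,
# including the one inside "this".
substring_result = sentence.replace("is", "")

# Token comparison: only tokens exactly equal to "is" are dropped.
token_result = " ".join(tok for tok in sentence.split() if tok != "is")

print(repr(substring_result))  # 'here  a th'
print(repr(token_result))      # 'here a this'
```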

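Here is a solution to your problem. You need to split the sentence into words before removing anything; otherwise the sentence is treated as one string and the values are replaced wherever they occur.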

import pandas as pd

data1 = pd.DataFrame(data={"name": ["here is a this company there is no food"]})
data2 = pd.DataFrame(data={"words": ["is", "com", "no"]})

def drop(data1, data2):
    words = set(data2["words"])
    # split each sentence, drop the words that are in the list, and join the rest
    data1["name"] = data1["name"].apply(lambda s: " ".join(j for j in s.split() if j not in words))
    return data1
