Pandas: create a new row for each match of a string in a dataset



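I have two datasets (DataFrames): one contains text, and the other contains the words I am searching for. I want to know whether those words appear in any of the texts, and to tag the texts that contain them.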

My idea is to add a new row to DataFrame 2 for every word in a text that matches a value in DataFrame 1.

An example:

DataFrame 1:

word        id
'sushi'     1
'pizza'     2
'burger'    3
'plaza'     4
'park'      5
'mountain'  6

DataFrame 2, the one to search:

Note: DataFrame 2 has more columns, but they are not relevant to the problem.

text
'I eat pizza in the park'  
'I eat sushi' 
'She eats sushi with pizza in the plaza'
'He eats'

The desired output is the following:

text                                      contained_word_id
'I eat pizza in the park'                 2
'I eat pizza in the park'                 5
'I eat sushi'                             1
'She eats sushi with pizza in the plaza'  1
'She eats sushi with pizza in the plaza'  2
'She eats sushi with pizza in the plaza'  4
'He eats'                                 NaN

We can first use findall, then explode and map:

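# Collect every word from df1 that occurs in each text (one list per row)
df2['word'] = df2.text.str.findall('|'.join(df1.word.tolist()))
# One row per matched word; a text with no match keeps a single row with NaN
df2 = df2.explode('word')
# Look up the id of each matched word
df2['id'] = df2.word.map(df1.set_index('word')['id'])
df2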
Out[443]: 
                                       text   word   id
0                 'I eat pizza in the park'  pizza  2.0
0                 'I eat pizza in the park'   park  5.0
1                             'I eat sushi'  sushi  1.0
2  'She eats sushi with pizza in the plaza'  sushi  1.0
2  'She eats sushi with pizza in the plaza'  pizza  2.0
2  'She eats sushi with pizza in the plaza'  plaza  4.0
3                                 'He eats'    NaN  NaN
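
For reference, here is a self-contained sketch of the same findall / explode / map idea that rebuilds the example frames and produces the contained_word_id column from the question. Three things in it are assumptions that go beyond the snippet above: the words are passed through re.escape, the pattern is wrapped in \b word boundaries (so that 'park' would not match inside 'parking'), and the ids are cast to the nullable Int64 dtype so they print as integers rather than floats. The name `out` is only illustrative.

```python
import re

import pandas as pd

# Example data from the question (quotes around the strings omitted).
df1 = pd.DataFrame({
    "word": ["sushi", "pizza", "burger", "plaza", "park", "mountain"],
    "id": [1, 2, 3, 4, 5, 6],
})
df2 = pd.DataFrame({
    "text": [
        "I eat pizza in the park",
        "I eat sushi",
        "She eats sushi with pizza in the plaza",
        "He eats",
    ]
})

# Assumption: match whole words only, and escape any regex metacharacters.
pattern = r"\b(?:" + "|".join(map(re.escape, df1["word"])) + r")\b"

# findall gives one list of matches per text; explode turns each list
# element into its own row (an empty list becomes a single NaN row).
out = df2.assign(word=df2["text"].str.findall(pattern)).explode("word")

# Map each matched word to its id; unmatched rows stay missing.
out["contained_word_id"] = (
    out["word"].map(df1.set_index("word")["id"]).astype("Int64")
)
out = out.drop(columns="word").reset_index(drop=True)
print(out)
```

The matching here is case-sensitive, and a word that occurs twice in the same text yields two rows. If that is not wanted, passing flags=re.IGNORECASE to findall or calling drop_duplicates() on the result are possible adjustments.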
