Split sentences in a pandas DataFrame into one word per row and number each sentence



I'm new to Python, and I have a pandas DataFrame that looks like this:

df =
sn  sent                                                    ent
0   ['an', 'apple', 'is', 'an', 'example', 'of', 'what?']   ['O', 'F', '0', '0', '0', 'O', 'O']
1   ['a', 'potato', 'is', 'an', 'example', 'of', 'what?']   ['O', 'V', '0', '0', '0', 'O', 'O']

I want to create another pandas DataFrame that looks like this:

newdf= 
sn  sent        ent
0   an          O
apple       F
is          O
an          O
example     O
of          O
what?       O
1   a           O
potato      V
is          O
an          O
example     O
of          O
what?       O

I tried the following code and ended up with the output shown below it:

(df.set_index('sn')
   .stack()
   .str.split(expand=True)
   .stack()
   .unstack(level=1)
   .reset_index(level=0, drop=False))

It is close to what I want, but I can't seem to work out the rest:

sn  sent        ent
0   ['an',      ['O',
0   'apple',    'F',
0   'is',       'O',
0   'an',       'O',
0   'example',  'O',
0   'of',       'O',
0   'what?',    'O',
1   'a',        'O',
1   'potato',   'V',
1   'is',       'O',
1   'an',       'O',
1   'example',  'O',
1   'of',       'O',
1   'what?']    'O']

Any pointers are much appreciated.

import pandas as pd

df = pd.DataFrame({'sn': [0, 1],
                   'sent': [['an', 'apple', 'is', 'an', 'example', 'of', 'what?'], ['a', 'potato', 'is', 'an', 'example', 'of', 'what?']],
                   'ent': [['O', 'F', '0', '0', '0', 'O', 'O'], ['O', 'V', '0', '0', '0', 'O', 'O']]})
# Explode each list column so every word and its tag get their own row, then index by sentence number
df.apply(pd.Series.explode).set_index('sn')

Result:

       sent ent
sn             
0        an   O
0     apple   F
0        is   0
0        an   0
0   example   0
0        of   O
0     what?   O
1         a   O
1    potato   V
1        is   0
1        an   0
1   example   0
1        of   O
1     what?   O
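
One caveat: the output of the attempt in the question (cells such as ['an', with the bracket and quotes still attached) suggests the columns may hold the text of a list rather than real Python lists, which often happens after a round trip through CSV. That is only a guess from the printed output. If it is the case, a minimal sketch would be to parse the strings first with ast.literal_eval and then explode both columns together. The multi-column form of DataFrame.explode used here needs pandas 1.3 or later, and the string-valued sample data below is an assumption made for illustration.

```python
import ast

import pandas as pd

# Assumption: each cell is the *string* form of a list, not an actual list.
df = pd.DataFrame({
    'sn': [0, 1],
    'sent': ["['an', 'apple', 'is', 'an', 'example', 'of', 'what?']",
             "['a', 'potato', 'is', 'an', 'example', 'of', 'what?']"],
    'ent': ["['O', 'F', '0', '0', '0', 'O', 'O']",
            "['O', 'V', '0', '0', '0', 'O', 'O']"],
})

# Turn each string into a real list; literal_eval only accepts Python literals.
for col in ['sent', 'ent']:
    df[col] = df[col].apply(ast.literal_eval)

# Explode both list columns in parallel so word i stays paired with tag i.
# The lists in each row must have the same length for this to work.
newdf = df.explode(['sent', 'ent']).reset_index(drop=True)
print(newdf)
```

The sn column is repeated for every word, so each row still carries its sentence number. Calling newdf.set_index('sn') afterwards gives the same layout as the result above.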
