I have a DataFrame that looks like this:
| characters | result |
|:----------:|:------:|
| b | TP |
| a | TP |
| t | FN |
| NaN | None |
| c | TN |
| o | FP |
| p | TP |
I previously derived it from the words "bat" and "cop". Each word is separated by a NaN row. I would like to turn it into a DataFrame like this:
| characters | result | word |
|:----------:|:------:|:----:|
| b | TP | bat |
| a | TP | bat |
| t | FN | bat |
| NaN | None | None |
| c | TN | cop |
| o | FP | cop |
| p | TP | cop |
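Edit: please ignore the `result` column. Only `characters` and `word` matter here. The original DataFrame consisted of the `word` column, and applying pandas `explode()` produced the `characters` column.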
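You can build a custom group key that identifies each run of consecutive non-NaN values, join the characters within each run, and map the result back onto the original DataFrame: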
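m = df['characters'].isna()  # True on the NaN separator rows
group = (m != m.shift()).cumsum().mask(m)  # one label per run; NaN on separator rows
to_map = df.groupby(group)['characters'].apply(lambda g: ''.join(g))  # label -> joined word
df['word'] = group.map(to_map)  # broadcast each word back to its rows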
Output:
characters result word
0 b TP bat
1 a TP bat
2 t FN bat
3 NaN None NaN
4 c TN cop
5 o FP cop
6 p TP cop
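
To see why the grouping works, it may help to rebuild the example and look at the intermediate values. The following is a minimal, self-contained sketch rather than a definitive implementation: it assumes only the `characters` column matters (the `result` column is left out), and it swaps the lambda for `agg("".join)`, which should behave the same way on this data.

```python
import numpy as np
import pandas as pd

# Rebuild the example from the question (result column omitted on purpose).
df = pd.DataFrame({"characters": ["b", "a", "t", np.nan, "c", "o", "p"]})

# True on the NaN separator rows:
# False, False, False, True, False, False, False
m = df["characters"].isna()

# A new label starts whenever the NaN / non-NaN state flips, which gives
# 1, 1, 1, 2, 3, 3, 3. Masking with m blanks out the separator rows,
# leaving 1, 1, 1, NaN, 3, 3, 3.
group = (m != m.shift()).cumsum().mask(m)

# groupby drops NaN keys by default, so only the letter runs are joined:
# label 1 -> "bat", label 3 -> "cop"
to_map = df.groupby(group)["characters"].agg("".join)

# Look up each row's label; separator rows have no label and stay NaN.
df["word"] = group.map(to_map)

print(df)
```

The labels themselves are arbitrary (here 1 and 3); all that matters is that every row of a word shares one label and that the separator rows carry none.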