I have two pandas DataFrames that I need to merge on call_id. I have done this before with other DataFrames. This time, however, when I try
df = pd.merge(labels, sequences, on = "call_id")
I get
The column label 'call_id' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
In [231]: labels
Out[231]:
call_id confidences
1 6081bdea52c838000aaa53d3 {'1': 0.27, '2': 0.68, '0': 0.5}
2 6081c27bde933a000a4384b0 {'1': 0.73, '2': 0.27}
3 6081c54dd12abf000ab3c6f5 {'0': 0.66, '1': 0.67}
4 6081c666d7a1f7001cecce98 {'0': 0.22, '1': 0.82}
5 6081d8576eb5530043e3401f {'2': 0.33, '1': 0.66, '0': 0.23}
.. ... ...
480 transcript96 {'0': 0.38, '1': 0.73}
481 transcript97 {'0': 0.78, '2': 0.31}
482 transcript98 {'1': 0.65, '0': 0.46}
483 transcript99 {'2': 0.29, '1': 0.79}
484 trsc1 {'0': 0.42, '2': 0.27, '1': 0.44}
[484 rows x 2 columns]
In [232]: sequences
Out[232]:
call_id sentiments
1 6081c27bde933a000a4384b0 PENNNNNEENNPNPEPNPPNNNNNNNNNNN
2 6081c54dd12abf000ab3c6f5 NNPNNNPNNNPPNNN
3 6081c666d7a1f7001cecce98 NNNNNPP
4 6081d8576eb5530043e3401f NNNNPNNNNNNNNNNNNNNNNNNPPNNNNNNNNNENNNNNNENNNN...
5 6081d8fb0ef716000a2ef933 NNNNENNNPNEEENNNNNNNNNNNNNNNNNNPNE
.. ... ...
465 transcript96 NPN
466 transcript97 NNNNNEENNNNENPNNNNENNNNNPNNPNNNNNNNNPENNNPPPP
467 transcript98 NNNNNNNNENNNPPNNNENNENNENNNENENNNP
468 transcript99 PENNN
469 trsc1 NPNPEENEPPN
[469 rows x 2 columns]
You have to call the merge method on the DataFrame itself:
labels.merge(sequences, how='inner', on='call_id')
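See the how= parameter here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html to make sure you understand the different options (keep all rows, keep only the rows from the right or left DataFrame, etc.).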
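
To make the how= options concrete, here is a minimal sketch. The two small frames are made up for illustration (they are not the data from the question); they only reuse the column names call_id, confidences and sentiments, and assume call_id is an ordinary, uniquely named column in both frames.

```python
import pandas as pd

# Hypothetical miniature frames, NOT the data from the question.
labels = pd.DataFrame({
    "call_id": ["a1", "b2", "c3"],
    "confidences": [{"1": 0.27}, {"1": 0.73}, {"0": 0.66}],
})
sequences = pd.DataFrame({
    "call_id": ["b2", "c3", "d4"],
    "sentiments": ["PEN", "NNP", "NPN"],
})

# inner: only call_ids present in both frames
inner = labels.merge(sequences, how="inner", on="call_id")
# left: every row of labels, NaN where sequences has no match
left = labels.merge(sequences, how="left", on="call_id")
# right: every row of sequences, NaN where labels has no match
right = labels.merge(sequences, how="right", on="call_id")
# outer: every call_id from either frame
outer = labels.merge(sequences, how="outer", on="call_id")

print(len(inner), len(left), len(right), len(outer))
```

With frames of different lengths like the ones in the question (484 and 469 rows), the choice of how= decides whether the unmatched call_ids are dropped or kept with missing values.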