I would like to use Python Pandas to merge rows in a large Excel file. For example, in an Excel or csv file I have:
Kelly | $400 | | | $20 |
Kelly | | $200 | | |
Kelly | | | $500 | |
John | | $2 | ($7) | |
John | | | | $10 |
I would like to end up with:
Kelly | $400 | $200 | $500 | $20 |
John | | $2 | ($7) | $10 |
Is there a simple solution? Thanks in advance.
Sounds like you are looking for a groupby:
import pandas as pd
import numpy as np
df = pd.DataFrame(
data={'Name' : ['Kelly', 'Kelly', 'Kelly', 'John', 'John'],
'col1' : [400, np.nan, np.nan, np.nan, np.nan],
'col2' : [np.nan, 200, np.nan, 2, np.nan],
'col3' : [np.nan, np.nan, 500, -7, np.nan],
'col4' : [20, np.nan, np.nan, np.nan, 10],})
print(df)
Name col1 col2 col3 col4
0 Kelly 400.0 NaN NaN 20.0
1 Kelly NaN 200.0 NaN NaN
2 Kelly NaN NaN 500.0 NaN
3 John NaN 2.0 -7.0 NaN
4 John NaN NaN NaN 10.0
print(df.groupby('Name').sum())
Output:
col1 col2 col3 col4
Name
John 0.0 2.0 -7.0 10.0
Kelly 400.0 200.0 500.0 20.0
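Note that John's col1 comes out as 0.0 in the output above, whereas the question wanted that cell left empty. That is because sum() treats a group containing only NaN as 0 by default. A minimal sketch of one way around it, assuming the same example frame as above: pass min_count=1 so that a group with no valid values stays NaN, and sort=False so the names keep their original order.

```python
import numpy as np
import pandas as pd

# Same example data as in the answer above.
df = pd.DataFrame(
    data={'Name': ['Kelly', 'Kelly', 'Kelly', 'John', 'John'],
          'col1': [400, np.nan, np.nan, np.nan, np.nan],
          'col2': [np.nan, 200, np.nan, 2, np.nan],
          'col3': [np.nan, np.nan, 500, -7, np.nan],
          'col4': [20, np.nan, np.nan, np.nan, 10]})

# min_count=1: a group needs at least one non-NaN value to produce a sum,
# otherwise the result for that cell is NaN instead of 0.
# sort=False: keep groups in order of first appearance (Kelly, then John).
merged = df.groupby('Name', sort=False).sum(min_count=1)
print(merged)
```

If each name has at most one value per column, as in the example, df.groupby('Name').first() should give the same result, since first() takes the first non-null entry in each group.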
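Edit: If you only get a sum for the first column, the other columns probably have a non-numeric dtype. When groupby is applied to the whole DataFrame, the aggregation function is applied to every column. Try df.info() to check the dtypes of the columns.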
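If df.info() reports object columns because the cells were read as text such as "$400" or "($7)", they have to be converted to numbers before summing. The following is only a sketch under assumptions: parse_money is a hypothetical helper (not part of pandas), the column names are the ones from the example above, and parentheses are assumed to mean a negative amount.

```python
import numpy as np
import pandas as pd


def parse_money(value):
    """Hypothetical helper: '$400' -> 400.0, '($7)' -> -7.0, blank/missing -> NaN."""
    if pd.isna(value):
        return np.nan
    text = str(value).strip()
    if not text:
        return np.nan
    # Assumption: accounting-style parentheses mark a negative amount.
    negative = text.startswith('(') and text.endswith(')')
    number = float(text.strip('()').replace('$', '').replace(',', ''))
    return -number if negative else number


# Example data as it might look when the currency cells are read as strings.
raw = pd.DataFrame(
    data={'Name': ['Kelly', 'Kelly', 'Kelly', 'John', 'John'],
          'col1': ['$400', None, None, None, None],
          'col2': [None, '$200', None, '$2', None],
          'col3': [None, None, '$500', '($7)', None],
          'col4': ['$20', None, None, None, '$10']})

# Convert every value column to floats, then group as before.
for col in ['col1', 'col2', 'col3', 'col4']:
    raw[col] = raw[col].map(parse_money)

merged = raw.groupby('Name', sort=False).sum(min_count=1)
print(merged)
```

After the conversion, df.info() should list the value columns as float64, and the groupby then aggregates all of them.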