我有两个数据帧:
{'id': {4: 1548638, 6: 1953603, 7: 1956216, 8: 1962245, 9: 1981386, 10: 1981773, 11: 2004787, 13: 2017418, 14: 2020989, 15: 2045043}, 'total': {4: 17, 6: 38, 7: 59, 8: 40, 9: 40, 10: 40, 11: 80, 13: 44, 14: 51, 15: 46}}
{'id': {4: 1548638, 6: 1953603, 7: 1956216, 8: 1962245, 9: 1981386, 10: 1981773, 11: 2004787, 13: 2017418, 14: 2020989, 15: 2045043}, 'total': {4: 17, 6: 38, 7: 59, 8: 40, 9: 40, 10: 40, 11: 80, 13: 44, 14: 51, 15: 46}}
对于存在于两个数据框中的每个'id',我想在'total'中计算它们值的平均值,并将其放在一个新的数据框中。
我试着:
pd.merge(df1, df2, on="id")
希望我能做到:
merged_df[['total']].mean(axis=1)
但是它根本不起作用。
你怎么能这么做?
您可以使用:
df1.merge(df2, on='id').set_index('id').mean(axis=1).reset_index(name='total')
或者,如果您有许多列,使用更通用的方法:
(df1.merge(df2, on='id', suffixes=(None, '_other')).set_index('id')
.rename(columns=lambda x: x.removesuffix('_other')) # requires python 3.9+
.groupby(axis=1, level=0)
.mean().reset_index()
)
输出:
id total
0 1548638 17.0
1 1953603 38.0
2 1956216 59.0
3 1962245 40.0
4 1981386 40.0
5 1981773 40.0
6 2004787 80.0
7 2017418 44.0
8 2020989 51.0
9 2045043 46.0
你可以这样做:
df1 = pd.DataFrame({'id': {4: 1548638, 6: 1953603, 7: 1956216, 8: 1962245, 9: 1981386, 10: 1981773, 11: 2004787, 13: 2017418, 14: 2020989, 15: 2045043}, 'total': {4: 17, 6: 38, 7: 59, 8: 40, 9: 40, 10: 40, 11: 80, 13: 44, 14: 51, 15: 46}})
df2 = pd.DataFrame({'id': {4: 1548638, 6: 1953603, 7: 1956216, 8: 1962245, 9: 1981386, 10: 1981773, 11: 2004787, 13: 2017418, 14: 2020989, 15: 2045043}, 'total': {4: 17, 6: 38, 7: 59, 8: 40, 9: 40, 10: 40, 11: 80, 13: 44, 14: 51, 15: 46}})
merged_df = df1.merge(df2, on='id')
merged_df['total_mean'] = merged_df.filter(regex='total').mean(axis=1)
print(merged_df)
输出:
id total_x total_y total_mean
0 1548638 17 17 17.0
1 1953603 38 38 38.0
2 1956216 59 59 59.0
3 1962245 40 40 40.0
4 1981386 40 40 40.0
5 1981773 40 40 40.0
6 2004787 80 80 80.0
7 2017418 44 44 44.0
8 2020989 51 51 51.0
9 2045043 46 46 46.0