Pandas数据透视表,包含来自两个独立数据帧的聚合



我正在尝试构建一个pivot_table,它包含两个数据帧(df1df2(的值差异以及agg平均值和标准偏差。

包含df1 =

gender year     statistics    s_values
male   1999  cigarette use       100
male   1999  cellphone use       310
male   1999   internet use       101
male   1999    alcohol use       100

包含df2 =

gender   year     statistics    s_values
female   1999  cigarette use       156
female   1999  cellphone use       198
female   1999   internet use       232
female   1999    alcohol use       243

所需输出为my_pivot_table =

year    statistics       male   female  difference   mean    std
1999  cigarette use     100.0    156.0       56.0    128.0   28.0
1999  cellphone use     310.0    198.0      112.0    254.0   56.0
1999   internet use     101.0    232.0       78.0    166.5   65.5
1999    alcohol use     100.0    243.0      143.0    171.5   50.0 

我将df1df2合并到一个名为merged_df的数据帧中,我不确定这是否是构建所需数据透视表的正确步骤。。。我不知道从这里到哪里male列包含df1s_values,阴性列包含df2s_values包含merged_df =

gender   year     statistics    s_values
male   1999  cigarette use       100
male   1999  cellphone use       310
male   1999   internet use       101
male   1999    alcohol use       100
female   1999  cigarette use       156
female   1999  cellphone use       198
female   1999   internet use       232
female   1999    alcohol use       243

尝试:

x = pd.concat([df1, df2]).pivot(
index=["year", "statistics"], columns="gender", values="s_values"
)
x["difference"] = (x["male"] - x["female"]).abs()
x["mean"] = x[["male", "female"]].mean(axis=1)
x["std"] = x[["female", "male"]].std(axis=1, ddof=0)
print(x.reset_index().rename_axis("", axis=1))

打印:

year     statistics  female  male  difference   mean   std
0  1999    alcohol use     243   100         143  171.5  71.5
1  1999  cellphone use     198   310         112  254.0  56.0
2  1999  cigarette use     156   100          56  128.0  28.0
3  1999   internet use     232   101         131  166.5  65.5

相关内容

  • 没有找到相关文章