Show the top 5 rows of each group in a DataFrame with multiple numeric columns



I obtained the following DataFrame using df.groupby(['departamento','campo']).describe()

df_statistics:

                         produccion
                               mean           std          min           max
departamento campo
f7fd2c4f     8dd7c41b   4714.695603   1076.940951  3091.015553   6378.546534
             82edafb9   1851.291482    841.512944   675.814722   3006.476183
             58a0d8ca   1768.151315    347.896113  1033.459536   2242.544338
             8ba362f3    257.917212    231.490925     0.000000    497.916659
             4f4a249f    192.811711     80.299111   129.190598    356.437730
             741abe20    431.717352     71.053604   291.831556    529.518332
             51cbb05d    489.804186     65.542073   353.186216    582.869264
             4d0fb45e    358.597250     30.166391   314.168045    407.842103
             c98bd9dd    437.244383     27.135823   402.546159    481.245852
             7eb34927    106.426374     22.579237    81.994706    142.283652
ec12ad00     44502c89     15.015145     11.467353     0.000000     29.241879
             5558f26e      1.107400      0.959445     0.000000      2.762156
             85c1a0e5      0.122720      0.425113     0.000000      1.472635
cf33cb8a     2f614c0b  12458.858168  12042.715975   150.635367  25999.977584
             5559f8d7   4272.447078   1326.999765  2458.231739   6059.658900
             fd6f6562   3378.712031   1194.101786   869.763739   4814.220212
             febb6cf6   4149.936221    833.663173  2471.139924   5827.822674
             d56beadb    474.831361    810.840341     0.000000   2283.465569
             124207de   3863.484888    796.945367  2713.111304   5150.735620
             1fd2689f   6099.963902    768.102604  4766.241346   7897.993261
             c728bf96   3361.623457    704.293795  2203.721911   4949.989960

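I have already sorted the DataFrame by the standard deviation ("std") column, but I only want to show the top 5 values for each group in the "departamento" column.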

I tried the following code: df_statistics.nlargest(5, columns=('produccion', 'std'))

But that gave me the top 5 across the whole "departamento" column instead of per group:

                         produccion
                               mean           std          min           max
departamento campo
cf33cb8a     2f614c0b  12458.858168  12042.715975   150.635367  25999.977584
             5559f8d7   4272.447078   1326.999765  2458.231739   6059.658900
             fd6f6562   3378.712031   1194.101786   869.763739   4814.220212
f7fd2c4f     8dd7c41b   4714.695603   1076.940951  3091.015553   6378.546534
             82edafb9   1851.291482    841.512944   675.814722   3006.476183

How can I show the top 5 values of each group based on the "std" column?

Use another groupby:

(df_statistics.groupby('departamento')
 .apply(lambda grp: grp.nlargest(5, columns=('produccion', 'std'))))
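
For anyone who wants to try this without the original data, here is a minimal sketch of the same idea. The data is synthetic (random values, made-up group labels "A"/"B" and campo names), and two details are my own choices rather than part of the answer above: grouping with level="departamento", and group_keys=False so the group key is not added to the index a second time.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: 2 departamentos, 8 campos each,
# 5 observations per campo. All values are made up.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "departamento": np.repeat(["A", "B"], 40),
    "campo": np.tile(np.repeat([f"campo_{i}" for i in range(8)], 5), 2),
    "produccion": rng.normal(100, 20, 80),
})

# Same shape as in the question: MultiIndex rows, MultiIndex columns.
df_statistics = raw.groupby(["departamento", "campo"]).describe()
std_col = ("produccion", "std")

# Take the 5 largest std values inside each departamento.
# group_keys=False (my choice) keeps the original two-level index.
top5 = (
    df_statistics
    .groupby(level="departamento", group_keys=False)
    .apply(lambda grp: grp.nlargest(5, columns=[std_col]))
)

print(top5[[std_col]])
```

Within each departamento the rows come back ordered by std, largest first, because that is the order nlargest returns.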

IIUC,

df.groupby('departamento').head(5)

Output:

                         produccion
                               mean           std          min           max
departamento campo
f7fd2c4f     8dd7c41b   4714.695603   1076.940951  3091.015553   6378.546534
             82edafb9   1851.291482    841.512944   675.814722   3006.476183
             58a0d8ca   1768.151315    347.896113  1033.459536   2242.544338
             8ba362f3    257.917212    231.490925     0.000000    497.916659
             4f4a249f    192.811711     80.299111   129.190598    356.437730
ec12ad00     44502c89     15.015145     11.467353     0.000000     29.241879
             5558f26e      1.107400      0.959445     0.000000      2.762156
             85c1a0e5      0.122720      0.425113     0.000000      1.472635
cf33cb8a     2f614c0b  12458.858168  12042.715975   150.635367  25999.977584
             5559f8d7   4272.447078   1326.999765  2458.231739   6059.658900
             fd6f6562   3378.712031   1194.101786   869.763739   4814.220212
             febb6cf6   4149.936221    833.663173  2471.139924   5827.822674
             d56beadb    474.831361    810.840341     0.000000   2283.465569

@最近的进展 is correct:

(df.sort_values(by=('produccion', 'std'), ascending=False)
 .groupby('departamento')
 .head(5)
 .sort_index())

Sort the DataFrame first, then take head(5) of each group with groupby, and finally restore the index order with sort_index.
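
A self-contained sketch of that sort, groupby, head, sort_index chain, in case it helps. The data is synthetic (random values, made-up labels), the name stats is hypothetical, and I group with level="departamento" as an assumption about how you want to refer to the index level.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data; all values are made up.
rng = np.random.default_rng(1)
raw = pd.DataFrame({
    "departamento": np.repeat(["A", "B"], 40),
    "campo": np.tile(np.repeat([f"campo_{i}" for i in range(8)], 5), 2),
    "produccion": rng.normal(100, 20, 80),
})
stats = raw.groupby(["departamento", "campo"]).describe()
std_col = ("produccion", "std")

top5_sorted = (
    stats
    .sort_values(by=std_col, ascending=False)  # global sort by std, largest first
    .groupby(level="departamento")
    .head(5)                                   # first 5 rows of each group in that order
    .sort_index()                              # put the departamentos back together
)

print(top5_sorted[[std_col]])
```

One thing to be aware of: sort_index orders rows by the index labels (departamento, then campo), so within each departamento the rows are no longer in std order. Drop the final sort_index if you would rather keep the std ordering and do not mind the groups being interleaved.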
