I have a DataFrame like this:
Datetime Volume Price
2020-08-05 09:15:00 1033 504
2020-08-05 09:15:00 1960 516
2020-08-05 09:15:00 1724 520
2020-08-05 09:15:00 1870 540
2020-08-05 09:20:00 1024 576
2020-08-05 09:20:00 1960 548
2020-08-05 09:20:00 1426 526
2020-08-05 09:20:00 1968 518
2020-08-05 09:30:00 1458 511
2020-08-05 09:30:00 1333 534
2020-08-05 09:30:00 1322 555
2020-08-05 09:30:00 1425 567
2020-08-05 09:30:00 1245 598
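I want to group by the Datetime column and, within each group, keep the two rows with the largest Volume together with their Price.
The resulting DataFrame should look like this: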
Datetime Volume Price
2020-08-05 09:15:00 1960 516
2020-08-05 09:15:00 1870 540
2020-08-05 09:20:00 1960 548
2020-08-05 09:20:00 1968 518
2020-08-05 09:30:00 1458 511
2020-08-05 09:30:00 1425 567
Use sort_values before groupby:
print(df.sort_values("Volume", ascending=False)
        .groupby("Datetime").head(2).sort_index())
Datetime Volume Price
1 2020-08-05 09:15:00 1960 516
3 2020-08-05 09:15:00 1870 540
5 2020-08-05 09:20:00 1960 548
7 2020-08-05 09:20:00 1968 518
8 2020-08-05 09:30:00 1458 511
11 2020-08-05 09:30:00 1425 567
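To reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the name `top2` is only my choice for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Datetime": pd.to_datetime(
        ["2020-08-05 09:15:00"] * 4
        + ["2020-08-05 09:20:00"] * 4
        + ["2020-08-05 09:30:00"] * 5
    ),
    "Volume": [1033, 1960, 1724, 1870, 1024, 1960, 1426, 1968,
               1458, 1333, 1322, 1425, 1245],
    "Price": [504, 516, 520, 540, 576, 548, 526, 518,
              511, 534, 555, 567, 598],
})

# Sort all rows by Volume (largest first), take the first two rows
# of each Datetime group, then restore the original row order.
top2 = (
    df.sort_values("Volume", ascending=False)
      .groupby("Datetime")
      .head(2)
      .sort_index()
)
print(top2)
```

The final `sort_index()` is optional; it only puts the selected rows back in their original order.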
Use groupby.rank + boolean indexing:
df[df.groupby("Datetime")['Volume'].rank(ascending=False).le(2)]
Datetime Volume Price
1 2020-08-05 09:15:00 1960 516
3 2020-08-05 09:15:00 1870 540
5 2020-08-05 09:20:00 1960 548
7 2020-08-05 09:20:00 1968 518
8 2020-08-05 09:30:00 1458 511
11 2020-08-05 09:30:00 1425 567
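One caveat with the rank approach, as far as I can tell: `rank` defaults to `method="average"`, so tied volumes inside a group share a fractional rank and `.le(2)` may not keep exactly two rows. The toy data below is made up purely to show this (it is not from the question); passing `method="first"` is one way to break ties by order of appearance.

```python
import pandas as pd

# Hypothetical toy data: group "a" has a tie for second place.
toy = pd.DataFrame({
    "key": ["a", "a", "a", "a", "b", "b", "b"],
    "Volume": [10, 7, 7, 5, 3, 9, 4],
})

grouped = toy.groupby("key")["Volume"]

# Default method="average": the two 7s both get rank 2.5,
# so only the 10 survives in group "a".
default_pick = toy[grouped.rank(ascending=False).le(2)]

# method="first": ties are ranked in order of appearance,
# so each group keeps exactly two rows.
first_pick = toy[grouped.rank(method="first", ascending=False).le(2)]

print(default_pick.groupby("key").size())
print(first_pick.groupby("key").size())
```

On the data in the question there are no ties within a group, so both variants select the same rows there.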
Since you mentioned nlargest:
out = df.groupby('Datetime', as_index=False).apply(lambda x: x.nlargest(2, columns=['Volume']))
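Note that the `apply` call above returns a two-level index, and recent pandas releases warn when `apply` operates on the grouping columns. A possible alternative, sketched here on the assumption that `df` has a unique index, is to call `nlargest` on the grouped Volume column and use the resulting row labels to select from `df`. The names `top_labels` and `out` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Datetime": pd.to_datetime(
        ["2020-08-05 09:15:00"] * 4
        + ["2020-08-05 09:20:00"] * 4
        + ["2020-08-05 09:30:00"] * 5
    ),
    "Volume": [1033, 1960, 1724, 1870, 1024, 1960, 1426, 1968,
               1458, 1333, 1322, 1425, 1245],
    "Price": [504, 516, 520, 540, 576, 548, 526, 518,
              511, 534, 555, 567, 598],
})

# SeriesGroupBy.nlargest returns a Series indexed by
# (Datetime, original row label); the last level holds the row labels.
top_labels = (
    df.groupby("Datetime")["Volume"]
      .nlargest(2)
      .index.get_level_values(-1)
)

# Select those rows (assumes df's index is unique) and
# restore the original row order.
out = df.loc[top_labels].sort_index()
print(out)
```

This keeps the original flat index, so no `reset_index` is needed afterwards.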