I have two dataframes:
asks_price asks_qty exchange_name_ask bids_price bids_qty exchange_name_bid
0 20156.51 0.000745 Coinbase 20153.28 0.000200 Coinbase
1 20157.52 0.050000 Coinbase 20152.27 0.051000 Coinbase
2 20158.52 0.000745 Coinbase 20151.28 0.000200 Kraken
3 20158.52 0.050000 FTX 20151.28 0.051000 Coinbase
I want to group identical prices, add the quantities together, and combine the exchange names, like this:
asks_price asks_qty exchange_name_ask bids_price bids_qty exchange_name_bid
0 20156.51 0.000745 Coinbase 20153.28 0.000200 Coinbase
1 20157.52 0.050000 Coinbase 20152.27 0.051000 Coinbase
2 20158.52 0.050745 CoinbaseFTX 20151.28 0.051200 KrakenCoinbase
I managed to do it using:
df.groupby(['asks_price', 'bids_price']).sum(False)
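That works, but when the name is the same I don't want it joined. I don't want CoinbaseCoinbase or KrakenKraken, so I need to include an if in the sum function. How can I do that? Thanks.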
Try the following:
import pandas as pd
# Changing `2: 'Kraken'` to `2: 'Coinbase'` for `exchange_name_bid` to generate
# an example that doesn't require concatenation
data = {'asks_price': {0: 20156.51, 1: 20157.52, 2: 20158.52, 3: 20158.52},
        'asks_qty': {0: 0.000745, 1: 0.05, 2: 0.000745, 3: 0.05},
        'exchange_name_ask': {0: 'Coinbase', 1: 'Coinbase', 2: 'Coinbase',
                              3: 'FTX'},
        'bids_price': {0: 20153.28, 1: 20152.27, 2: 20151.28, 3: 20151.28},
        'bids_qty': {0: 0.0002, 1: 0.051, 2: 0.0002, 3: 0.051},
        'exchange_name_bid': {0: 'Coinbase', 1: 'Coinbase', 2: 'Coinbase',
                              3: 'Coinbase'}}
df = pd.DataFrame(data)
res = df.groupby(['asks_price', 'bids_price'], as_index=False).agg(
    {'asks_qty': 'sum',
     'bids_qty': 'sum',
     'exchange_name_ask': lambda x: ','.join(dict.fromkeys(x).keys()),
     'exchange_name_bid': lambda x: ','.join(dict.fromkeys(x).keys())
     })
print(res)
asks_price bids_price asks_qty bids_qty exchange_name_ask
0 20156.51 20153.28 0.000745 0.0002 Coinbase
1 20157.52 20152.27 0.050000 0.0510 Coinbase
2 20158.52 20151.28 0.050745 0.0512 Coinbase,FTX
exchange_name_bid
0 Coinbase
1 Coinbase
2 Coinbase # no concatenation here
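Explanation
- Use df.groupby.agg and split out the functions to apply to the different columns: sum for the '*_qty' columns, and a lambda for the 'exchange_name_*' columns in which dict.fromkeys is nested inside join. dict.fromkeys keeps each name once, in order of first appearance, so a repeated exchange is not joined twice. (As @jezrael mentioned in the comments, dict.fromkeys performs better than Series.drop_duplicates. See the answer elsewhere.)
- If you don't want a separator, change ','.join to ''.join. To reorder the columns of res so that they match the order of the original df, use res = res.loc[:, df.columns].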
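Putting the two notes above together, here is a minimal sketch run against the data from the question (with Kraken left in row 2), using ''.join and restoring the original column order. The helper name join_unique is hypothetical and only there to avoid repeating the lambda. It is not part of pandas.

```python
import pandas as pd

# Data as posted in the question (row 2 bids on Kraken).
df = pd.DataFrame({
    'asks_price': [20156.51, 20157.52, 20158.52, 20158.52],
    'asks_qty': [0.000745, 0.05, 0.000745, 0.05],
    'exchange_name_ask': ['Coinbase', 'Coinbase', 'Coinbase', 'FTX'],
    'bids_price': [20153.28, 20152.27, 20151.28, 20151.28],
    'bids_qty': [0.0002, 0.051, 0.0002, 0.051],
    'exchange_name_bid': ['Coinbase', 'Coinbase', 'Kraken', 'Coinbase'],
})


def join_unique(names):
    # Hypothetical helper: drop repeated names, keep first-seen order,
    # then concatenate without a separator.
    return ''.join(dict.fromkeys(names))


res = df.groupby(['asks_price', 'bids_price'], as_index=False).agg(
    {'asks_qty': 'sum',
     'bids_qty': 'sum',
     'exchange_name_ask': join_unique,
     'exchange_name_bid': join_unique})

# Put the columns back in the order of the original dataframe.
res = res.loc[:, df.columns]
print(res)
```

With this data the last row should show CoinbaseFTX and KrakenCoinbase, matching the desired output in the question, while the first two rows keep a single Coinbase.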