Pandas DataFrame: IF inside the SUM function



I have a DataFrame:

   asks_price  asks_qty exchange_name_ask  bids_price  bids_qty exchange_name_bid
0    20156.51  0.000745          Coinbase    20153.28  0.000200          Coinbase
1    20157.52  0.050000          Coinbase    20152.27  0.051000          Coinbase
2    20158.52  0.000745          Coinbase    20151.28  0.000200            Kraken
3    20158.52  0.050000               FTX    20151.28  0.051000          Coinbase

I want to group rows that have the same prices, add the quantities together, and combine the exchange names, like this:

   asks_price  asks_qty exchange_name_ask  bids_price  bids_qty exchange_name_bid
0    20156.51  0.000745          Coinbase    20153.28  0.000200          Coinbase
1    20157.52  0.050000          Coinbase    20152.27  0.051000          Coinbase
2    20158.52  0.050745       CoinbaseFTX    20151.28  0.051200    KrakenCoinbase

I managed to do it with:

df.groupby(['asks_price', 'bids_price']).sum(False)

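It works, but I don't want to join the names when they are the same. I don't want CoinbaseCoinbase or KrakenKraken, so I need to put an if inside the SUM function. How can I do that? Thanks.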
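For context: on an object (string) column, sum concatenates the values with +, which is where CoinbaseCoinbase comes from. A minimal sketch of the "if" the question asks about, assuming the rule is "keep a single name when every name in the group is the same, otherwise concatenate". The helper join_names is a hypothetical name for illustration, not part of pandas, and the column is forced to object dtype to match the question's setup.

```python
import pandas as pd

# Two rows with the same prices and the same exchange name
df = pd.DataFrame({
    "asks_price": [20158.52, 20158.52],
    "bids_price": [20151.28, 20151.28],
    # object dtype, so sum() concatenates strings as in the question
    "exchange_name_bid": pd.Series(["Coinbase", "Coinbase"], dtype="object"),
})

# Plain sum on a string column concatenates: "CoinbaseCoinbase"
plain = df.groupby(["asks_price", "bids_price"])["exchange_name_bid"].sum()


def join_names(names: pd.Series) -> str:
    """Hypothetical helper: one name if all are equal, else concatenate all."""
    if names.nunique() == 1:
        return names.iloc[0]
    return "".join(names)


# Apply the custom aggregation per group instead of sum
fixed = df.groupby(["asks_price", "bids_price"])["exchange_name_bid"].agg(join_names)

print(plain.iloc[0])  # CoinbaseCoinbase
print(fixed.iloc[0])  # Coinbase
```

This only checks whether all names are equal; a group such as Coinbase, FTX, Coinbase would still come out as CoinbaseFTXCoinbase. De-duplicating the names before joining handles that case too.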

Try the following:

import pandas as pd
# Changing `2: 'Kraken'` to `2: 'Coinbase'` for `exchange_name_bid` to generate
# an example that doesn't require concatenation
data = {'asks_price': {0: 20156.51, 1: 20157.52, 2: 20158.52, 3: 20158.52}, 
'asks_qty': {0: 0.000745, 1: 0.05, 2: 0.000745, 3: 0.05}, 
'exchange_name_ask': {0: 'Coinbase', 1: 'Coinbase', 2: 'Coinbase', 
3: 'FTX'}, 
'bids_price': {0: 20153.28, 1: 20152.27, 2: 20151.28, 3: 20151.28}, 
'bids_qty': {0: 0.0002, 1: 0.051, 2: 0.0002, 3: 0.051}, 
'exchange_name_bid': {0: 'Coinbase', 1: 'Coinbase', 2: 'Coinbase', 
3: 'Coinbase'}}
df = pd.DataFrame(data)
res = df.groupby(['asks_price', 'bids_price'], as_index=False).agg(
{'asks_qty':'sum',
'bids_qty':'sum',
'exchange_name_ask': lambda x: ','.join(dict.fromkeys(x).keys()),
'exchange_name_bid': lambda x: ','.join(dict.fromkeys(x).keys())
})
print(res)
   asks_price  bids_price  asks_qty  bids_qty exchange_name_ask exchange_name_bid
0    20156.51    20153.28  0.000745    0.0002          Coinbase          Coinbase
1    20157.52    20152.27  0.050000    0.0510          Coinbase          Coinbase
2    20158.52    20151.28  0.050745    0.0512      Coinbase,FTX          Coinbase  # no concatenation here
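The same aggregation can also be written with named aggregation (keyword arguments to agg, available since pandas 0.25), which names each output column explicitly. A sketch under these assumptions: it uses the question's original data (row 2 bid exchange left as Kraken), and unique_join is a hypothetical helper name wrapping the same dict.fromkeys idea as the answer's lambda.

```python
import pandas as pd

data = {
    "asks_price": [20156.51, 20157.52, 20158.52, 20158.52],
    "asks_qty": [0.000745, 0.05, 0.000745, 0.05],
    "exchange_name_ask": ["Coinbase", "Coinbase", "Coinbase", "FTX"],
    "bids_price": [20153.28, 20152.27, 20151.28, 20151.28],
    "bids_qty": [0.0002, 0.051, 0.0002, 0.051],
    "exchange_name_bid": ["Coinbase", "Coinbase", "Kraken", "Coinbase"],
}
df = pd.DataFrame(data)


def unique_join(names: pd.Series) -> str:
    # dict.fromkeys drops repeated names, keeping first-seen order
    return ",".join(dict.fromkeys(names))


# Named aggregation: output_column=(input_column, function)
res = (
    df.groupby(["asks_price", "bids_price"])
    .agg(
        asks_qty=("asks_qty", "sum"),
        exchange_name_ask=("exchange_name_ask", unique_join),
        bids_qty=("bids_qty", "sum"),
        exchange_name_bid=("exchange_name_bid", unique_join),
    )
    .reset_index()  # turn the group keys back into columns
)

# Restore the column order of the original DataFrame
res = res[df.columns]
print(res)
```

With the Kraken row kept, the merged row gets Coinbase,FTX on the ask side and Kraken,Coinbase on the bid side; use "".join inside unique_join to get exactly KrakenCoinbase as in the question.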

Explanation

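  • Use df.groupby.agg and give each column its own function: 'sum' for the '*_qty' columns, and for the 'exchange_name_*' columns a lambda that nests dict.fromkeys inside join. dict.fromkeys drops duplicate names while keeping the order in which they first appear (dict keys preserve insertion order since Python 3.7), which a set would not guarantee. (As @jezrael mentioned in the comments, dict.fromkeys performs better than Series.drop_duplicates; see the related answer.)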
  • If you don't want a separator, change ','.join to ''.join. To reorder the columns of res so they match the original df, use res = res.loc[:, df.columns].
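A small check of the de-duplication step on its own, in plain Python plus pandas: dict.fromkeys and Series.drop_duplicates give the same first-seen order here, and ''.join drops the separator. The sample names are made up for illustration; the performance difference mentioned above is not measured in this sketch.

```python
import pandas as pd

names = pd.Series(["Kraken", "Coinbase", "Kraken", "FTX"])

# dict keys keep insertion order (guaranteed since Python 3.7), so this
# removes repeats while preserving the order in which names first appear
via_dict = ",".join(dict.fromkeys(names))

# Same result using pandas' own de-duplication (keeps the first occurrence)
via_pandas = ",".join(names.drop_duplicates())

# No separator, as in the question's expected output
no_sep = "".join(dict.fromkeys(names))

print(via_dict)    # Kraken,Coinbase,FTX
print(via_pandas)  # Kraken,Coinbase,FTX
print(no_sep)      # KrakenCoinbaseFTX
```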
