How to group by multiple columns and count unique values in Python Pandas

I have a DataFrame df_data:

CustID    MatchID    LocationID   isMajor  #Major is 1 and Minor is 0
1        11111       324         0  
1        11111       324         0
1        11111       324         0
1        22222       490         0
1        33333       675         1
2        44444       888         0

I have a function with parameters like this:

def compute_something(list_minor=None, list_major=None):
    pass

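Explanation of the parameters: for CustID = 1, the arguments should be list_minor = [3, 1] (order does not matter) and list_major = [1], because LocationID = 324 occurs 3 times and LocationID = 490 occurs once (324 and 490 both have isMajor = 0, so their counts go into one list). Similarly, CustID = 2 has list_minor = [1] and list_major = [] (if a customer has no major or minor data, I should pass []).
Here is my program: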

import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df_data = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'IsMajor'])
df_parameter = pd.DataFrame()
df_parameter['parameters'] = df_data.groupby(['CustID', 'MatchID', 'IsMajor'])['LocationID'].nunique()

But the result in df_parameter['parameters'] is wrong:

                        parameters
CustID MatchID IsMajor
1      11111   0                 1  #should be 3
       22222   0                 1
       33333   1                 1
2      44444   0                 1

Can I use groupby to get the parameters explained above and pass them to the function?

How about:

(df.groupby(['CustID','isMajor', 'MatchID']).size()
.groupby(level=[0,1]).agg(set)
.unstack('isMajor')
)

Output:

isMajor       0    1
CustID              
1        {1, 3}  {1}
2           {1}  NaN

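Update: try this groupby instead, which keeps the counts in lists so that repeated counts are not collapsed the way they are in a set: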

(df.groupby(['CustID','isMajor'])['MatchID']
.apply(lambda x: x.value_counts().agg(list))
.unstack('isMajor')
)
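
To pass these lists on to compute_something, one option is to walk the rows of the unstacked table and substitute [] wherever a customer has no minor or major rows (those cells are NaN). The following is only a minimal sketch, assuming the sample data from the question. The body of compute_something is a hypothetical placeholder that returns its arguments, since the real function is not shown, and the names counts and results are introduced for illustration. It counts MatchID as in the code above; if counts per LocationID are wanted, as the question's wording suggests, that column could be swapped in.

```python
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])


def compute_something(list_minor=None, list_major=None):
    # Hypothetical placeholder: the real body is not given in the question.
    return list_minor, list_major


# One list of per-MatchID counts for each (CustID, isMajor) pair,
# then one column per isMajor value (0 = minor, 1 = major).
counts = (df.groupby(['CustID', 'isMajor'])['MatchID']
            .apply(lambda x: x.value_counts().tolist())
            .unstack('isMajor'))

results = {}
for cust_id, row in counts.iterrows():
    minor = row.get(0)
    major = row.get(1)
    # Missing combinations come back as NaN (or None), so fall back to [].
    list_minor = minor if isinstance(minor, list) else []
    list_major = major if isinstance(major, list) else []
    results[int(cust_id)] = compute_something(list_minor, list_major)
```

Here results maps each CustID to whatever compute_something returns for that customer.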

Also, groupby with two keys can be slow. In that case, you can concatenate the keys into a single one and group by that instead:

keys = df['CustID'].astype(str) + '_' + df['isMajor'].astype(str)
(df.groupby(keys)['MatchID']
.apply(lambda x: x.value_counts().agg(list))
)
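
With concatenated keys, the result is indexed by strings such as '1_0' rather than by a (CustID, isMajor) pair, so the two parts have to be split apart again before the minor and major lists can be separated. A minimal sketch of one way to do that, assuming the question's sample data and that neither key contains an underscore; the names lists, pairs and table are illustrative only.

```python
import pandas as pd

data = [
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 11111, 324, 0],
    [1, 22222, 490, 0],
    [1, 33333, 675, 1],
    [2, 44444, 888, 0],
]
df = pd.DataFrame(data, columns=['CustID', 'MatchID', 'LocationID', 'isMajor'])

# Single string key per row, e.g. '1_0' for CustID 1 with isMajor 0.
keys = df['CustID'].astype(str) + '_' + df['isMajor'].astype(str)
lists = df.groupby(keys)['MatchID'].apply(lambda x: x.value_counts().tolist())

# Split each 'CustID_isMajor' string back into a pair of integers
# and rebuild a two-level index from the pairs.
pairs = [tuple(int(part) for part in key.split('_')) for key in lists.index]
lists.index = pd.MultiIndex.from_tuples(pairs, names=['CustID', 'isMajor'])

# Same layout as the two-key version: one column per isMajor value.
table = lists.unstack('isMajor')
```

Whether this is actually faster than grouping by two keys depends on the data, so it is worth timing both variants on the real DataFrame.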
