In pandas, how do I group by two columns and sum a third, different column?



Suppose you have the following DataFrame called raw:

                   A   B   C
Index
2007-01-03       NaN  -1   1
2007-01-04  0.021954   1   2
2007-01-05 -0.007147  -1   3
2007-01-08  0.004926   1   4
2007-01-09  0.079800   1   4
2007-01-10  0.046746   1   4
2007-01-11 -0.012448  -1   5
2007-01-12 -0.012394  -1   5
2007-01-16  0.025873   1   6
2007-01-17 -0.022391  -1   7
2007-01-18 -0.063928  -1   7
2007-01-19 -0.006420  -1   7
2007-01-22 -0.019511  -1   7
2007-01-23 -0.012639  -1   7
2007-01-24  0.011601   1   8
2007-01-25 -0.005204  -1   9
2007-01-26 -0.010138  -1   9
2007-01-29  0.006538   1  10
2007-01-30 -0.004549  -1  11
2007-01-31  0.002102   1  12

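How can I count the occurrences of each (B, C) combination and, at the same time, sum column A for each combination? I can produce the former with raw[['B','C']].value_counts() (which counts rows per B, C pair), but I can't work out how to combine that result with the latter (the sum of A).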

raw[['B','C']].value_counts()
B   C 
-1  7     5
 1  4     3
-1  5     2
    9     2
    1     1
    3     1
    11    1
 1  2     1
    6     1
    8     1
    10    1
    12    1
dtype: int64

I think you want to group by B and C (here df is the DataFrame from the question, i.e. raw):

df.groupby(['B','C']).agg({'C':'count', 'A':'sum'})

Output:

       C         A
B  C              
-1 1   1  0.000000
   3   1 -0.007147
   5   2 -0.024842
   7   5 -0.124889
   9   2 -0.015342
   11  1 -0.004549
 1 2   1  0.021954
   4   3  0.131472
   6   1  0.025873
   8   1  0.011601
   10  1  0.006538
   12  1  0.002102
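
A note on the 0.000000 shown for group (-1, 1) above: the only A value in that group is NaN, and sum skips NaN by default, so an all-NaN group sums to 0. If NaN is the preferred result there, one possible variation (a sketch, not part of the original answer) is to pass min_count=1 to sum, and to use size() if you want a plain row count per group. The small frame and the column name n_rows below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame: group (-1, 1) contains only a NaN in A.
df = pd.DataFrame({
    "A": [np.nan, 0.021954, -0.007147],
    "B": [-1, 1, -1],
    "C": [1, 2, 3],
})

grouped = df.groupby(["B", "C"])

# size() counts rows per group (NaN or not);
# sum(min_count=1) yields NaN when a group has no non-NaN values.
result = pd.DataFrame({
    "n_rows": grouped.size(),
    "A_sum": grouped["A"].sum(min_count=1),
})
print(result)
```

With the default min_count=0, the same group would show 0.0 instead, which matches the output above.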

Or, better, use named aggregation, which lets you name the new columns:

df.groupby(['B','C']).agg(C_count=('C','count'),
                          A_sum=('A','sum'))

Output:

       C_count     A_sum
B  C                    
-1 1         1  0.000000
   3         1 -0.007147
   5         2 -0.024842
   7         5 -0.124889
   9         2 -0.015342
   11        1 -0.004549
 1 2         1  0.021954
   4         3  0.131472
   6         1  0.025873
   8         1  0.011601
   10        1  0.006538
   12        1  0.002102
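
For anyone who wants to try this without the full dataset, here is a minimal self-contained sketch of the named-aggregation approach. It uses only six rows copied from the question's data, and the trailing reset_index() is an optional extra (not in the original answer) that turns the B/C index levels back into ordinary columns. The variable name summary is just a placeholder.

```python
import numpy as np
import pandas as pd

# Six rows copied from the question's data; `raw` follows the question's naming.
raw = pd.DataFrame(
    {
        "A": [np.nan, 0.004926, 0.079800, 0.046746, -0.012448, -0.012394],
        "B": [-1, 1, 1, 1, -1, -1],
        "C": [1, 4, 4, 4, 5, 5],
    },
    index=pd.to_datetime(
        ["2007-01-03", "2007-01-08", "2007-01-09",
         "2007-01-10", "2007-01-11", "2007-01-12"]
    ),
)

# Count rows and sum A for each (B, C) pair, then flatten the index.
summary = (
    raw.groupby(["B", "C"])
       .agg(C_count=("C", "count"), A_sum=("A", "sum"))
       .reset_index()
)
print(summary)
```

On this subset, the (1, 4) group should have a count of 3 and an A sum of about 0.131472, in line with the full output above.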
