Suppose you have the following DataFrame called raw:
                    A   B   C
Index
2007-01-03        NaN  -1   1
2007-01-04   0.021954   1   2
2007-01-05  -0.007147  -1   3
2007-01-08   0.004926   1   4
2007-01-09   0.079800   1   4
2007-01-10   0.046746   1   4
2007-01-11  -0.012448  -1   5
2007-01-12  -0.012394  -1   5
2007-01-16   0.025873   1   6
2007-01-17  -0.022391  -1   7
2007-01-18  -0.063928  -1   7
2007-01-19  -0.006420  -1   7
2007-01-22  -0.019511  -1   7
2007-01-23  -0.012639  -1   7
2007-01-24   0.011601   1   8
2007-01-25  -0.005204  -1   9
2007-01-26  -0.010138  -1   9
2007-01-29   0.006538   1  10
2007-01-30  -0.004549  -1  11
2007-01-31   0.002102   1  12
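How can I count the occurrences of each (B, C) combination and, at the same time, sum column A for each combination? I can produce the former with raw[['B','C']].value_counts() (grouping by B and counting C), but I cannot combine that result with the latter (the sum of A).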
raw[['B','C']].value_counts()
B   C
-1  7     5
 1  4     3
-1  5     2
    9     2
    1     1
    3     1
    11    1
 1  2     1
    6     1
    8     1
    10    1
    12    1
dtype: int64
I think you want to group by B and C:
raw.groupby(['B','C']).agg({'C':'count', 'A':'sum'})
Output:
       C         A
B  C
-1 1   1  0.000000
   3   1 -0.007147
   5   2 -0.024842
   7   5 -0.124889
   9   2 -0.015342
   11  1 -0.004549
 1 2   1  0.021954
   4   3  0.131472
   6   1  0.025873
   8   1  0.011601
   10  1  0.006538
   12  1  0.002102
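For anyone who wants to reproduce the table above, here is a minimal, self-contained sketch. The values are copied from the question; how the frame is constructed is my own assumption, since the question only shows its printed form.

```python
import numpy as np
import pandas as pd

# Dates and values copied from the question's printed DataFrame.
dates = pd.to_datetime([
    "2007-01-03", "2007-01-04", "2007-01-05", "2007-01-08", "2007-01-09",
    "2007-01-10", "2007-01-11", "2007-01-12", "2007-01-16", "2007-01-17",
    "2007-01-18", "2007-01-19", "2007-01-22", "2007-01-23", "2007-01-24",
    "2007-01-25", "2007-01-26", "2007-01-29", "2007-01-30", "2007-01-31",
])
raw = pd.DataFrame(
    {
        "A": [np.nan, 0.021954, -0.007147, 0.004926, 0.079800,
              0.046746, -0.012448, -0.012394, 0.025873, -0.022391,
              -0.063928, -0.006420, -0.019511, -0.012639, 0.011601,
              -0.005204, -0.010138, 0.006538, -0.004549, 0.002102],
        "B": [-1, 1, -1, 1, 1, 1, -1, -1, 1, -1,
              -1, -1, -1, -1, 1, -1, -1, 1, -1, 1],
        "C": [1, 2, 3, 4, 4, 4, 5, 5, 6, 7,
              7, 7, 7, 7, 8, 9, 9, 10, 11, 12],
    },
    index=pd.Index(dates, name="Index"),
)

# One row per (B, C) pair: count the rows via column C and sum column A.
result = raw.groupby(["B", "C"]).agg({"C": "count", "A": "sum"})
print(result)
```

Note that the count column and the second index level are both named C in this result, so calling reset_index() on it raises a ValueError. Renaming the count column first avoids that.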
Or, better, use named aggregation, which lets you rename the new columns:
raw.groupby(['B','C']).agg(C_count=('C','count'),
                           A_sum=('A','sum'))
Output:
       C_count     A_sum
B  C
-1 1         1  0.000000
   3         1 -0.007147
   5         2 -0.024842
   7         5 -0.124889
   9         2 -0.015342
   11        1 -0.004549
 1 2         1  0.021954
   4         3  0.131472
   6         1  0.025873
   8         1  0.011601
   10        1  0.006538
   12        1  0.002102
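One detail in the output is worth a quick check: the (-1, 1) group shows a sum of 0.000000 even though its only A value is NaN. That is because sum skips NaN by default. The sketch below uses just the first three rows of the question's data to contrast size with count, and shows min_count=1 as one possible way to keep NaN for groups with no valid values. The column names n_rows, n_valid, A_sum and A_sum_nan are made up for illustration.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data; the group (-1, 1) holds only a NaN in A.
raw = pd.DataFrame(
    {"A": [np.nan, 0.021954, -0.007147], "B": [-1, 1, -1], "C": [1, 2, 3]}
)

summary = raw.groupby(["B", "C"]).agg(
    n_rows=("A", "size"),    # number of rows in the group, NaN included
    n_valid=("A", "count"),  # number of non-NaN values of A
    A_sum=("A", "sum"),      # NaN is skipped, so an all-NaN group sums to 0.0
    A_sum_nan=("A", lambda s: s.sum(min_count=1)),  # NaN if no valid value
)
print(summary)
```

Whether 0.0 or NaN is the right answer for such a group depends on what the sum is used for, so treat the min_count variant as an option rather than a correction.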