I have this DataFrame:
import pandas as pd

df = pd.DataFrame({'user':[1,1,2,2,2,3,3,3,3,3,4,4],
'date':['1995-09-01','1995-09-02','1995-10-03','1995-10-04','1995-10-05','1995-11-07','1995-11-08','1995-11-09','1995-11-10','1995-11-15','1995-12-18','1995-12-20'],
'type':['a','a','b','a','c','a','b','a','b','b','a','b']})
This gives me:
user date type
1 1995-09-01 a
1 1995-09-02 a
2 1995-10-03 b
2 1995-10-04 a
2 1995-10-05 c
3 1995-11-07 a
3 1995-11-08 b
3 1995-11-09 a
3 1995-11-10 b
3 1995-11-15 b
4 1995-12-18 a
4 1995-12-20 b
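I want to create a new column that shows, for each row, how many times the value 'a' appears in the type column for that row's user (that is, a count grouped by the user column).
Here is the expected result: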
user date type ct_a
1 1995-09-01 a 2
1 1995-09-02 a 2
2 1995-10-03 b 1
2 1995-10-04 a 1
2 1995-10-05 c 1
3 1995-11-07 a 2
3 1995-11-08 b 2
3 1995-11-09 a 2
3 1995-11-10 b 2
3 1995-11-15 b 2
4 1995-12-18 a 1
4 1995-12-20 b 1
I tried the following, but it did not work:
df['ct_a'] = df.groupby('user')[df['type']== 'a'].transform('count')
Use mask to replace the non-a values in the type column with NaN, then use count with groupby and transform:
# Parentheses let the method chain continue onto the next line
df['ct_a'] = (df['type'].mask(lambda x: x.ne('a'))
              .groupby(df['user']).transform('count'))
Output:
user date type ct_a
0 1 1995-09-01 a 2
1 1 1995-09-02 a 2
2 2 1995-10-03 b 1
3 2 1995-10-04 a 1
4 2 1995-10-05 c 1
5 3 1995-11-07 a 2
6 3 1995-11-08 b 2
7 3 1995-11-09 a 2
8 3 1995-11-10 b 2
9 3 1995-11-15 b 2
10 4 1995-12-18 a 1
11 4 1995-12-20 b 1
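
To make the steps above easier to follow, here is a minimal sketch that splits the chain into its two stages, using the same data without the date column. The intermediate names only_a and ct_a_alt are hypothetical, introduced only for illustration. The second variant, which sums a boolean mask instead of counting non-missing values, is an alternative not taken from the answer above, though it should give the same counts.

```python
import pandas as pd

df = pd.DataFrame({
    'user': [1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4],
    'type': ['a', 'a', 'b', 'a', 'c', 'a', 'b', 'a', 'b', 'b', 'a', 'b'],
})

# Stage 1: mask() replaces values where the condition is True with a missing
# value, so only the 'a' entries remain.
only_a = df['type'].mask(df['type'].ne('a'))

# Stage 2: count() skips missing values, and transform() broadcasts each
# user's count back onto every row of that user.
df['ct_a'] = only_a.groupby(df['user']).transform('count')

# Alternative (an assumption, not from the answer): True counts as 1,
# so summing the boolean comparison per user gives the same number.
ct_a_alt = df['type'].eq('a').groupby(df['user']).transform('sum')

print(df)
```

Grouping by the Series df['user'] rather than by a column name is what allows the groupby to be called on a standalone Series such as only_a, since the two are aligned on the index.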