I have a DataFrame and want to drop certain rows within each category. Here is the data:
import pandas as pd
data={'GROUP':['A','A','A','B','B','B','B','C','C','C','C','C'],'DATE':['202101','202102','202103','201907','201908','201909',
'201910','202003','202004','202005','202006','202007']}
df=pd.DataFrame(data, columns=['GROUP','DATE'])
GROUP DATE
0 A 202101
1 A 202102
2 A 202103
3 B 201907
4 B 201908
5 B 201909
6 B 201910
7 C 202003
8 C 202004
9 C 202005
10 C 202006
11 C 202007
I want to drop every row after the second date in each group. In other words, I want to end up with this:
GROUP DATE
0 A 202101
1 A 202102
3 B 201907
4 B 201908
7 C 202003
8 C 202004
Use GroupBy.head:
df.groupby('GROUP').head(2)
GROUP DATE
0 A 202101
1 A 202102
3 B 201907
4 B 201908
7 C 202003
8 C 202004
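Note that GroupBy.head(2) keeps the first two rows of each group in their existing order, so it relies on the rows already being sorted by DATE, as they are in the question. If that is not guaranteed, a minimal sketch would be to sort first. The shuffled frame below is only a hypothetical stand-in for unsorted input, and the sort assumes DATE stays a zero-padded YYYYMM string, so that string order matches chronological order.

```python
import pandas as pd

data = {
    'GROUP': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'DATE': ['202101', '202102', '202103', '201907', '201908', '201909',
             '201910', '202003', '202004', '202005', '202006', '202007'],
}
df = pd.DataFrame(data, columns=['GROUP', 'DATE'])

# Hypothetical unsorted input: shuffle the rows of the example frame.
shuffled = df.sample(frac=1, random_state=0)

# Sort by group and date, then keep the two earliest rows per group.
first_two = (
    shuffled.sort_values(['GROUP', 'DATE'])
    .groupby('GROUP')
    .head(2)
)
print(first_two)
```

The original index labels are kept, so the result lines up with the rows of the input frame.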
Group the DataFrame by GROUP and apply a function that takes a slice of just the first two values.
>>> df.groupby(['GROUP'])['DATE'].apply(lambda x: x[:2]).droplevel(-1).reset_index()
GROUP DATE
0 A 202101
1 A 202102
2 B 201907
3 B 201908
4 C 202003
5 C 202004
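The apply version above renumbers the index from 0 to 5. If you would rather keep the original row labels, one possible alternative (not part of the answers above, just a sketch) is to number the rows inside each group with GroupBy.cumcount and filter with a boolean mask. The names mask and result are only illustrative.

```python
import pandas as pd

data = {
    'GROUP': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C'],
    'DATE': ['202101', '202102', '202103', '201907', '201908', '201909',
             '201910', '202003', '202004', '202005', '202006', '202007'],
}
df = pd.DataFrame(data, columns=['GROUP', 'DATE'])

# cumcount() numbers the rows within each group starting at 0,
# so "< 2" selects the first two rows of every group.
mask = df.groupby('GROUP').cumcount() < 2
result = df[mask]
print(result)
```

Like GroupBy.head, this depends on the rows already being in date order within each group.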