Data:
import pandas as pd

df = pd.DataFrame({'name': ['Jane', 'Jane', 'Mike', 'Mike', 'Jane', 'Jane', 'Jane',
                            'Mike', 'Mike', 'Jane', 'Jane', 'Jane'],
                   'ctg': ['A', 'P', 'C', 'B', 'B', 'C', 'B', 'E', 'G', 'L', 'M', 'X']})
Expected output:
name | ctg |
---|---|
Jane | A |
Jane | B |
Jane | L |
You can use GroupBy.first on a custom grouper, after filtering the rows with a mask:
mask = df['name'].eq('Jane')
out = (df[mask] # keep only Jane
# group by consecutive names
.groupby(df['name'].ne(df['name'].shift()).cumsum(), as_index=False)
.first() # first row of each group
)
Output:
name ctg
0 Jane A
1 Jane B
2 Jane L
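
To see why the grouper works, it may help to look at the run ids it produces. The sketch below rebuilds the same data, prints the ids, and then shows a possible alternative that skips groupby and simply keeps the rows where a new run of 'Jane' starts. The names run_start, run_id and alt are illustrative only and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jane', 'Jane', 'Mike', 'Mike', 'Jane', 'Jane', 'Jane',
                            'Mike', 'Mike', 'Jane', 'Jane', 'Jane'],
                   'ctg': ['A', 'P', 'C', 'B', 'B', 'C', 'B', 'E', 'G', 'L', 'M', 'X']})

# True wherever the name differs from the previous row, i.e. a new run starts
# (the first row compares against NaN, so it also counts as a start)
run_start = df['name'].ne(df['name'].shift())

# Cumulative sum of the starts gives one integer id per consecutive run
run_id = run_start.cumsum()
print(run_id.tolist())

# Alternative sketch (not the groupby approach): keep only the rows
# that start a run and whose name is 'Jane'
alt = df[run_start & df['name'].eq('Jane')].reset_index(drop=True)
print(alt)
```

On this data the boolean filter selects the same three rows as the groupby version; it only ever returns the first row of each run, so it is not a substitute when another aggregation per run is needed.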