Filter the first row of each group of a specific value in a column



Data:

import pandas as pd

df = pd.DataFrame({'name': ['Jane', 'Jane', 'Mike', 'Mike', 'Jane', 'Jane', 'Jane',
                            'Mike', 'Mike', 'Jane', 'Jane', 'Jane'],
                   'ctg': ['A', 'P', 'C', 'B', 'B', 'C', 'B', 'E', 'G', 'L', 'M', 'X']})

Expected output:

name ctg
Jane A
Jane B
Jane L

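You can use GroupBy.first on a custom grouper that labels each run of consecutive identical names, combined with a boolean mask that keeps only the Jane rows: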

mask = df['name'].eq('Jane')  # True for Jane rows
out = (df[mask]  # keep only Jane
       # label each run of consecutive identical names; the Series is aligned on the index
       .groupby(df['name'].ne(df['name'].shift()).cumsum(), as_index=False)
       .first()  # first row of each run
      )
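To see what the custom grouper does, it can help to look at the run labels on their own. This is a minimal sketch that rebuilds the same df; the names `starts` and `run` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jane', 'Jane', 'Mike', 'Mike', 'Jane', 'Jane', 'Jane',
                            'Mike', 'Mike', 'Jane', 'Jane', 'Jane'],
                   'ctg': ['A', 'P', 'C', 'B', 'B', 'C', 'B', 'E', 'G', 'L', 'M', 'X']})

# True wherever the name differs from the row above
# (the first row is compared against NaN, so it is True as well)
starts = df['name'].ne(df['name'].shift())

# the cumulative sum of the booleans gives each run its own integer label
run = starts.cumsum()  # run labels: 1 1 2 2 3 3 3 4 4 5 5 5

print(df.assign(run=run))
```

Because df[mask] keeps the original index, pandas lines the run labels up with the remaining rows, so the Jane rows fall into runs 1, 3 and 5, and first() returns one row per run.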

Output:

   name ctg
0  Jane   A
1  Jane   B
2  Jane   L
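If only the rows themselves are needed, the same rows can also be selected without a groupby, by keeping the Jane rows that start a new run. This is an alternative sketch, not part of the answer above; without reset_index it would keep the original index labels (0, 4, 9).

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jane', 'Jane', 'Mike', 'Mike', 'Jane', 'Jane', 'Jane',
                            'Mike', 'Mike', 'Jane', 'Jane', 'Jane'],
                   'ctg': ['A', 'P', 'C', 'B', 'B', 'C', 'B', 'E', 'G', 'L', 'M', 'X']})

# True for the first row of each run of consecutive identical names
starts = df['name'].ne(df['name'].shift())

# keep only the run starts that belong to Jane, then renumber the index from 0
out = df[df['name'].eq('Jane') & starts].reset_index(drop=True)
print(out)
```

This avoids building groups at all, since the first row of a run is exactly a row whose name differs from the previous one.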
