Group a DataFrame and sort it, then find the first occurrence matching a criterion



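I have the following DataFrame, and I'm struggling to find, for each customer, the first date (in ascending order) on which the flag column = Y.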

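import pandas as pd

# Wrap the dict in a DataFrame so that sort_values and groupby are available
df = pd.DataFrame({
"customer_key": ["1","1","1","2","2","2"],
"date": ["2020-09-30", "2020-01-31", "2020-06-30","2020-01-31", "2020-02-29", "2020-03-31"],
"flag": ["Y","N","Y","N","N","Y"]
})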

Expected result:

  • For customer 1, the date is 2020-06-30
  • For customer 2, the date is 2020-03-31

So first I sort by date.

df.sort_values('date', inplace=True)

This is where I'm stuck. I know I need to group by customer key and then find the first row where flag = Y, but I don't know how to go about it.

df['first_occurence_date'] = df.groupby(by='customer_key') ## i dunno...

Try filtering to the rows where flag is Y first, then taking the minimum date per customer:

out = df.loc[df['flag'].eq('Y')].groupby('customer_key').date.min()
customer_key
1    2020-06-30
2    2020-03-31
Name: date, dtype: object
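
The question also attempts to store the result in a first_occurence_date column. A minimal sketch of one way to do that, assuming the same sample data: compute the per-customer minimum as above, then map it back onto customer_key. Converting date with pd.to_datetime is an addition here, not part of the original answer; the ISO-formatted strings happen to sort correctly as text, but real datetimes are safer for other formats. The name first_y is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_key": ["1", "1", "1", "2", "2", "2"],
    "date": ["2020-09-30", "2020-01-31", "2020-06-30",
             "2020-01-31", "2020-02-29", "2020-03-31"],
    "flag": ["Y", "N", "Y", "N", "N", "Y"],
})

# Assumption: parse the strings so min() compares dates rather than text.
df["date"] = pd.to_datetime(df["date"])

# Earliest date per customer among the rows flagged "Y".
first_y = df.loc[df["flag"].eq("Y")].groupby("customer_key")["date"].min()

# Broadcast the per-customer value back onto every row of that customer.
# A customer with no "Y" row would get NaT here.
df["first_occurence_date"] = df["customer_key"].map(first_y)
```

Because min() is used, the prior sort_values call is not required for this approach; sorting only matters if you pick the first row per group positionally instead.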
