I have the following DataFrame, and I'm trying to find, for each customer, the first date (in ascending order) where the flag column = Y.
import pandas as pd
df = pd.DataFrame({
"customer_key": ["1","1","1","2","2","2"],
"date": ["2020-09-30", "2020-01-31", "2020-06-30","2020-01-31", "2020-02-29", "2020-03-31"],
"flag": ["Y","N","Y","N","N","Y"]
})
Expected result:
- For customer 1, the date is 2020-06-30
- For customer 2, it is 2020-03-31
So first I sort by date:
df.sort_values('date', inplace=True)
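Because these dates are ISO-formatted strings (YYYY-MM-DD), sorting them as text already gives chronological order. If the dates could arrive in other formats, a safer sketch (assuming it's fine to change the column's dtype) converts them with pd.to_datetime first and sorts within each customer:

```python
import pandas as pd

df = pd.DataFrame({
    "customer_key": ["1", "1", "1", "2", "2", "2"],
    "date": ["2020-09-30", "2020-01-31", "2020-06-30", "2020-01-31", "2020-02-29", "2020-03-31"],
    "flag": ["Y", "N", "Y", "N", "N", "Y"],
})

# Parse the strings into real datetimes so sorting no longer depends on text format
df["date"] = pd.to_datetime(df["date"])

# Sort by customer, then chronologically within each customer
df = df.sort_values(["customer_key", "date"])
print(df)
```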
This is where I'm stuck: I know I need to group by customer_key and then find the first row where flag = Y, but I don't know how to do that.
df['first_occurence_date'] = df.groupby(by='customer_key') ## i dunno...
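The sort-then-group idea described above can be completed roughly like this (a sketch, not the only way): after sorting by date, keep only the rows flagged Y and take the first date in each customer group with groupby(...).first():

```python
import pandas as pd

df = pd.DataFrame({
    "customer_key": ["1", "1", "1", "2", "2", "2"],
    "date": ["2020-09-30", "2020-01-31", "2020-06-30", "2020-01-31", "2020-02-29", "2020-03-31"],
    "flag": ["Y", "N", "Y", "N", "N", "Y"],
})

# Sort chronologically, keep only 'Y' rows, then take the first (earliest) date per customer
first_y = (
    df.sort_values("date")
      .loc[lambda d: d["flag"].eq("Y")]
      .groupby("customer_key")["date"]
      .first()
)
print(first_y)
```

Since the rows are already sorted, first() picks the earliest Y date in each group; customers with no Y rows simply don't appear in the result.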
Try filtering to the Y rows first, then taking the minimum date per customer (no sorting needed):
out = df.loc[df['flag'].eq('Y')].groupby('customer_key').date.min()
customer_key
1    2020-06-30
2    2020-03-31
Name: date, dtype: object
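If, as in the attempted first_occurence_date line, you want that date repeated on every row of the customer, one option (a sketch; Series.map is a swapped-in choice here, not part of the answer above) is to map the grouped result back onto customer_key:

```python
import pandas as pd

df = pd.DataFrame({
    "customer_key": ["1", "1", "1", "2", "2", "2"],
    "date": ["2020-09-30", "2020-01-31", "2020-06-30", "2020-01-31", "2020-02-29", "2020-03-31"],
    "flag": ["Y", "N", "Y", "N", "N", "Y"],
})

# Earliest 'Y' date per customer, indexed by customer_key
out = df.loc[df["flag"].eq("Y")].groupby("customer_key")["date"].min()

# Look up each row's customer_key in `out`; customers without any 'Y' get NaN
df["first_occurence_date"] = df["customer_key"].map(out)
print(df)
```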