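I have a problem that I'm trying to find a clever solution for in Pandas. I know how to do this in Excel, but I'm struggling in Python.
Imagine the following data:
user_id date
user1 2020-01-01
user1 2020-01-02
user1 2020-01-05
user1 2020-01-07
user2 2020-01-01
user2 2020-01-03
user2 2020-01-04
Try this: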
df["date"] = pd.to_datetime(df["date"])
df["days_since_first_occurence"] = (df["date"]-df.groupby("user_id")["date"].transform("min")).dt.days
>>> df
user_id date days_since_first_occurence
0 user1 2020-01-01 0
1 user1 2020-01-02 1
2 user1 2020-01-05 4
3 user1 2020-01-07 6
4 user2 2020-01-01 0
5 user2 2020-01-03 2
6 user2 2020-01-04 3
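For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is my assumption about how the data is laid out (column names taken from the output above), and `first_seen` is just an illustrative helper name.

```python
import pandas as pd

# Sample data rebuilt from the output above; the construction itself is an assumption.
df = pd.DataFrame({
    "user_id": ["user1"] * 4 + ["user2"] * 3,
    "date": ["2020-01-01", "2020-01-02", "2020-01-05", "2020-01-07",
             "2020-01-01", "2020-01-03", "2020-01-04"],
})

# Parse the strings so that date arithmetic works.
df["date"] = pd.to_datetime(df["date"])

# Earliest date per user, broadcast back to every row of that user.
first_seen = df.groupby("user_id")["date"].transform("min")

# Timedelta -> whole number of days.
df["days_since_first_occurence"] = (df["date"] - first_seen).dt.days
print(df)
```

Because transform returns a Series aligned to the original index, no merge or sort is needed before subtracting.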
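Group by user_id, use transform to get the first date in each group, then subtract it (this assumes date is already a datetime column):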
df['days_since_first_occurence'] = df['date'] - df.groupby('user_id')['date'].transform('first')
user_id date days_since_first_occurence
0 user1 2020-01-01 0 days
1 user1 2020-01-02 1 days
2 user1 2020-01-05 4 days
3 user1 2020-01-07 6 days
4 user2 2020-01-01 0 days
5 user2 2020-01-03 2 days
6 user2 2020-01-04 3 days
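One thing to watch with "first": it picks the first row of each group in its current order, not the earliest date. The sketch below uses a small hypothetical unsorted frame (not the question's data) to show the difference from "min", and how sorting first brings the two back in line.

```python
import pandas as pd

# Hypothetical unsorted input: user1's earliest date is NOT in its first row.
df = pd.DataFrame({
    "user_id": ["user1", "user1", "user2"],
    "date": pd.to_datetime(["2020-01-05", "2020-01-01", "2020-01-03"]),
})

by_user = df.groupby("user_id")["date"]

# Relative to the first row seen for each user (order dependent).
delta_first = df["date"] - by_user.transform("first")

# Relative to the earliest date for each user (order independent).
delta_min = df["date"] - by_user.transform("min")

# Sorting by user and date beforehand makes "first" agree with "min".
df_sorted = df.sort_values(["user_id", "date"])
delta_sorted = (
    df_sorted["date"] - df_sorted.groupby("user_id")["date"].transform("first")
)

print(delta_first.dt.days.tolist())
print(delta_min.dt.days.tolist())
```

If the rows are already sorted by date within each user, as in the question's data, both give the same result.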
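A row-by-row alternative with apply (it rescans the whole frame for every row, so it is much slower on large data):
df['date'] = pd.to_datetime(df['date'])
df['days_since_first_occurence'] = df.apply(lambda x: x['date'] - df[df['user_id'] == x['user_id']].iloc[0]['date'], axis=1)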