Running "days since first occurrence" by user_id in a Pandas DataFrame



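I have a problem that I'm trying to find a neat solution for in Pandas. I know how to do this in Excel, but I'm struggling with it in Python.

Imagine the following data: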

date        user_id
2020-01-01  user1
2020-01-02  user1
2020-01-05  user1
2020-01-07  user1
2020-01-01  user2
2020-01-03  user2
2020-01-04  user2

Try this:

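import pandas as pd

# Parse the date column so that subtraction yields timedeltas
df["date"] = pd.to_datetime(df["date"])
# Subtract each user's earliest date, then keep the whole-day count
df["days_since_first_occurence"] = (df["date"]-df.groupby("user_id")["date"].transform("min")).dt.days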
>>> df
user_id       date  days_since_first_occurence
0   user1 2020-01-01                           0
1   user1 2020-01-02                           1
2   user1 2020-01-05                           4
3   user1 2020-01-07                           6
4   user2 2020-01-01                           0
5   user2 2020-01-03                           2
6   user2 2020-01-04                           3
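For reference, here is a self-contained sketch of the same approach that rebuilds the sample data from the question so it can be run as-is. The intermediate name `first_seen` is only illustrative and is not part of the answer above.

```python
import pandas as pd

# Sample data from the question (column names follow the answer above).
df = pd.DataFrame(
    {
        "user_id": ["user1"] * 4 + ["user2"] * 3,
        "date": [
            "2020-01-01", "2020-01-02", "2020-01-05", "2020-01-07",
            "2020-01-01", "2020-01-03", "2020-01-04",
        ],
    }
)

df["date"] = pd.to_datetime(df["date"])

# transform("min") returns a Series aligned to df's index that repeats
# each user's earliest date on every one of that user's rows.
first_seen = df.groupby("user_id")["date"].transform("min")

# Subtracting gives timedeltas; .dt.days keeps the whole-day count.
df["days_since_first_occurence"] = (df["date"] - first_seen).dt.days
print(df)
```

Because `transform` keeps the original index, the result can be assigned straight back as a new column without any merge or re-sorting.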

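Group by user_id, transform to get the first value in each group, then subtract (this assumes date is already a datetime column and that each user's rows are sorted by date):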

df['days_since_first_occurence'] = df['date'] - df.groupby('user_id')['date'].transform('first')
user_id       date days_since_first_occurence
0   user1 2020-01-01                     0 days
1   user1 2020-01-02                     1 days
2   user1 2020-01-05                     4 days
3   user1 2020-01-07                     6 days
4   user2 2020-01-01                     0 days
5   user2 2020-01-03                     2 days
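6   user2 2020-01-04                     3 days

Alternatively, a row-wise apply that looks up each user's first row also works, although it filters the whole frame once per row and will be slow on large data:

df['date'] = pd.to_datetime(df['date'])
df['days_since_first_occurence'] = df.apply(lambda x: x['date'] - df[df['user_id'] == x['user_id']].iloc[0]['date'], axis=1)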
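One caveat worth illustrating: both `transform('first')` and the `iloc[0]` lookup take the first row per user in the current row order, which is only the earliest date if the data is sorted. The sketch below uses a small, made-up unsorted frame (not the question's data) to compare `'first'` with `'min'`; the column names `via_first`, `via_min` and `via_first_sorted` are illustrative only.

```python
import pandas as pd

# Hypothetical unsorted input: user1's earliest date is not on its first row.
df = pd.DataFrame(
    {
        "user_id": ["user1", "user1", "user2", "user1"],
        "date": pd.to_datetime(
            ["2020-01-05", "2020-01-01", "2020-01-03", "2020-01-07"]
        ),
    }
)

by_user = df.groupby("user_id")["date"]

# "first" takes the first row per user in the current row order.
df["via_first"] = (df["date"] - by_user.transform("first")).dt.days
# "min" takes the earliest date per user regardless of row order.
df["via_min"] = (df["date"] - by_user.transform("min")).dt.days

# Sorting by user and date beforehand makes "first" agree with "min".
sorted_df = df.sort_values(["user_id", "date"])
sorted_df["via_first_sorted"] = (
    sorted_df["date"] - sorted_df.groupby("user_id")["date"].transform("first")
).dt.days

print(df)
print(sorted_df)
```

On unsorted data `via_first` can go negative, so `'min'` is probably the safer choice unless the sort order is guaranteed.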
