If I have a pandas DataFrame like this:
date  person_active
22/2  John
22/2  Marie
22/2  Mark
23/2  John
24/2  Mark
24/2  Marie
IIUC, try:
import pandas as pd
# convert to datetime if needed (with no year in the format, the year defaults to 1900)
df["date"] = pd.to_datetime(df["date"], format="%d/%m")
# convert string names to categorical codes for numerical aggregation
df["people"] = pd.Categorical(df["person_active"]).codes
# compute the rolling unique count, then take the max per date so that
# all rows sharing a date get the same value
df["people_active"] = (df.rolling("2D", on="date")["people"]
                         .agg(lambda x: x.nunique())
                         .groupby(df["date"])
                         .transform("max")
                       )
# drop the unnecessary helper column
df = df.drop("people", axis=1)
>>> df
        date person_active  people_active
0 1900-02-22          John            3.0
1 1900-02-22         Marie            3.0
2 1900-02-22          Mark            3.0
3 1900-02-23          John            3.0
4 1900-02-24          Mark            3.0
5 1900-02-24         Marie            3.0
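As a cross-check on the output above, here is a minimal sketch that recomputes the count with an explicit loop instead of `rolling`. It assumes the window for each date d is (d - 2 days, d], i.e. closed on the right only, and the helper name `rolling_unique_count` is made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["22/2", "22/2", "22/2", "23/2", "24/2", "24/2"],
    "person_active": ["John", "Marie", "Mark", "John", "Mark", "Marie"],
})
df["date"] = pd.to_datetime(df["date"], format="%d/%m")


def rolling_unique_count(frame, window="2D"):
    """Hypothetical helper: distinct people in (d - window, d] for each row's date d."""
    width = pd.Timedelta(window)
    counts = {}
    for day in frame["date"].drop_duplicates():
        # Rows whose date falls inside the right-closed window ending at `day`
        in_window = (frame["date"] > day - width) & (frame["date"] <= day)
        counts[day] = frame.loc[in_window, "person_active"].nunique()
    # One value per row, in the original row order
    return [counts[day] for day in frame["date"]]


df["people_active"] = rolling_unique_count(df, "2D")
print(df)
```

With a 2-day window every date sees all three people, which agrees with the 3.0 column above. The loop is quadratic in the number of distinct dates, so it is meant as a sanity check rather than a replacement for `rolling`.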
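Group by date, count the unique values, and you're good to go (this assumes date has already been converted to datetime):
df.groupby('date').nunique().rolling('2D').sum()
Note that this adds up the per-day unique counts, so a person who is active on more than one day in the window is counted once for each day.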
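For comparison, here is a small sketch of this approach on the sample data from the question. The frame is rebuilt so the snippet runs on its own, the `person_active` column is selected explicitly, and `per_day` and `summed` are illustrative names. Because the rolling step sums per-day counts, someone active on both days of a window contributes twice.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": ["22/2", "22/2", "22/2", "23/2", "24/2", "24/2"],
    "person_active": ["John", "Marie", "Mark", "John", "Mark", "Marie"],
})
df["date"] = pd.to_datetime(df["date"], format="%d/%m")

# Unique people per calendar day (3 on 22/2, 1 on 23/2, 2 on 24/2)
per_day = df.groupby("date")["person_active"].nunique()

# Rolling 2-day sum of those per-day counts;
# for 23/2 this is 3 (from 22/2) + 1 (from 23/2) = 4, although only 3 people exist
summed = per_day.rolling("2D").sum()
print(summed)
```

If the goal is the number of distinct people across the whole window, the sum overstates it whenever the same person appears on consecutive days.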