Counting unique values in a rolling window with duplicate dates in pandas

If I have a pandas DataFrame like this

date  person_active
22/2  John
22/2  Marie
22/2  Mark
23/2  John
24/2  Mark
24/2  Marie
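For reproducibility, here is a minimal sketch that builds the sample data. It assumes the column names `date` and `person_active` used in the answer's code, and the six rows shown in the answer's output.

```python
import pandas as pd

# Sample frame; column names are assumed from the answer's code.
df = pd.DataFrame({
    "date": ["22/2", "22/2", "22/2", "23/2", "24/2", "24/2"],
    "person_active": ["John", "Marie", "Mark", "John", "Mark", "Marie"],
})
print(df)
```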

IIUC, try:

import pandas as pd

# convert to datetime if needed (no year is given, so pandas defaults to 1900)
df["date"] = pd.to_datetime(df["date"], format="%d/%m")
# convert the string names to categorical codes for numerical aggregation
df["people"] = pd.Categorical(df["person_active"]).codes
# compute the rolling unique count over a 2-day window, then keep the per-date maximum
df["people_active"] = (df.rolling("2D", on="date")["people"]
                         .agg(lambda x: x.nunique())
                         .groupby(df["date"])
                         .transform("max")
                       )
# drop the unnecessary helper column
df = df.drop("people", axis=1)
>>> df
        date person_active  people_active
0 1900-02-22          John            3.0
1 1900-02-22         Marie            3.0
2 1900-02-22          Mark            3.0
3 1900-02-23          John            3.0
4 1900-02-24          Mark            3.0
5 1900-02-24         Marie            3.0
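The same idea can be wrapped in a reusable function. This is only a sketch, not the answerer's exact code: `rolling_unique_count` is a hypothetical name, it uses `pd.factorize` in place of categorical codes, and it moves the dates into the index before rolling.

```python
import pandas as pd


def rolling_unique_count(df, window="2D"):
    """Hypothetical helper: unique people per time window, one value per date."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"], format="%d/%m")
    # Time-based windows need monotonic dates; a stable sort keeps row order within a day.
    out = out.sort_values("date", kind="stable").reset_index(drop=True)
    # rolling() only aggregates numbers, so map each name to an integer code.
    codes = pd.Series(
        pd.factorize(out["person_active"])[0], index=out["date"], dtype="float64"
    )
    per_row = codes.rolling(window).apply(lambda s: s.nunique(), raw=False)
    # Rows sharing a date can yield different running counts, so keep the day's maximum.
    per_day_max = per_row.groupby(level=0).transform("max")
    out["people_active"] = per_day_max.to_numpy().astype(int)
    return out


sample = pd.DataFrame({
    "date": ["22/2", "22/2", "22/2", "23/2", "24/2", "24/2"],
    "person_active": ["John", "Marie", "Mark", "John", "Mark", "Marie"],
})
result = rolling_unique_count(sample)
print(result)
```

On the sample data every row ends up with 3, matching the output above, because each 2-day window happens to contain all three people.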

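Group by date, count the unique values per day, then sum those counts over a rolling window (`date` must already be a datetime column):

df.groupby('date').nunique().rolling('2D').sum()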
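A small sketch of what this one-liner computes on the sample data, assuming the column names `date` and `person_active`. Note that it sums per-day unique counts, so someone active on both days of a window is counted once for each day.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(
        ["22/2", "22/2", "22/2", "23/2", "24/2", "24/2"], format="%d/%m"
    ),
    "person_active": ["John", "Marie", "Mark", "John", "Mark", "Marie"],
})

# Unique people per calendar day; the result is indexed by date.
daily = df.groupby("date")["person_active"].nunique()
# Sum of the daily counts over a 2-day window ending on each date.
rolled = daily.rolling("2D").sum()
print(rolled)
```

Here the daily counts are 3, 1 and 2, and the rolling sums are 3, 4 and 3. The value 4 on 23/2 counts John twice (once on 22/2 and once on 23/2), so this approach only matches the true unique count when nobody appears on more than one day within a window.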
