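I have the sample dataset below, which is the result of a grouping: I grouped by Steps and CampaignSource, and returned the grouped UserId values in a set.

    df2 = df[['CampaignSource', 'UserId', 'Steps']].groupby(['Steps', 'CampaignSource'], as_index=False).agg(lambda x: set(x))
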
Steps | CampaignSource | Set_UserId
---|---|---
"Step-1" | "Apple" | "Jeff","John","Antonio","Jon"
"Step-1" | "Banana" | "Jeff","John","Antonio","Jon"
"Step-1" | "Potato" | "Jeff","John","Antonio","Jon"
"Step-2" | "Apple" | "Jeff","John"
"Step-2" | "Banana" | "Jeff","John","Antonio"
"Step-2" | "Potato" | "Jeff","John"
"Step-3" | "Apple" | "Jeff"
"Step-3" | "Banana" | "Jeff","John"
"Step-3" | "Potato" | "Jeff"
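
For anyone who wants to reproduce the table, here is a minimal sketch of the grouping step. The raw frame `raw` is an assumption: the original `df` is not shown, so it is rebuilt here by expanding the user sets from the table back into one row per user. The names `table`, `raw` and `grouped` are illustrative only.

```python
import pandas as pd

# User sets copied from the table above, keyed by (Steps, CampaignSource).
table = {
    ("Step-1", "Apple"): ["Jeff", "John", "Antonio", "Jon"],
    ("Step-1", "Banana"): ["Jeff", "John", "Antonio", "Jon"],
    ("Step-1", "Potato"): ["Jeff", "John", "Antonio", "Jon"],
    ("Step-2", "Apple"): ["Jeff", "John"],
    ("Step-2", "Banana"): ["Jeff", "John", "Antonio"],
    ("Step-2", "Potato"): ["Jeff", "John"],
    ("Step-3", "Apple"): ["Jeff"],
    ("Step-3", "Banana"): ["Jeff", "John"],
    ("Step-3", "Potato"): ["Jeff"],
}

# Assumed raw shape: one row per (step, campaign, user).
raw = pd.DataFrame(
    [(step, source, user) for (step, source), users in table.items() for user in users],
    columns=["Steps", "CampaignSource", "UserId"],
)

# Group by step and campaign, collecting the user ids of each group into a set.
grouped = (
    raw.groupby(["Steps", "CampaignSource"])["UserId"]
    .agg(lambda x: set(x))
    .reset_index(name="Set_UserId")
)
print(grouped)
```

Selecting the `UserId` column before aggregating and naming the result in `reset_index` gives the `Set_UserId` column directly, without a separate rename.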
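
    import pandas as pd

    # df is the grouped table above (columns Steps, CampaignSource, Set_UserId).
    # agg(list) gathers each campaign's Steps and Set_UserId values into lists.
    per_source = df.groupby("CampaignSource").agg(list)
    # For each campaign, pair every step but the last with the users missing from the next step.
    rows = per_source.apply(
        lambda x: list(zip(
            [x.name] * len(x["Steps"]),
            x["Steps"][:-1],
            [list(set(s) - set(x["Set_UserId"][i + 1])) for i, s in enumerate(x["Set_UserId"][:-1])],
        )),
        axis=1,
    )
    # Flatten the per-campaign lists of tuples into one DataFrame.
    pd.DataFrame([item for sub in rows for item in sub])
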
It's a bit convoluted :) if you want exactly that shape. If a different shape would work for you, it can be simpler.
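
The answer does not say which other shape it has in mind, so the following is only one possible sketch. It assumes the goal is what the one-liner computes (the users present in one step but missing from the next) and lays it out wide: one row per campaign, one column per pair of consecutive steps. The names `lookup` and `dropped` and the column labels such as `Step-1 -> Step-2` are made up for illustration.

```python
import pandas as pd

# The grouped table from the question, rebuilt by hand so the sketch is self-contained.
df = pd.DataFrame(
    [
        ("Step-1", "Apple", {"Jeff", "John", "Antonio", "Jon"}),
        ("Step-1", "Banana", {"Jeff", "John", "Antonio", "Jon"}),
        ("Step-1", "Potato", {"Jeff", "John", "Antonio", "Jon"}),
        ("Step-2", "Apple", {"Jeff", "John"}),
        ("Step-2", "Banana", {"Jeff", "John", "Antonio"}),
        ("Step-2", "Potato", {"Jeff", "John"}),
        ("Step-3", "Apple", {"Jeff"}),
        ("Step-3", "Banana", {"Jeff", "John"}),
        ("Step-3", "Potato", {"Jeff"}),
    ],
    columns=["Steps", "CampaignSource", "Set_UserId"],
)

# Look up each user set by (campaign, step).
lookup = df.set_index(["CampaignSource", "Steps"])["Set_UserId"].to_dict()
sources = sorted(df["CampaignSource"].unique())
# Lexicographic sort is enough for Step-1..Step-3; it would misorder Step-10.
steps = sorted(df["Steps"].unique())

# One row per campaign, one column per pair of consecutive steps,
# each cell holding the users who did not reach the later step.
dropped = pd.DataFrame(
    {
        f"{a} -> {b}": [lookup[(src, a)] - lookup[(src, b)] for src in sources]
        for a, b in zip(steps, steps[1:])
    },
    index=sources,
)
print(dropped)
```

Plain set subtraction inside a comprehension keeps the logic visible; the trade-off is that it assumes every campaign has a row for every step.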