Return the users who are in one step but not in the next step

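I have the sample dataset below, which is the result of a groupby: I grouped by Steps and CampaignSource and collected the UserId values of each group into a set.

df2 = df[['CampaignSource', 'UserId', 'Steps']].groupby(['Steps', 'CampaignSource'], as_index=False).agg(lambda x: set(x))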

Steps	CampaignSource	Set_UserId
"Step-1"	"Apple"	"Jeff","John","Antonio","Jon"
"Step-1"	"Banana"	"Jeff","John","Antonio","Jon"
"Step-1"	"Potato"	"Jeff","John","Antonio","Jon"
"Step-2"	"Apple"	"Jeff","John"
"Step-2"	"Banana"	"Jeff","John","Antonio"
"Step-2"	"Potato"	"Jeff","John"
"Step-3"	"Apple"	"Jeff"
"Step-3"	"Banana"	"Jeff","John"
"Step-3"	"Potato"	"Jeff"

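import pandas as pd

# Here df is the grouped frame shown above (Steps, CampaignSource, Set_UserId),
# with rows already ordered by step within each source.
# agg(list) replaces the identity lambda, which does not aggregate.
per_source = df.groupby("CampaignSource").agg(list)
# For each source, pair every step except the last with the users missing from the next step.
rows = per_source.apply(lambda x: list(zip([x.name] * len(x["Steps"]), x["Steps"][:-1], [list(set(s) - set(x["Set_UserId"][i + 1])) for i, s in enumerate(x["Set_UserId"][:-1])])), axis=1)
# Flatten the per-source lists of tuples into one frame.
pd.DataFrame([item for sub in rows for item in sub])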

It's a bit complicated :) if you want that shape. If a different shape is acceptable, it can be simpler.
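As one possible illustration of a simpler version, here is a minimal sketch that uses an explicit loop over each CampaignSource instead of the nested one-liner. It is not the answerer's code. The function name `dropped_users` and the output column `Dropped` are made up for this example, the sample frame is retyped from the table in the question, and it assumes the step labels sort correctly as strings (true for Step-1 to Step-3, not for Step-10 and beyond).

```python
import pandas as pd

# Sample grouped frame, retyped from the table in the question.
df2 = pd.DataFrame({
    "Steps": ["Step-1"] * 3 + ["Step-2"] * 3 + ["Step-3"] * 3,
    "CampaignSource": ["Apple", "Banana", "Potato"] * 3,
    "Set_UserId": [
        {"Jeff", "John", "Antonio", "Jon"},
        {"Jeff", "John", "Antonio", "Jon"},
        {"Jeff", "John", "Antonio", "Jon"},
        {"Jeff", "John"},
        {"Jeff", "John", "Antonio"},
        {"Jeff", "John"},
        {"Jeff"},
        {"Jeff", "John"},
        {"Jeff"},
    ],
})


def dropped_users(grouped):
    """Hypothetical helper: users present in a step but absent from the next one."""
    records = []
    # sort=False keeps the sources in their order of first appearance.
    for source, part in grouped.groupby("CampaignSource", sort=False):
        # Assumption: step labels sort correctly as plain strings.
        part = part.sort_values("Steps")
        steps = part["Steps"].tolist()
        sets = part["Set_UserId"].tolist()
        # zip stops at the shorter input, so the last step (which has no
        # following step) produces no row.
        for step, current, following in zip(steps, sets, sets[1:]):
            records.append({
                "CampaignSource": source,
                "Steps": step,
                "Dropped": sorted(current - following),
            })
    return pd.DataFrame(records)


result = dropped_users(df2)
print(result)
```

Each row of `result` holds a source, a step, and the sorted list of users who appear in that step but not in the following one. Sorting the set difference is only there to make the output deterministic.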
