I have a DataFrame like this:
x.groupby(by='basket').ngroup()
Out[15]:
0 1
1 3
2 1
3 2
4 0
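I want to generate a uid for each group, so rows 0 and 2 should get the same uid. Is there a simple way to do this? Thanks.
Ideally I am looking for something more concise that is essentially equivalent to the following: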
import hashlib
import json

y = x.drop_duplicates(subset=['basket'])
y['basket_id'] = y['basket'].apply(lambda x: hashlib.shake_256(json.dumps(sorted(x)).encode('utf-8')).hexdigest(10))
y = y[['basket', 'basket_id']]
x = x.merge(y, how='left', on='basket')
Yes, this is possible:
import uuid

# generate one random UUID per unique basket
ids = {basket: str(uuid.uuid4()) for basket in x['basket'].unique()}
# map each row's basket to its UUID
x['uuid'] = x['basket'].map(ids)
Output:
basket uuid
0 1 e36436ed-7773-44de-9e53-7618cb18d8de
1 3 9cf6902e-4153-4187-8ff8-004a8ec3d2cc
2 1 e36436ed-7773-44de-9e53-7618cb18d8de
3 2 5fc27664-888e-48d2-b348-d18b0089d704
4 0 667f6055-f6b2-45a6-9022-b91ab421ffad
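Note that `uuid.uuid4()` is random, so the ids change on every run, whereas the hash in the question is reproducible. If you need stable ids, one option is `uuid.uuid5`, which derives the UUID from a namespace and a name. A minimal sketch, assuming the baskets are plain integers as in the output above; `BASKET_NAMESPACE` and the string `"baskets.example"` are made-up names for illustration:

```python
import uuid

import pandas as pd

# Sample data mirroring the baskets shown above (assumed to be plain integers).
x = pd.DataFrame({"basket": [1, 3, 1, 2, 0]})

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
BASKET_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "baskets.example")

# uuid5 hashes (namespace, name), so the same basket always yields the same id.
ids = {
    basket: str(uuid.uuid5(BASKET_NAMESPACE, str(basket)))
    for basket in x["basket"].unique()
}
x["uuid"] = x["basket"].map(ids)
```

Rows 0 and 2 still share an id, and re-running the script gives the same ids as long as the namespace stays the same.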
In the general case, you can use numpy indexing:
import uuid
import numpy as np

g = x.groupby(['basket', 'duration', 'duration_type'])
# number of unique groups (ngroups is an attribute, not a method)
ngroups = g.ngroups
# generate one UUID per group
uuids = np.array([str(uuid.uuid4()) for _ in range(ngroups)])
# map each row's group number to its UUID
x['uuid'] = uuids[g.ngroup()]
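To check the multi-column version end to end, here is a small runnable sketch. The sample rows and the values in `duration` and `duration_type` are invented for illustration; only the column names come from the snippet above.

```python
import uuid

import numpy as np
import pandas as pd

# Made-up sample rows: rows 0 and 2 share the same key combination.
x = pd.DataFrame({
    "basket": [1, 3, 1, 2],
    "duration": [7, 7, 7, 30],
    "duration_type": ["d", "d", "d", "d"],
})

g = x.groupby(["basket", "duration", "duration_type"])

# One random UUID per group, stored in an array indexed by group number.
uuids = np.array([str(uuid.uuid4()) for _ in range(g.ngroups)])

# ngroup() gives each row's group number; use it to index the array.
x["uuid"] = uuids[g.ngroup().to_numpy()]
```

Rows with identical values in all three key columns end up with the same uuid, and every distinct combination gets its own.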