Suppose I have a pandas DataFrame:
| id1 | id2 | attr1 | combo_id | perm_id |
| --- | --- | --- | --- | --- |
| 1 | 2 | [9606] | [1,2] | AB |
| 2 | 1 | [9606] | [1,2] | BA |
| 3 | 4 | [9606] | [3,4] | AB |
| 4 | 3 | [9606] | [3,4] | BA |
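I want to aggregate the rows that share the same combo_id, using each row's perm_id as the key under which that row's information is stored. The resulting DataFrame would look like this: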
| attr1 | combo_id |
| --- | --- |
| {'AB':[9606], 'BA': [9606]} | [1,2] |
| {'AB':[9606], 'BA': [9606]} | [3,4] |
How can I do this with the groupby and aggregate functions?
I tried using perm_id to turn attr1 into a dictionary:
df['attr1'] = df.apply(lambda x: {x['perm_id']: x['attr1']})
Then I planned to use something like this to combine the dictionaries within each group:
df.groupby(['combo_id']).agg({ 'attr1': lambda x: {x**})
But this results in KeyError: perm_id
Any suggestions?
Try:
from ast import literal_eval
x = (
df.groupby(df["combo_id"].astype(str))
.apply(lambda x: dict(zip(x["perm_id"], x["attr1"])))
.reset_index(name="attr1")
)
# convert combo_id back to list (if needed)
x["combo_id"] = x["combo_id"].apply(literal_eval)
print(x)
Prints:
combo_id attr1
0 [1, 2] {'AB': [9606], 'BA': [9606]}
1 [3, 4] {'AB': [9606], 'BA': [9606]}
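As for the KeyError in the question: `df.apply` defaults to `axis=0`, so the lambda receives each column rather than each row, and looking up `x['perm_id']` in a column's index fails. Passing `axis=1` hands the lambda one row at a time. Below is a minimal sketch of that fix, followed by an alternative to the snippet above that merges the per-row dictionaries in plain Python and so avoids the `str` / `literal_eval` round trip. The sample frame is rebuilt from the table in the question; the names `as_dicts`, `merged`, and `result` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "id1": [1, 2, 3, 4],
    "id2": [2, 1, 4, 3],
    "attr1": [[9606], [9606], [9606], [9606]],
    "combo_id": [[1, 2], [1, 2], [3, 4], [3, 4]],
    "perm_id": ["AB", "BA", "AB", "BA"],
})

# axis=1 passes each row to the lambda; with the default axis=0 the lambda
# gets whole columns, and the 'perm_id' lookup raises KeyError.
as_dicts = df.apply(lambda row: {row["perm_id"]: row["attr1"]}, axis=1)

# Lists are unhashable, so use a tuple of combo_id as the grouping key
# and merge the single-entry dictionaries that belong to the same key.
merged = {}
for combo, d in zip(df["combo_id"], as_dicts):
    merged.setdefault(tuple(combo), {}).update(d)

# Build the aggregated frame, converting the tuple keys back to lists.
result = pd.DataFrame({
    "attr1": list(merged.values()),
    "combo_id": [list(key) for key in merged],
})
print(result)
```

This keeps the first-seen order of the combo_id values. If two rows in a group share a perm_id, the later row overwrites the earlier one, so it assumes perm_id is unique within each group.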