GroupBy results to dictionary of lists (with multiple columns)



I'm trying to achieve something similar to the question "GroupBy results to dictionary of lists", starting from this table:

Column1 Column2 Column3
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3
sdf.groupby('Column1')['Column3'].apply(list).to_dict() 

works perfectly.
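For reference, here is a minimal sketch that rebuilds the sample table as a DataFrame named `sdf` and runs the single-column version. The constructor layout is an assumption; the values are the ones in the table above. Selecting one column gives a `SeriesGroupBy`, so `apply(list)` receives a plain Series per group:

```python
import pandas as pd

# Sample table from the question (constructor layout assumed)
sdf = pd.DataFrame({
    'Column1': [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    'Column2': [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    'Column3': [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# One selected column -> each group is a Series -> list(series) yields its values
single = sdf.groupby('Column1')['Column3'].apply(list).to_dict()
print(single)
```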

However, I need a list of tuples built from multiple columns, like:

sdf.groupby('Column1')['Column2', 'Column3'].apply(list).to_dict() 

to get output like

{0: [(23, 1)],
1: [(5,2), (2,3), (19,5)],
...}

but this returns the column headers instead of the values.
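A likely explanation, sketched below under the assumption that the data matches the table above: with two selected columns each group is a DataFrame, and iterating over a DataFrame yields its column labels, so `list(group)` returns the headers. Converting each group's rows with `itertuples(index=False, name=None)` produces plain tuples instead. Note also that recent pandas versions reject the tuple-style selection `['Column2', 'Column3']` after `groupby`; a list (double brackets) is needed.

```python
import pandas as pd

sdf = pd.DataFrame({
    'Column1': [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    'Column2': [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    'Column3': [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# Iterating a DataFrame yields column labels, which is why apply(list) gives headers
first_group = sdf[sdf['Column1'] == 1][['Column2', 'Column3']]
print(list(first_group))

# Turn each group's rows into plain tuples (no index, no namedtuple wrapper)
result = (
    sdf.groupby('Column1')[['Column2', 'Column3']]
    .apply(lambda g: list(g.itertuples(index=False, name=None)))
    .to_dict()
)
print(result)
```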

Here is my workaround (which, in my opinion, is far too much work for this result):

import pandas as pd
from collections import defaultdict


def get_dict_of_set_from_df(df: pd.DataFrame, key_cols: list, val_cols: list) -> dict:
    """
    Generic method to create Dict[key_cols] = set(val_cols)
    :param df: source DataFrame
    :param key_cols: columns whose values form the dictionary key
    :param val_cols: columns whose values form each element of the set
    :return: defaultdict mapping each key to a set of values
    """
    # df.groupby(key_cols)[val_cols].apply(set).to_dict()
    cols = key_cols + val_cols
    len_key = len(key_cols)
    len_val = len(val_cols)
    # get all relevant columns (key_cols and val_cols) from the dataframe
    l_ = df[cols].values.tolist()
    dc = defaultdict(set)
    for c in l_:
        # if key or val is a singleton, then do not put into tuple
        k = tuple(c[:len_key]) if len_key > 1 else c[:len_key][0]
        v = tuple(c[len_key:]) if len_val > 1 else c[len_key:][0]
        dc[k].add(v)
    return dc
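The same generic idea can be sketched on top of `groupby` itself. `get_dict_of_lists_from_df` is a hypothetical name, and unlike the workaround it returns lists (preserving row order and duplicates) rather than sets; wrap the values in `set(...)` if set semantics are what you need. A single key column is passed as a scalar because grouping by a one-element list yields tuple keys in recent pandas versions.

```python
import pandas as pd


def get_dict_of_lists_from_df(df: pd.DataFrame, key_cols: list, val_cols: list) -> dict:
    """Hypothetical helper: map each key (scalar or tuple) to a list of values (scalars or tuples)."""
    # One key column -> scalar group keys; several key columns -> tuple keys
    by = key_cols[0] if len(key_cols) == 1 else key_cols
    result = {}
    for key, group in df.groupby(by):
        if len(val_cols) == 1:
            # Single value column: keep plain scalars, as the workaround does
            result[key] = group[val_cols[0]].tolist()
        else:
            # Several value columns: one plain tuple per row
            result[key] = list(group[val_cols].itertuples(index=False, name=None))
    return result


sdf = pd.DataFrame({
    'Column1': [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    'Column2': [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    'Column3': [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})
by_col1 = get_dict_of_lists_from_df(sdf, ['Column1'], ['Column2', 'Column3'])
by_pair = get_dict_of_lists_from_df(sdf, ['Column1', 'Column3'], ['Column2'])
print(by_col1[1], by_pair[(1, 2)])
```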

You can do:

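import pandas as pd

data = [[0, 23, 1],
        [1, 5, 2],
        [1, 2, 3],
        [1, 19, 5],
        [2, 56, 1],
        [2, 22, 2],
        [3, 2, 4],
        [3, 14, 5],
        [4, 59, 1],
        [5, 44, 1],
        [5, 1, 2],
        [5, 87, 3]]
df = pd.DataFrame(data=data, columns=['c1', 'c2', 'c3'])


def to_list(x):
    # x is one group's sub-DataFrame; pair up c2 and c3 row by row
    return list(zip(x.c2, x.c3))


# Select the value columns with a list (double brackets), then apply per group
groups = df.groupby('c1')[['c2', 'c3']].apply(to_list)
# groups is a Series indexed by c1 whose values are lists of tuples
result = {k: values for k, values in zip(groups.index, groups)}
print(result)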

Output:

{0: [(23, 1)], 1: [(5, 2), (2, 3), (19, 5)], 2: [(56, 1), (22, 2)], 3: [(2, 4), (14, 5)], 4: [(59, 1)], 5: [(44, 1), (1, 2), (87, 3)]}
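An alternative sketch, assuming the same `df` as in the answer: build one tuple per row first, so the grouped step only needs a single column and the question's original `apply(list).to_dict()` idiom works unchanged. The helper column name `pair` is made up for illustration.

```python
import pandas as pd

data = [[0, 23, 1], [1, 5, 2], [1, 2, 3], [1, 19, 5], [2, 56, 1], [2, 22, 2],
        [3, 2, 4], [3, 14, 5], [4, 59, 1], [5, 44, 1], [5, 1, 2], [5, 87, 3]]
df = pd.DataFrame(data=data, columns=['c1', 'c2', 'c3'])

# 'pair' is a hypothetical helper column holding one (c2, c3) tuple per row
result = (
    df.assign(pair=list(zip(df['c2'], df['c3'])))
    .groupby('c1')['pair']
    .apply(list)
    .to_dict()
)
print(result)
```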
