I have the following DataFrame:
import pandas as pd
d1 = {'id': ["car", "car", "bus", "plane", "plane"], 'value': [["a","b"], ["b","a"], ["a","b"], ["c","d"], ["d","c"]]}
df1 = pd.DataFrame(data=d1)
df1
      id   value
0    car  [a, b]
1    car  [b, a]
2    bus  [a, b]
3  plane  [c, d]
4  plane  [d, c]
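I want to group the ids by the contents of their value lists, where the order of the elements should not matter. After that, I want to sort the groups by size, so that I get something like this: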
    id   value
0  car  [a, b]
1  car  [b, a]
2  bus  [a, b]
      id   value
0  plane  [c, d]
1  plane  [d, c]
I tried converting the lists to dictionaries with Counter() and then getting the size of each group. However, I get the following error:
import collections
df1["temp"] = list(map(collections.Counter, df1["value"]))
df1 = df1.groupby('temp').size().sort_values(ascending = True)
TypeError: unhashable type: 'Counter'
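The error occurs because `Counter` is a `dict` subclass, and dicts are unhashable, so a column of Counters cannot serve as a groupby key. A minimal sketch, assuming you want to keep the Counter idea (which, unlike a plain set, also respects duplicate elements): turn each Counter into a `frozenset` of its `(element, count)` pairs, which is hashable and order-insensitive. The helper name `counter_key` is hypothetical, and "largest group first" is assumed from the expected output above.

```python
import collections

import pandas as pd

d1 = {'id': ["car", "car", "bus", "plane", "plane"],
      'value': [["a", "b"], ["b", "a"], ["a", "b"], ["c", "d"], ["d", "c"]]}
df1 = pd.DataFrame(data=d1)


def counter_key(values):
    """Hypothetical helper: hashable, order-insensitive key that keeps element counts."""
    return frozenset(collections.Counter(values).items())


keys = df1["value"].map(counter_key)

# sort=False: frozensets have no meaningful total order, so skip sorting the keys
sizes = df1.groupby(keys, sort=False).size().sort_values(ascending=False)
print(sizes)
```

Use `ascending=True` instead if you want the smallest groups first, as in the code from the question.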
You can sort each list so that element order is ignored. Lists are unhashable, so convert the sorted lists to tuples; then you can use them with groupby.
for _, g in df1.groupby(df1['value'].map(lambda x: tuple(sorted(x)))):
    print(g)
Output:
    id   value
0  car  [a, b]
1  car  [b, a]
2  bus  [a, b]
      id   value
3  plane  [c, d]
4  plane  [d, c]
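The loop above prints the groups in key order, not by size. A minimal sketch of the second step from the question, assuming "largest group first" as in the expected output: collect the sub-frames into a list and sort that list by length. The names `key` and `groups` are illustrative only.

```python
import pandas as pd

d1 = {'id': ["car", "car", "bus", "plane", "plane"],
      'value': [["a", "b"], ["b", "a"], ["a", "b"], ["c", "d"], ["d", "c"]]}
df1 = pd.DataFrame(data=d1)

# Order-insensitive, hashable key: the sorted elements as a tuple
key = df1['value'].map(lambda x: tuple(sorted(x)))

# Collect the sub-frames, then order them by number of rows (largest first)
groups = [g for _, g in df1.groupby(key)]
groups.sort(key=len, reverse=True)

for g in groups:
    print(g)
```

Drop `reverse=True` to get the smallest groups first. Because `list.sort` is stable, groups of equal size keep their key order.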
Sort each list in the value column, join it into a string, and use that string as the grouping key:
groups = df1.assign(val_str=df1['value'].apply(sorted).str.join(',')).groupby('val_str')
for _, g in groups:  # iterate over the separate groups
    g = g.drop('val_str', axis=1)
    print(g)
    id   value
0  car  [a, b]
1  car  [b, a]
2  bus  [a, b]
      id   value
3  plane  [c, d]
4  plane  [d, c]
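If you would rather keep a single DataFrame than loop over sub-frames, a minimal sketch (assuming "largest group first" again) is to attach each row's group size, computed here with `value_counts` on the same string key, and sort on it. The temporary column names `val_str` and `group_size` are illustrative only.

```python
import pandas as pd

d1 = {'id': ["car", "car", "bus", "plane", "plane"],
      'value': [["a", "b"], ["b", "a"], ["a", "b"], ["c", "d"], ["d", "c"]]}
df1 = pd.DataFrame(data=d1)

# String key: the sorted elements joined with ','
val_str = df1['value'].apply(sorted).str.join(',')

# Size of the group each row belongs to, aligned with df1's index
group_size = val_str.map(val_str.value_counts())

# Largest groups first; the secondary key keeps each group's rows together
result = (df1.assign(val_str=val_str, group_size=group_size)
             .sort_values(['group_size', 'val_str'], ascending=[False, True])
             .drop(columns=['val_str', 'group_size']))
print(result)
```

The original index labels are preserved, so you can still tell which input row each output row came from.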