Suppose I have a list containing the following sub-lists:
data = [['a', 'b', 'c'],['a', 'b'],['c']]
What is the best way to count how many of the sub-lists each pair of elements appears in? The result should be:
member_one_is member_two_is COUNT
a b 2
a c 1
b c 1
One way, using collections.Counter and itertools.combinations:
from collections import Counter
from itertools import combinations
import pandas as pd
data = [['a', 'b', 'c'], ['a', 'b'], ['c']]
# count every 2-element combination of each sub-list with collections.Counter
# sort each sub-list first so that ('a', 'b') and ('b', 'a') are counted as the same pair
counts = Counter(combination for lst in map(sorted, data) for combination in combinations(lst, 2))
# create the DataFrame
df = pd.DataFrame(data=[[*k, v] for k, v in counts.items()], columns=["member_one_is", "member_two_is", "COUNT"])
print(df)
member_one_is member_two_is COUNT
0 a b 2
1 a c 1
2 b c 1
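To see what the Counter step produces before it is turned into a DataFrame, here is a minimal sketch that prints the intermediate values. It uses the same data and the same Counter/combinations approach as above, without pandas; the name pairs_per_list is only an illustrative helper variable.

```python
from collections import Counter
from itertools import combinations

data = [['a', 'b', 'c'], ['a', 'b'], ['c']]

# combinations(lst, 2) yields each unordered pair once; a 1-element list yields none
pairs_per_list = [list(combinations(sorted(lst), 2)) for lst in data]
print(pairs_per_list)
# [[('a', 'b'), ('a', 'c'), ('b', 'c')], [('a', 'b')], []]

# Counter tallies how many sub-lists each pair came from
counts = Counter(pair for pairs in pairs_per_list for pair in pairs)
print(counts)
# Counter({('a', 'b'): 2, ('a', 'c'): 1, ('b', 'c'): 1})
```

Each key of counts is a (member_one, member_two) tuple, which is why [*k, v] in the DataFrame constructor unpacks it into the three output columns.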
Note that if every sub-list is already sorted, you can skip map(sorted, data) and iterate over data directly.
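Why the sort matters: combinations keeps the order of the input, so an unsorted sub-list produces a reversed tuple that Counter treats as a different key. The sketch below assumes a hypothetical input in which the second sub-list is written as ['b', 'a'].

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the second sub-list is in reverse order
data = [['a', 'b', 'c'], ['b', 'a'], ['c']]

# Without sorting, ('a', 'b') and ('b', 'a') are counted as separate pairs
unsorted_counts = Counter(c for lst in data for c in combinations(lst, 2))
# With sorting, both occurrences collapse into the single key ('a', 'b')
sorted_counts = Counter(c for lst in map(sorted, data) for c in combinations(lst, 2))

print(unsorted_counts[('a', 'b')], unsorted_counts[('b', 'a')])  # 1 1
print(sorted_counts[('a', 'b')])  # 2
```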