Python: find all non-repeating combinations of sets and get their common and unique values



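I have a list of sets [setA, setB, setC, setD, …, setX], and I'm looking for a way to get the intersection of every combination of them (with no repeated combinations), so: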

AB = setA.intersection(setB)
AC = setA.intersection(setC)
AD = setA.intersection(setD)
BC = setB.intersection(setC)
BD = setB.intersection(setD)
CD = setC.intersection(setD)
ABC = setA.intersection(setB.intersection(setC))
ABD = setA.intersection(setB.intersection(setD))
ACD = setA.intersection(setC.intersection(setD))
BCD = setB.intersection(setC.intersection(setD))
ABCD = setA.intersection(setB.intersection(setC.intersection(setD)))
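Every combination of two or more sets gets its own entry, so n sets produce 2^n − n − 1 combinations (11 for four sets, matching the list above). A minimal sketch that enumerates the combination names with itertools.combinations; the names A to D are placeholders standing in for setA to setD.

```python
from itertools import combinations

names = ["A", "B", "C", "D"]  # placeholder names for setA..setD

# every combination of size 2..n, in the same order as the list above
combos = ["".join(c) for r in range(2, len(names) + 1)
          for c in combinations(names, r)]

print(combos)
print(len(combos), 2 ** len(names) - len(names) - 1)
```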

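I also want the unique values of each set or combination, i.e. the values that do not appear in any larger combination containing it. So: the values in setA that are not in AB, AC, AD, ABC, ABD, ACD or ABCD; the values in AB that are not in ABC, ABD or ABCD; the values in ABC that are not in ABCD; and so on.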
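Subtracting every larger combination that contains AB removes exactly the values of AB that also occur in some set outside AB, because, for example, AB ∩ C is ABC. So AB's unique values equal AB minus every set not in the combination. A small sketch that checks this equivalence on hypothetical example data (the sets and the helper intersect are illustrative, not from the question):

```python
from functools import reduce
from itertools import combinations

# hypothetical example data
sets = {"A": {1, 2, 3, 4}, "B": {2, 3, 4, 5}, "C": {3, 6}, "D": {4, 7}}

def intersect(names):
    """Intersection of the named sets (hypothetical helper)."""
    return set.intersection(*(sets[n] for n in names))

combo = ("A", "B")
ab = intersect(combo)

# definition from the question: remove every superset combination (ABC, ABD, ABCD)
others = [n for n in sets if n not in combo]
supersets = [combo + extra for r in range(1, len(others) + 1)
             for extra in combinations(others, r)]
unique_via_supersets = reduce(lambda acc, c: acc - intersect(c), supersets, ab)

# shortcut: remove every set that is not part of the combination
unique_via_others = ab - set().union(*(sets[n] for n in others))

print(unique_via_supersets, unique_via_others)
```

The shortcut needs only one difference per outside set instead of one per superset combination, which matters as the number of sets grows.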

I'd like the final output to be a list of tuples, where each tuple looks like this:

(combo_name, unique_values, intersected_set)

So far I have been doing this by hand, which is tedious:

import pandas as pd 
setA_name = 'A'
setB_name = 'B'
setC_name = 'C'
setA = {1,2,3,4,5,6,7,8,9,10}
setB = {2,3,7,11,13,17,23}
setC = {3,6,7,9,10,12,13,15,16}
setA_B = setA.intersection(setB)
setA_C = setA.intersection(setC)
setB_C = setB.intersection(setC)
setA_B_C = setA.intersection(setB.intersection(setC))
setA_B_only = setA_B-setA_B_C
setA_C_only = setA_C-setA_B_C    
setB_C_only = setB_C-setA_B_C
setA_only = setA-setA_B_only-setA_C_only-setA_B_C
setB_only = setB-setA_B_only-setB_C_only-setA_B_C
setC_only = setC-setA_C_only-setB_C_only-setA_B_C
results = [
(setA_name, setA_only, setA),
(setB_name, setB_only, setB),
(setC_name, setC_only, setC),
(';'.join([setA_name, setB_name]), setA_B_only, setA_B),
(';'.join([setA_name, setC_name]), setA_C_only, setA_C),
(';'.join([setB_name, setC_name]), setB_C_only, setB_C),
(';'.join([setA_name, setB_name, setC_name]), setA_B_C, setA_B_C)
]
tab = pd.DataFrame(results)
tab.columns = ['Set', 'Unique', 'Common']
print(tab)
     Set        Unique                            Common
0      A  {8, 1, 4, 5}   {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
1      B  {17, 11, 23}         {2, 3, 7, 11, 13, 17, 23}
2      C  {16, 12, 15}  {3, 6, 7, 9, 10, 12, 13, 15, 16}
3    A;B           {2}                         {2, 3, 7}
4    A;C    {9, 10, 6}                  {3, 6, 7, 9, 10}
5    B;C          {13}                        {3, 13, 7}
6  A;B;C        {3, 7}                            {3, 7}
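The Unique column describes the disjoint regions of a Venn diagram, so every value in A ∪ B ∪ C should land in exactly one Unique cell. A quick sanity check using the values from the table above (copied in by hand):

```python
from itertools import combinations

# Unique column from the table above, keyed by combination name
unique = {
    "A": {1, 4, 5, 8}, "B": {11, 17, 23}, "C": {12, 15, 16},
    "A;B": {2}, "A;C": {6, 9, 10}, "B;C": {13}, "A;B;C": {3, 7},
}
setA = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
setB = {2, 3, 7, 11, 13, 17, 23}
setC = {3, 6, 7, 9, 10, 12, 13, 15, 16}

# the regions are pairwise disjoint ...
disjoint = all(not (unique[x] & unique[y]) for x, y in combinations(unique, 2))
# ... and together they cover the union of the input sets
covers = set().union(*unique.values()) == setA | setB | setC
print(disjoint, covers)
```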

I don't know where to start.

Updated approach, using @TheEngineerProgrammer's suggestion:

import pandas as pd
from itertools import combinations as combi
set_dict = {'A': {1,2,3,4,5,6,7,8,9,10}, 'B': {2,3,7,11,13,17,23}, 'C': {3,6,7,9,10,12,13,15,16}}
dict_keys = list(set_dict.keys())
# copy, so that adding combination keys below does not also modify set_dict
common_dict = dict(set_dict)
# intersections of every pair of sets
for i, j in combi(dict_keys, 2):
    i_set = set_dict.get(i)
    j_set = set_dict.get(j)
    common = i_set.intersection(j_set)
    key_name = ';'.join([i, j])
    common_dict[key_name] = common
# intersections of every triple of sets
for i, j, k in combi(dict_keys, 3):
    i_set = set_dict.get(i)
    j_set = set_dict.get(j)
    k_set = set_dict.get(k)
    common = i_set.intersection(j_set.intersection(k_set))
    key_name = ';'.join([i, j, k])
    common_dict[key_name] = common

uniq_dict = dict()
for x, y in combi(list(common_dict.keys()), 2):
    x_split = x.split(';')
    # y is a larger combination containing x (compare set names, not characters)
    if set(x_split) <= set(y.split(';')):
        print(x, '-', y)
        # keep subtracting from what is already left of x
        if x in uniq_dict:
            x_set = uniq_dict.get(x)
        else:
            x_set = common_dict.get(x)
        y_set = common_dict.get(y)
        x_uniq = x_set - y_set
        print(x_uniq)
        uniq_dict[x] = x_uniq
# combinations with no larger combination (here A;B;C) keep their whole intersection
for key in set(common_dict.keys()) - set(uniq_dict.keys()):
    uniq_dict[key] = common_dict.get(key)
results = []
for key in uniq_dict.keys():
    results.append((key, uniq_dict.get(key), common_dict.get(key)))
tab = pd.DataFrame(results, columns=['Set', 'Unique', 'Common'])
print(tab)
     Set        Unique                            Common
0      A  {8, 1, 4, 5}   {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
1      B  {17, 11, 23}         {2, 3, 7, 11, 13, 17, 23}
2      C  {16, 12, 15}  {3, 6, 7, 9, 10, 12, 13, 15, 16}
3    A;B           {2}                         {2, 3, 7}
4    A;C    {9, 10, 6}                  {3, 6, 7, 9, 10}
5    B;C          {13}                        {3, 13, 7}
6  A;B;C        {3, 7}                            {3, 7}

How can I make this part scale with the number of items in set_dict?

for i, j in combi(dict_keys, 2):
    ...
for i, j, k in combi(dict_keys, 3):
    ...
for i, j, k, l in combi(dict_keys, 4):
    ...
for i, j, k, l, m in combi(dict_keys, 5):
    ...
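The fixed-arity loops can collapse into one loop over the combination size r: each combination is a tuple of keys, and set.intersection(*...) accepts any number of sets. A sketch that builds the same common_dict (same ';'-joined key names) for any number of entries in set_dict:

```python
from itertools import combinations as combi

set_dict = {'A': {1,2,3,4,5,6,7,8,9,10}, 'B': {2,3,7,11,13,17,23}, 'C': {3,6,7,9,10,12,13,15,16}}

common_dict = dict(set_dict)  # single sets first, copied so set_dict is untouched
for r in range(2, len(set_dict) + 1):
    for combo in combi(set_dict, r):
        # combo is a tuple of r keys, e.g. ('A', 'B')
        common_dict[';'.join(combo)] = set.intersection(*(set_dict[k] for k in combo))

print(common_dict)
```

The rest of the code (the uniq_dict loop and the DataFrame) can stay as it is, since it only reads common_dict.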

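The requirement you describe is somewhat hard to follow, especially the part about "unique values of the different sets". I think I understand it now, but I can't picture a real-world use for the result of this set operation, so I'd encourage you to review the requirement and confirm that this is really what you want.

Anyway, here it is. With this approach you can have as many sets as you like, without having to declare an exponential number of variables to operate on them (by the way, here's a term you may find useful: you're operating on the "powerset" ;)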
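The itertools documentation has a powerset recipe built from chain and combinations; dropping the empty tuple leaves every non-empty combination of set names, which is exactly what needs to be visited. A sketch of that recipe applied to three set names:

```python
from itertools import chain, combinations

def powerset(iterable):
    # recipe from the itertools documentation: all subsets, smallest first
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# non-empty subsets of the set names
names = ["A", "B", "C"]
subsets = [p for p in powerset(names) if p]
print(subsets)
```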

from itertools import combinations
from functools import reduce

sets = {
    "A": {1,2,3,4,5,6,7,8,9,10},
    "B": {2,3,7,11,13,17,23},
    "C": {3,6,7,9,10,12,13,15,16},
}
for r in range(len(sets)):
    for combo in combinations(sets, r+1):
        name = "".join(combo)
        intersection = reduce(set.intersection, (sets[n] for n in combo))
        unique = reduce(set.difference, (sets[n] for n in sets if n not in combo), intersection)
        print(name, unique, intersection)

Output:

A   {8, 1, 4, 5} {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
B   {17, 11, 23} {17, 2, 3, 23, 7, 11, 13}
C   {16, 12, 15} {3, 6, 7, 9, 10, 12, 13, 15, 16}
AB  {2}          {2, 3, 7}
AC  {9, 10, 6}   {3, 6, 7, 9, 10}
BC  {13}         {3, 13, 7}
ABC {3, 7}       {3, 7}
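To get the list of (combo_name, unique_values, intersected_set) tuples asked for in the question, the loop above can append instead of print. This variant swaps reduce for the multi-argument forms of set.intersection and set.difference, which always return new set objects, so the tuples never alias the input sets; the ';' separator mirrors the question's naming.

```python
from itertools import combinations

sets = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "B": {2, 3, 7, 11, 13, 17, 23},
    "C": {3, 6, 7, 9, 10, 12, 13, 15, 16},
}

results = []  # (combo_name, unique_values, intersected_set)
for r in range(1, len(sets) + 1):
    for combo in combinations(sets, r):
        intersection = set.intersection(*(sets[n] for n in combo))
        # drop everything that also occurs in a set outside the combination
        unique = intersection.difference(*(sets[n] for n in sets if n not in combo))
        results.append((";".join(combo), unique, intersection))

for row in results:
    print(row)
```

If the table is wanted, pd.DataFrame(results, columns=['Set', 'Unique', 'Common']) builds it exactly as in the question.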

I think this is what you're looking for:

from itertools import combinations
setA = set((2,3,4))
setB = set((2,5,7,4))
setC = set((2,3,1,5,6))
setD = set((8,9,6,7,3))
my_list = [setA, setB, setC, setD]
my_list_names = ["A", "B", "C", "D"]
results = {}
for i in range(2, len(my_list)+1):
    for names, sets in zip(combinations(my_list_names, i), combinations(my_list, i)):
        name = "".join(names)
        results[name] = set.intersection(*sets)
print(results)

You get this:

{'AB': {2, 4},
'AC': {2, 3},
'AD': {3},
'BC': {2, 5},
'BD': {7},
'CD': {3, 6},
'ABC': {2},
'ABD': set(),
'ACD': {3},
'BCD': set(),
'ABCD': set()}
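The code above computes only the intersections. Combining it with the rule "remove every set outside the combination" gives the unique values as well. A sketch using the same four sets, keeping only the non-empty regions (filtering out empty ones is a choice made here, not something the question requires):

```python
from itertools import combinations

sets = {"A": {2, 3, 4}, "B": {2, 5, 7, 4}, "C": {2, 3, 1, 5, 6}, "D": {8, 9, 6, 7, 3}}

unique = {}
for r in range(1, len(sets) + 1):
    for combo in combinations(sets, r):
        common = set.intersection(*(sets[n] for n in combo))
        # keep only values that appear in no set outside the combination
        region = common.difference(*(sets[n] for n in sets if n not in combo))
        if region:  # skip empty regions
            unique["".join(combo)] = region

print(unique)
```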
