Python: remove duplicates from an unevenly distributed list of lists

I have a Python list of lists. I want to merge all the sublists that share at least one common element, and remove the duplicate items.

My dataset is large. Some of the inner lists contain common data, and I want to merge every group of lists that has data in common.

# sample data
foo = [
[0,1,2,6,9],
[0,1,2,6,5],
[3,4,7,3,2],
[12,36,28,73],
[537],
[78,90,34,72,0],
[573,73],
[99],
[41,44,79],
]
# I want to get this
[
[0,1,2,6,9,5,3,4,7,3,2,78,90,34,72,0],
[12,36,28,73,573,73,573],
[99],
[41,44,79],
]

Lists that contain a common element are combined into one group.
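
To make the grouping rule concrete, here is a minimal sketch of the pairwise check. The helper name `share_element` is hypothetical and not part of the original code; it assumes the elements are hashable (plain integers, as in the sample data).

```python
def share_element(a, b):
    """Return True if lists a and b have at least one element in common."""
    return not set(a).isdisjoint(b)

print(share_element([0, 1, 2, 6, 9], [3, 4, 7, 3, 2]))  # True, both contain 2
print(share_element([12, 36, 28, 73], [573, 73]))       # True, both contain 73
print(share_element([537], [573, 73]))                  # False, 537 and 573 differ
```

Note that the grouping is transitive: two lists can end up in the same group without sharing an element directly, as long as a third list overlaps with both.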

The original data file is the one linked in the code below (x.json).

Edit: this is what I am trying

import json
data = json.load(open('x.json'))  # https://files.catbox.moe/y1yt5w.json

class Relations:
    def __init__(self):
        pass

    def process_relation(self, flat_data):
        relation_keys = []
        rel = {}
        for i in range(len(flat_data)):
            rel[i] = []
            for n in flat_data:
                if i in n:
                    rel[i].extend(n)
        return {k: list(set(v)) for k, v in rel.items()}

    def process(self, flat_data):
        rawRelations = self.process_relation(flat_data)
        return rawRelations

rel = Relations()
print(json.dumps(rel.process(data), indent=4), file=open('out.json', 'w'))  # https://files.catbox.moe/n65tie.json

Note: the largest number present in the data will be equal to the length of the list of lists.

A simple (probably not optimal) algorithm that modifies the input data in place:

target_idx = 0
while target_idx < len(data):
    src_idx = target_idx + 1
    did_merge = False
    while src_idx < len(data):
        if set(data[target_idx]) & set(data[src_idx]):
            data[target_idx].extend(data[src_idx])
            data.pop(src_idx)  # this was merged
            did_merge = True
            continue  # with same src_idx
        src_idx += 1
    if not did_merge:
        target_idx += 1
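
The loop above merges the lists in place but keeps repeated values inside each merged list, and it rebuilds sets on every comparison. As an alternative, here is a sketch using a union-find (disjoint-set) structure, which is a different technique from the loop above, not a rewrite of it. The function name `merge_overlapping` is hypothetical, and the sketch assumes the elements are hashable. It returns a new list instead of modifying the input, and it drops duplicates while keeping first-seen order.

```python
def merge_overlapping(lists):
    """Merge sublists that share any element (transitively), dropping duplicates."""
    parent = list(range(len(lists)))  # union-find over sublist indices

    def find(i):
        # Walk up to the root, shortening the path as we go (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # element -> index of the first sublist that contained it
    for idx, sub in enumerate(lists):
        for item in sub:
            if item in first_seen:
                ra, rb = find(first_seen[item]), find(idx)
                if ra != rb:
                    parent[max(ra, rb)] = min(ra, rb)  # earliest index stays the root
            else:
                first_seen[item] = idx

    # Collect elements per group; dict keys keep insertion order and are unique.
    groups = {}
    for idx, sub in enumerate(lists):
        groups.setdefault(find(idx), {}).update(dict.fromkeys(sub))
    return [list(g) for g in groups.values()]


foo = [
    [0, 1, 2, 6, 9],
    [0, 1, 2, 6, 5],
    [3, 4, 7, 3, 2],
    [12, 36, 28, 73],
    [537],
    [78, 90, 34, 72, 0],
    [573, 73],
    [99],
    [41, 44, 79],
]
result = merge_overlapping(foo)
print(result)
# [[0, 1, 2, 6, 9, 5, 3, 4, 7, 78, 90, 34, 72], [12, 36, 28, 73, 573], [537], [99], [41, 44, 79]]
```

With the sample data from the question, `[537]` shares no element with any other list, so it remains a group of its own. Each element is visited a constant number of times, so this should scale better than the pairwise loop on a large dataset, though that has not been measured here.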
