Python Dedup/Merge List of Dicts

Say I have a list of dicts:

list = [{'name':'john','age':'28','location':'hawaii','gender':'male'},
        {'name':'john','age':'32','location':'colorado','gender':'male'},
        {'name':'john','age':'32','location':'colorado','gender':'male'},
        {'name':'parker','age':'24','location':'new york','gender':'male'}]

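In this context, "name" can be considered the unique identifier. My goal is not only to dedup this list for identical dicts (i.e. list[1] and list[2]), but also to merge/append differing values for a single "name" (i.e. list[0] and list[1/2]). In other words, I want to merge all of the "name" = "john" dicts in my example into a single dict, like so: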

dedup_list = [{'name':'john','age':'28; 32','location':'hawaii; colorado','gender':'male'},
              {'name':'parker','age':'24','location':'new york','gender':'male'} ]

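What I have tried so far is to create a second list, dedup_list, and to iterate through the first list. If the "name" value does not already exist in one of dedup_list's dicts, I append it. It is the merging part where I am stuck.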

for dict in list:
    for new_dict in dedup_list:
        if dict['name'] in new_dict:
            # MERGE OTHER DICT FIELDS HERE
        else:
            dedup_list.append(dict) # This will create duplicate values as it iterates through each row of the dedup_list.  I can throw them in a set later to remove?

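My list of dicts will never contain more than 100 items, so an O(n^2) solution is definitely acceptable, though not necessarily ideal. This dedup_list will eventually be written to a CSV, so if there is a solution that involves that, I'm all ears.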

Thanks!

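Well, I was about to craft a solution around defaultdict, but luckily @hivert has already posted the best solution I could have come up with, which is in this answer: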

from collections import defaultdict
dicts = [{'a':1, 'b':2, 'c':3},
         {'a':1, 'd':2, 'c':'foo'},
         {'e':57, 'c':3} ]
super_dict = defaultdict(set)  # uses set to avoid duplicates
for d in dicts:
    for k, v in d.items():  # Python 3; on Python 2 this was d.iteritems()
        super_dict[k].add(v)

I.e., I'm voting to close this question as a dupe of that one.

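N.B.: you won't get values such as '28; 32', but a set containing the distinct values (here 28 and 32), which you can then process as you wish before writing the CSV file.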
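For the data in the question, the same defaultdict idea can be grouped per "name" and the collected values joined into strings such as '28; 32'. This is only a sketch under a few assumptions: the helper name merge_by_key and the variable records are made up for illustration, the values are joined with '; ' as in the question's desired output, and a plain dict is used as an ordered set (which relies on Python 3.7+ insertion ordering) so that '28' comes out before '32'.

```python
from collections import defaultdict

# Sample data from the question (renamed so it does not shadow the builtin list).
records = [
    {'name': 'john', 'age': '28', 'location': 'hawaii', 'gender': 'male'},
    {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
    {'name': 'john', 'age': '32', 'location': 'colorado', 'gender': 'male'},
    {'name': 'parker', 'age': '24', 'location': 'new york', 'gender': 'male'},
]

def merge_by_key(rows, key='name', sep='; '):
    """Hypothetical helper: merge rows sharing the same `key` value."""
    # key value -> field name -> dict used as an insertion-ordered set of values
    grouped = defaultdict(lambda: defaultdict(dict))
    for row in rows:
        for field, value in row.items():
            if field != key:
                grouped[row[key]][field][value] = None
    merged = []
    for key_value, fields in grouped.items():
        merged_row = {key: key_value}
        for field, values in fields.items():
            # Join the distinct values in first-seen order.
            merged_row[field] = sep.join(map(str, values))
        merged.append(merged_row)
    return merged

dedup_list = merge_by_key(records)
```

This makes a single pass over the input, so it stays linear in the number of rows rather than O(n^2). Rows that lack the key field would raise a KeyError; that case is not handled here.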

N.B. 2: to write the CSV file, have a look at the csv.DictWriter class.
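As a rough illustration of the DictWriter route, the snippet below writes the merged rows from the question to CSV. The column order in fieldnames is an assumption, and an in-memory io.StringIO buffer stands in for a real file so the example is self-contained; for an actual file you would typically use open(path, 'w', newline='') instead.

```python
import csv
import io

# Desired merged result from the question.
dedup_list = [
    {'name': 'john', 'age': '28; 32', 'location': 'hawaii; colorado', 'gender': 'male'},
    {'name': 'parker', 'age': '24', 'location': 'new york', 'gender': 'male'},
]

fieldnames = ['name', 'age', 'location', 'gender']  # assumed column order

buffer = io.StringIO()  # stand-in for a file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()           # header row built from fieldnames
writer.writerows(dedup_list)   # one CSV row per dict

csv_text = buffer.getvalue()
```

Because the merged values use '; ' rather than a comma, they do not collide with the default CSV delimiter and are written unquoted.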
