Extracting duplicates from a list of dictionaries in Python



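I have a huge list of dictionaries (shortened here for clarity), and some of the values are duplicated (let's say 'ID' is my target). How can I print the dictionaries whose ID appears more than once?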

[{'ID': 2501,
'First Name': 'Edward',
'Last Name': 'Crawford',
'Email': 'c.crawford@randatmail.com',
'Location': '[1.24564352 0.94323637]',
'Registration': '12/12/2000',
'Phone': '398-2890-30'},
{'ID': 3390936,
'First Name': 'Pepe',
'Last Name': 'Slim',
'Email': 'pepe.slim@somemail.com',
'Location': '[1.7297525  0.54631239]',
'Registration': '3/8/2020',
'Phone': '341-3456-85'}]

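I can only print certain values from the list of dictionaries, but I can't work out how to parse it and identify the duplicated values.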

all_phone = [i['Phone'] for i in comments]
all_email = [i['Email'] for i in comments]

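I suggest writing a helper function so that you can flexibly choose which field to check for duplicates. Using an intermediate dictionary (as in @Andrej Kesely's answer) is an efficient way to search for duplicates, and it generalizes nicely into a function. In this example I use a plain dict rather than Counter from the collections library.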

def find_duplicates(dicts, field):
    counts = {}
    for d in dicts:
        counts[d[field]] = counts.get(d[field], 0) + 1
    return [d for d in dicts if counts[d[field]] > 1]

phone_duplicates = find_duplicates(comments, 'Phone')
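
If it helps to see which records share a value, the same counting idea can be adapted to group the duplicates. The following is only a sketch under assumptions: `group_duplicates` and the `sample` data are hypothetical names introduced for illustration, and skipping records that lack the field (instead of raising KeyError as `find_duplicates` would) is my own choice.

```python
from collections import defaultdict

def group_duplicates(dicts, field):
    """Map each repeated value of `field` to the list of records sharing it."""
    groups = defaultdict(list)
    for d in dicts:
        # Assumption: records lacking the field are skipped rather than raising KeyError.
        if field in d:
            groups[d[field]].append(d)
    # Keep only the values that occur in more than one record.
    return {value: records for value, records in groups.items() if len(records) > 1}

# Hypothetical sample data, shaped loosely like the records in the question.
sample = [
    {"ID": 1, "Phone": "111"},
    {"ID": 2, "Phone": "222"},
    {"ID": 3, "Phone": "111"},
    {"ID": 4},  # no 'Phone' key
]
phone_groups = group_duplicates(sample, "Phone")
for phone, records in phone_groups.items():
    print(phone, [r["ID"] for r in records])
```

The field values are used as dictionary keys, so they need to be hashable (strings and integers, as in the question, are fine).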

You can use collections.Counter to create a counter whose keys are the IDs from the dictionaries. Then you can filter the list based on this counter:

lst = [
{
"ID": 2501,
"First Name": "Edward",
"Last Name": "Crawford",
"Email": "c.crawford@randatmail.com",
"Location": "[1.24564352 0.94323637]",
"Registration": "12/12/2000",
"Phone": "398-2890-30",
},
{
"ID": 3390936,
"First Name": "Pepe",
"Last Name": "Slim",
"Email": "pepe.slim@somemail.com",
"Location": "[1.7297525  0.54631239]",
"Registration": "3/8/2020",
"Phone": "341-3456-85",
},
# duplicate ID here:
{
"ID": 2501,
"First Name": "XXX",
"Last Name": "XXX",
},
]
from collections import Counter

# create a counter:
c = Counter(d["ID"] for d in lst)

# print duplicated dictionaries:
for d in lst:
    if c[d["ID"]] > 1:
        print(d)

Prints:

{
"ID": 2501,
"First Name": "Edward",
"Last Name": "Crawford",
"Email": "c.crawford@randatmail.com",
"Location": "[1.24564352 0.94323637]",
"Registration": "12/12/2000",
"Phone": "398-2890-30",
}
{"ID": 2501, "First Name": "XXX", "Last Name": "XXX"}

You can loop over the list and build a new dictionary; when a key is already present, you have found a duplicate:

d = {}
for value in comments:
    key = value['ID']
    if key not in d:
        d[key] = value
    else:
        # you have a duplicate
        print(value)

Using a list comprehension:

comments=[{'ID': 1111,
'First Name': 'foo1',
'Last Name': 'bar1'},
{'ID': 2222,
'First Name': 'foo2',
'Last Name': 'bar2'},
{'ID': 1111,
'First Name': 'foo3',
'Last Name': 'bar3'},
{'ID': 3333,
'First Name': 'foo4',
'Last Name': 'bar4'},
{'ID': 2222,
'First Name': 'foo5',
'Last Name': 'bar5'},]

all_ID = [i['ID'] for i in comments]
Duplicates = list(set([x for x in all_ID if all_ID.count(x) > 1]))

print("Duplicates found! =>", Duplicates)

Output:

Duplicates found! => [2222, 1111]
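
One thing to keep in mind with the list comprehension above: `all_ID.count(x)` rescans the whole list for every element, so the cost grows quadratically with the list length. For a huge list, a single counting pass is likely a better fit. The sketch below is one possible variant, not the answer's own code; `duplicated_values` and `records` are hypothetical names used for illustration.

```python
from collections import Counter

def duplicated_values(dicts, field):
    """Return the values of `field` that occur more than once, in first-seen order."""
    # One pass over the data to count every value.
    counts = Counter(d[field] for d in dicts)
    # Counter is a dict subclass, so on Python 3.7+ it keeps insertion order.
    return [value for value, n in counts.items() if n > 1]

# Hypothetical data with the same IDs as the example above.
records = [
    {"ID": 1111}, {"ID": 2222}, {"ID": 1111}, {"ID": 3333}, {"ID": 2222},
]
dup_ids = duplicated_values(records, "ID")
print("Duplicates found! =>", dup_ids)
```

Unlike the `set`-based version, the result order here follows the order in which each ID first appears in the list.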
