Find identical key/value pairs in a list of dictionaries without using nested loops



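I'm doing a very simple calculation: I find entries with identical key/value pairs in a list of dictionaries and combine them by summing their quantities. Suppose the data is: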

Edit: name and id are arbitrary names used as an example; in reality I have a very large dictionary and I use multiple keys.

Input:
{
    "name": "first",
    "id": "1234",
    "quantity": 10
},
{
    "name": "first",
    "id": "1234",
    "quantity": 30
},
{
    "name": "another",
    "id": "0000",
    "quantity": 10
}

Output:
{
    "name": "first",
    "id": "1234",
    "quantity": 40
},
{
    "name": "another",
    "id": "0000",
    "quantity": 10
}

I'm curious how to do this the "Pythonic" way, avoiding nested loops as much as possible.

Here is what I have now, which I'm not happy with:

for entry in quantities:
    for compare in quantities:
        if id(entry) != id(compare):
            if (entry["name"] == compare["name"]) and (entry["id"] == compare["id"]):
                entry["quantity"] = entry["quantity"] + compare["quantity"]
                quantities.remove(compare)

Any tips/suggestions are appreciated, thanks!

Use another dictionary and group by the keys, by which I mean "name" and "id" (although, isn't "id" alone enough?).

Something like:

grouper = {}
for q in quantities:
    key = q['name'], q['id']
    if key in grouper:
        grouper[key]['quantity'] += q['quantity']
    else:
        grouper[key] = q.copy()
quantities = list(grouper.values())

In a REPL:

In [1]: quantities = [
   ...: {
   ...:   "name":"first",
   ...:   "id":"1234",
   ...:   "quantity":10
   ...: },
   ...: {
   ...:   "name":"first",
   ...:   "id":"1234",
   ...:   "quantity":30
   ...: },
   ...: {
   ...:   "name":"another",
   ...:   "id":"0000",
   ...:   "quantity":10
   ...: }
   ...: ]

In [2]: grouper = {}

In [3]: for q in quantities:
   ...:     key = q['name'], q['id']
   ...:     if key in grouper:
   ...:         grouper[key]['quantity'] += q['quantity']
   ...:     else:
   ...:         grouper[key] = q.copy()
   ...:

In [4]: grouper
Out[4]:
{('first', '1234'): {'name': 'first', 'id': '1234', 'quantity': 40},
 ('another', '0000'): {'name': 'another', 'id': '0000', 'quantity': 10}}

Then you can get your new list directly from the values:

In [5]: list(grouper.values())
Out[5]:
[{'name': 'first', 'id': '1234', 'quantity': 40},
{'name': 'another', 'id': '0000', 'quantity': 10}]

This approach takes linear time and linear space.
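
Since the question says name and id stand in for several keys, the same loop can be wrapped in a small helper that takes the grouping keys as a parameter. This is only a sketch: the function name merge_by_keys and its parameters are my own, and it assumes every dictionary contains all the grouping keys and a numeric field to sum.

```python
def merge_by_keys(records, group_keys, sum_key):
    """Merge dicts that agree on every key in group_keys, summing sum_key."""
    grouper = {}
    for record in records:
        # A tuple of the grouping values is hashable, so it can be a dict key.
        key = tuple(record[k] for k in group_keys)
        if key in grouper:
            grouper[key][sum_key] += record[sum_key]
        else:
            # Copy so the caller's dictionaries are not modified.
            grouper[key] = record.copy()
    return list(grouper.values())


quantities = [
    {"name": "first", "id": "1234", "quantity": 10},
    {"name": "first", "id": "1234", "quantity": 30},
    {"name": "another", "id": "0000", "quantity": 10},
]

merged = merge_by_keys(quantities, ("name", "id"), "quantity")
print(merged)
```

Each record is visited once and each dictionary lookup is constant time on average, which is where the linear running time comes from.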

Note that q.copy() creates a shallow copy, which is fine here, but may not be if your dictionaries contain mutable values.
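
To make the shallow-copy caveat concrete, here is a small illustration. The "tags" field is hypothetical (nothing like it appears in the question); it just stands in for any mutable value stored in the dictionary.

```python
import copy

original = {"name": "first", "id": "1234", "tags": ["a"]}

# dict.copy() copies the top level only; the list is shared.
shallow = original.copy()
shallow["tags"].append("b")
print(original["tags"])  # ['a', 'b'] - the original changed too

# copy.deepcopy() also copies the nested list.
deep = copy.deepcopy(original)
deep["tags"].append("c")
print(original["tags"])  # ['a', 'b'] - unaffected this time
```

If your values are only strings and numbers, as in the question, the shallow copy is enough.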

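Also note that you may want to reconsider your data structure. Do you really want a list? If you have a unique key and want to be able to find objects quickly by that key, you probably want a dictionary of some sort.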
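
As a rough illustration of that last point, the merged records could be kept in a dictionary keyed by id. This sketch assumes id alone is unique, which the question does not confirm, and the name by_id is my own.

```python
merged = [
    {"name": "first", "id": "1234", "quantity": 40},
    {"name": "another", "id": "0000", "quantity": 10},
]

# Index the records by their id (assumed unique here).
by_id = {entry["id"]: entry for entry in merged}

# Lookup is a single hash access instead of a scan over the list.
print(by_id["1234"]["quantity"])  # 40

# Updating an existing entry needs no search either.
by_id["0000"]["quantity"] += 5
print(by_id["0000"]["quantity"])  # 15
```

If several keys are needed to identify a record, a tuple such as (name, id) can serve as the dictionary key instead.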

Method 1: Using groupby and reduce

from itertools import groupby
from functools import reduce

def merge(d1, d2):
    """Merge two dictionaries, summing the values of keys not in grouper."""
    return {k: v if k in grouper else v + d2.get(k, 0) for k, v in d1.items()}

grouper = ("name", "id")  # keys to group by

# lst is the list of dictionaries (quantities in the question).
# Sort in place (to save space) by the grouper keys.
lst.sort(key=lambda d: [d[key] for key in grouper])

# Merge the dicts within each group using the merge function
outputlist = [reduce(merge, g) for _, g in groupby(lst, lambda d: [d[key] for key in grouper])]

outputlist:

[{'name': 'another', 'id': '0000', 'quantity': 10},
{'name': 'first', 'id': '1234', 'quantity': 40}]
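
The sort in Method 1 is not optional: itertools.groupby only groups consecutive items that share a key, so unsorted input can produce the same key more than once. A small demonstration, using operator.itemgetter as the key function (my substitution for the lambda above) and a reordered copy of the sample data:

```python
from itertools import groupby
from operator import itemgetter

lst = [
    {"name": "first", "id": "1234", "quantity": 10},
    {"name": "another", "id": "0000", "quantity": 10},
    {"name": "first", "id": "1234", "quantity": 30},
]

key = itemgetter("name", "id")  # returns the tuple (name, id)

# Without sorting, the two "first" records are not adjacent,
# so they end up in separate groups.
unsorted_groups = [k for k, _ in groupby(lst, key)]
print(unsorted_groups)
# [('first', '1234'), ('another', '0000'), ('first', '1234')]

# After sorting by the same key, each key appears exactly once.
sorted_groups = [k for k, _ in groupby(sorted(lst, key=key), key)]
print(sorted_groups)
# [('another', '0000'), ('first', '1234')]
```

The sort makes this approach O(n log n), and it is also why the output order differs from the input order.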

Method 2: Using Pandas

A one-liner that avoids all explicit loops (it effectively replicates Method 1)

import pandas as pd

outputlist = pd.DataFrame(lst).groupby(['name', 'id']).sum().reset_index().to_dict('records')

outputlist:

[{'name': 'another', 'id': '0000', 'quantity': 10},
{'name': 'first', 'id': '1234', 'quantity': 40}]

Explanation

pd.DataFrame(lst)            - generate pandas DataFrame from list of dictionaries
groupby(['name', 'id'])      - group rows by name & id
sum()                        - sum the non-grouped values in each group
reset_index()                - reset index back to 0, 1, 2, ...
to_dict('records')           - convert to list of dictionaries, with each row's data as a dictionary