Merge dictionaries, preserving old values and new keys



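I'm writing a Python script that parses RSS feeds. I want to maintain a dictionary of feed entries that gets updated periodically. Entries that no longer exist in the feed should be removed, new entries should get a default value, and the values of previously seen entries should remain unchanged.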

I think this is best explained by example:

>>> old = {
...     'a': 1,
...     'b': 2,
...     'c': 3
... }
>>> new = {
...     'c': 'x',
...     'd': 'y',
...     'e': 'z'
... }
>>> out = some_function(old, new)
>>> out
{'c': 3, 'd': 'y', 'e': 'z'}

This is my current attempt:

def merge_preserving_old_values_and_new_keys(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out

This works, but it seems to me there might be a better or cleverer way.

EDIT: If you want to test your function:

def my_merge(old, new):
    pass
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = my_merge(old, new)
assert out == {'c': 3, 'd': 'y', 'e': 'z'}
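The single assertion above only covers the happy path. A slightly broader harness might also check the edge cases the RSS use case implies: an empty previous state, an empty feed, and old values that are falsy (such as None or 0) but must still be kept. The helper name check_merge is hypothetical; it is run here against the question's first attempt. Note that the last check deliberately rejects any variant that mutates its inputs, such as the in-place version in one of the answers.

```python
def orig_merge(old, new):
    # The question's first attempt: copy new, then overwrite shared keys from old.
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out


def check_merge(merge):
    # The example from the question.
    assert merge({'a': 1, 'b': 2, 'c': 3},
                 {'c': 'x', 'd': 'y', 'e': 'z'}) == {'c': 3, 'd': 'y', 'e': 'z'}
    # No previous entries: everything in new survives with its default value.
    assert merge({}, {'d': 'y'}) == {'d': 'y'}
    # Empty feed: every old entry is dropped.
    assert merge({'a': 1}, {}) == {}
    # Falsy old values (None, 0) must still win over the new default.
    assert merge({'a': None, 'b': 0}, {'a': 'x', 'b': 'y'}) == {'a': None, 'b': 0}
    # The inputs must not be modified.
    old, new = {'a': 1}, {'a': 2, 'b': 3}
    merge(old, new)
    assert old == {'a': 1} and new == {'a': 2, 'b': 3}
    return True


print(check_merge(orig_merge))  # → True
```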

EDIT 2: Defining Martijn Pieters' answer as set_merge, bravosierra99's answer as loop_merge, and my first attempt as orig_merge, I get the following timing results:

>>> import timeit
>>> setup="""
... old = {'a': 1, 'b': 2, 'c': 3}
... new = {'c': 'x', 'd': 'y', 'e': 'z'}
... from __main__ import set_merge, loop_merge, orig_merge
... """
>>> timeit.timeit('set_merge(old, new)', setup=setup)
3.4415210600000137
>>> timeit.timeit('loop_merge(old, new)', setup=setup)
1.161155690000669
>>> timeit.timeit('orig_merge(old, new)', setup=setup)
1.1776735319999716

I find this strange, because I didn't expect the dictionary view approach to be this much slower.
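One plausible contributor (not measured in this thread) is that, on dictionaries this small, the view-based version creates several extra objects per call: the & between two key views does not return a view but builds a brand-new set, update() then consumes a generator expression that has to be resumed for every item, and new.copy() allocates the result. The loop versions only perform a handful of hash lookups. This Python 3 sketch merely shows those intermediate objects; it is not a profiler.

```python
import types

old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}

# Intersecting two keys views does not return a view; it builds a new set.
shared = old.keys() & new.keys()
print(type(shared).__name__, shared)  # → set {'c'}

# The argument passed to update() is a generator object, created per call.
pairs = ((k, old[k]) for k in shared)
print(isinstance(pairs, types.GeneratorType))  # → True

result = new.copy()  # another allocation: the copy of new
result.update(pairs)
print(result)  # → {'c': 3, 'd': 'y', 'e': 'z'}
```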

Dictionaries have dictionary view objects that behave like sets. Use these to get the intersection of the keys of old and new:

def merge_preserving_old_values_and_new_keys(old, new):
    result = new.copy()
    result.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return result

The above uses Python 2 syntax; if you are using Python 3, use old.keys() & new.keys() instead to get the same result:

def merge_preserving_old_values_and_new_keys(old, new):
    # Python 3 version
    result = new.copy()
    result.update((k, old[k]) for k in old.keys() & new.keys())
    return result

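The code above takes all key-value pairs from new as the starting point, then overwrites the value of every key that also appears in old with the value from old.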
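To make the set semantics concrete, here is a small Python 3 sketch (the names kept, added and dropped are illustrative only) that spells out which keys each part of the merge handles, mapped onto the RSS terms from the question:

```python
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}

# Keys views support the usual set operators in Python 3.
kept = old.keys() & new.keys()     # seen before and still in the feed
added = new.keys() - old.keys()    # new entries: they keep their default value
dropped = old.keys() - new.keys()  # no longer in the feed: discarded

print(sorted(kept), sorted(added), sorted(dropped))
# → ['c'] ['d', 'e'] ['a', 'b']

# Same construction as the answer: start from new, then restore the
# old values for the kept keys only.
result = new.copy()
result.update((k, old[k]) for k in kept)
assert result.keys() == new.keys()  # the key set always comes from new
print(result)  # → {'c': 3, 'd': 'y', 'e': 'z'}
```

The dropped set is never used by the merge itself; it falls out automatically because the result starts from new.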

Demo:

>>> merge_preserving_old_values_and_new_keys(old, new)
{'c': 3, 'e': 'z', 'd': 'y'}

Note that this function, like your version, produces a new dictionary (although the key and value objects are shared; it is a shallow copy).
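The shallow-copy caveat only matters when the values are mutable. The sketch below uses lists as values to show that the merged dictionary is a different object while the values inside it are shared with old, and that copy.deepcopy is one way to get a fully independent result if that is needed:

```python
import copy


def merge_preserving_old_values_and_new_keys(old, new):
    # Python 3 version from the answer above: start from a copy of new,
    # then restore old values for the keys both dicts share.
    result = new.copy()
    result.update((k, old[k]) for k in old.keys() & new.keys())
    return result


old = {'c': ['seen']}
new = {'c': [], 'd': []}
out = merge_preserving_old_values_and_new_keys(old, new)

print(out is new)            # → False: the dictionary itself is new
print(out['c'] is old['c'])  # → True: the list inside is the same object
out['c'].append('again')
print(old['c'])              # → ['seen', 'again']

# If the merged dict must be fully independent, copy the values as well.
independent = copy.deepcopy(out)
independent['c'].append('only here')
print(old['c'])              # → ['seen', 'again']
```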

If you don't need new for anything else, you could also update it in place:

def merge_preserving_old_values_and_new_keys(old, new):
    new.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return new
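The in-place version above is written with Python 2's viewkeys(). A Python 3 spelling might look like this (merge_into_new is a hypothetical name). Updating new while the intersection is being consumed is safe here: the iterable of the outermost for clause in a generator expression is evaluated as soon as the generator is created, so old.keys() & new.keys() is already a finished set before update() starts writing.

```python
def merge_into_new(old, new):
    # Hypothetical name; Python 3 version of the in-place variant.
    # The key intersection is computed eagerly as a set, so modifying
    # `new` during update() does not disturb the iteration.
    new.update((k, old[k]) for k in old.keys() & new.keys())
    return new


old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = merge_into_new(old, new)
print(out is new)  # → True
print(new)         # → {'c': 3, 'd': 'y', 'e': 'z'}
```

The trade-off is that the caller's new dictionary is changed, so this only fits when new was built solely to be merged.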

You could also build the new dictionary with a one-line dictionary comprehension:

def merge_preserving_old_values_and_new_keys(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}
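A closely related variant, not timed anywhere in this thread, uses dict.get with the new value as the fallback. It does a single lookup in old instead of a membership test followed by an index, and because get only falls back when the key is missing, old values of None or 0 are still preserved. The name get_merge is hypothetical:

```python
def get_merge(old, new):
    # old.get(k, v) returns old[k] whenever k is present (even if that value
    # is None or 0) and v otherwise, matching `old[k] if k in old else v`.
    return {k: old.get(k, v) for k, v in new.items()}


old = {'a': 1, 'b': None, 'c': 3}
new = {'b': 'x', 'c': 'y', 'd': 'z'}
print(get_merge(old, new))  # → {'b': None, 'c': 3, 'd': 'z'}
```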

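This should be more efficient, since you no longer iterate over all of old.items(). It also makes your intent clearer, because you aren't first writing values and then overwriting some of them.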

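def merge_preserving_old_values_and_new_keys(old, new):
    out = {}
    for k, v in new.items():
        # `k in old` is a direct hash lookup; no need for old.keys()
        if k in old:
            out[k] = old[k]
        else:
            out[k] = v
    return out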
old = {
    'a': 1,
    'b': 2,
    'c': 3
}
new = {
    'c': 'x',
    'd': 'y',
    'e': 'z'
}
def merge_preserving_old_values_and_new_keys(o, n):
    out = {}
    for k in n:
        if k in o:
            out[k] = o[k]
        else:
            out[k] = n[k]
    return out
print(merge_preserving_old_values_and_new_keys(old, new))

This may not be the ideal way to add this information to the discussion; feel free to edit or redistribute it if necessary.

Below are the timing results for all the methods discussed here.

from timeit import timeit
def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        if k in old:
            out[k] = old[k]
        else:
            out[k] = v
    return out
def set_merge(old, new):
    out = new.copy()
    out.update((k, old[k]) for k in old.keys() & new.keys())
    return out
def comp_merge(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}
def orig_merge(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out

old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = {'c': 3, 'd': 'y', 'e': 'z'}
assert loop_merge(old, new) == out
assert set_merge(old, new) == out
assert comp_merge(old, new) == out
assert orig_merge(old, new) == out
setup = """
from __main__ import old, new, loop_merge, set_merge, comp_merge, orig_merge
"""
for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(old, new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))
size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size//2, size)}
setup = """
from __main__ import large_old, large_new, loop_merge, set_merge, comp_merge, orig_merge
"""
for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(large_old, large_new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))
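As written, the large-dictionary loop uses timeit's default number=1000000, which is why each of those timings runs for roughly 15 to 30 minutes. A lighter sketch for Python 3.5+ passes the objects in through timeit's globals argument instead of a "from __main__ import ..." setup string and sets an explicit, smaller number of calls. The helper name bench and the choice of number=100 are illustrative only:

```python
from timeit import timeit


def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        if k in old:
            out[k] = old[k]
        else:
            out[k] = v
    return out


def comp_merge(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}


def bench(funcs, old, new, number):
    # Returns {function name: total seconds for `number` calls}.
    results = {}
    for func in funcs:
        results[func.__name__] = timeit(
            'func(old, new)',
            globals={'func': func, 'old': old, 'new': new},
            number=number,
        )
    return results


size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size // 2, size)}

# Sanity check before timing: both functions must agree.
assert loop_merge(large_old, large_new) == comp_merge(large_old, large_new)

for name, seconds in bench([loop_merge, comp_merge], large_old, large_new, 100).items():
    print('{}: {:.4f} s per 100 calls'.format(name, seconds))
```

Absolute numbers will differ by machine; only the relative ordering is comparable with the results below.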

And the winner, at least for small dictionaries, is the improved loop method!

$ python3 merge.py
loop: 0.7791572390015062  # small dictionaries
set: 3.1920828100010112
comp: 1.1180207730030816
orig: 1.1681104259987478
loop: 927.2149353210007  # large dictionaries
set: 1696.8342713210004
comp: 902.039078668
orig: 1373.0389542560006

I'm disappointed, because the dictionary view / set operation approach is so much cooler.

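For larger dictionaries (10^4 entries), the dictionary comprehension method edges slightly ahead of the improved loop method (902 s vs. 927 s) and is well ahead of the original method (1373 s).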
