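I'm writing a Python script that parses RSS feeds. I want to maintain a dictionary of feed entries that gets updated periodically. Entries that no longer exist in the feed should be removed, new entries should get a default value, and the values of previously seen entries should stay unchanged.
I think this is best explained with an example: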
>>> old = {
... 'a': 1,
... 'b': 2,
... 'c': 3
... }
>>> new = {
... 'c': 'x',
... 'd': 'y',
... 'e': 'z'
... }
>>> out = some_function(old, new)
>>> out
{'c': 3, 'd': 'y', 'e': 'z'}
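In the RSS setting, new would usually be built from the entry IDs currently in the feed rather than handed over as a ready-made dictionary. Below is a minimal sketch. It assumes each entry has a hashable ID and that every unseen entry starts from the same default value. sync_entries and its parameters are hypothetical names and are not part of any feed library.

```python
def sync_entries(old, entry_ids, default):
    """Hypothetical helper: return a new dict keyed by entry_ids.

    IDs already in `old` keep their old value, unseen IDs get `default`,
    and IDs that are no longer in the feed are dropped because they are never copied.
    """
    return {eid: old.get(eid, default) for eid in entry_ids}


state = {'a': 1, 'b': 2, 'c': 3}
state = sync_entries(state, ['c', 'd', 'e'], default=0)
print(state)  # {'c': 3, 'd': 0, 'e': 0}
```

If the default is mutable (a list, for example), every new entry would share that one object. In that case, create a fresh default per entry instead.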
Here is my current attempt:
def merge_preserving_old_values_and_new_keys(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out
This works, but I suspect there is a better or cleverer way to do it.
EDIT: If you want to test your function:
def my_merge(old, new):
    pass
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = my_merge(old, new)
assert out == {'c': 3, 'd': 'y', 'e': 'z'}
EDIT 2: With Martijn Pieters' answer defined as set_merge, bravosierra99's answer as loop_merge, and my first attempt as orig_merge, I get the following timing results:
>>> setup="""
... old = {'a': 1, 'b': 2, 'c': 3}
... new = {'c': 'x', 'd': 'y', 'e': 'z'}
... from __main__ import set_merge, loop_merge, orig_merge
... """
>>> timeit.timeit('set_merge(old, new)', setup=setup)
3.4415210600000137
>>> timeit.timeit('loop_merge(old, new)', setup=setup)
1.161155690000669
>>> timeit.timeit('orig_merge(old, new)', setup=setup)
1.1776735319999716
I find this surprising, because I didn't expect the dictionary view approach to be this slow.
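One plausible explanation, which is not verified by profiling here, is that fixed per-call costs dominate on dictionaries this small. old.keys() & new.keys() allocates a brand-new set on every call, and update() then has to consume a generator. The loop versions only do a few dictionary lookups. The snippet below (Python 3) only confirms that the intersection is a freshly built set:

```python
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}

common = old.keys() & new.keys()  # key views are set-like, so & is supported
print(type(common).__name__)      # set: a new object is created on each call
print(common)                     # {'c'}
```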
Dictionaries provide dictionary view objects that behave like sets. Use them to get the intersection of old and new:
def merge_preserving_old_values_and_new_keys(old, new):
    result = new.copy()
    result.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return result
The above uses Python 2 syntax. If you are using Python 3, use old.keys() & new.keys() instead to get the same result:
def merge_preserving_old_values_and_new_keys(old, new):
    # Python 3 version
    result = new.copy()
    result.update((k, old[k]) for k in old.keys() & new.keys())
    return result
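The code above starts from all key-value pairs in new. For every key that also appears in old, it then replaces the value with the one from old.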
>>> merge_preserving_old_values_and_new_keys(old, new)
{'c': 3, 'e': 'z', 'd': 'y'}
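Note that this function, like your version, produces a new dictionary. The key and value objects are still shared, because it is only a shallow copy.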
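To see what "shallow copy" means in practice, the sketch below uses a mutable list as a value. This is an illustrative assumption, since the question's values are plain ints and strings. The merged dict is a new object, but it holds the very same list that old holds:

```python
def merge_preserving_old_values_and_new_keys(old, new):
    # Python 3 version from this answer
    result = new.copy()
    result.update((k, old[k]) for k in old.keys() & new.keys())
    return result


old = {'c': [3]}                  # mutable value, for illustration only
new = {'c': ['x'], 'd': ['y']}
out = merge_preserving_old_values_and_new_keys(old, new)

print(out is new)                 # False: the outer dict is new
print(out['c'] is old['c'])       # True: the list object is shared
out['c'].append(4)
print(old['c'])                   # [3, 4]: the change is visible through old
```

If the merged dict must be fully independent of old, copy.deepcopy(out) is one option.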
If you don't need the new dictionary for anything else, you can also update new in place:
def merge_preserving_old_values_and_new_keys(old, new):
    new.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return new
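The in-place version above uses Python 2's viewkeys(). Here is a Python 3 sketch of the same idea, with merge_into_new as a hypothetical name. It shows that the returned object is new itself, so any other reference to new sees the change as well:

```python
def merge_into_new(old, new):
    # Python 3: overwrite the shared keys in `new` with the values from `old`
    new.update((k, old[k]) for k in old.keys() & new.keys())
    return new


old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = merge_into_new(old, new)

print(out is new)  # True: no copy was made
print(new)         # {'c': 3, 'd': 'y', 'e': 'z'}
```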
You can also build the new dictionary with a one-line dict comprehension:
def merge_preserving_old_values_and_new_keys(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}
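A closely related variant replaces the conditional expression with dict.get. dict.get falls back to its second argument only when the key is missing. This is a swapped-in alternative, not part of the answer above. It still keeps an old value that is falsy or None:

```python
def get_merge(old, new):
    # old.get(k, v) returns old[k] when k is in old (even if that value is None), else v
    return {k: old.get(k, v) for k, v in new.items()}


print(get_merge({'a': 1, 'b': 2, 'c': 3}, {'c': 'x', 'd': 'y', 'e': 'z'}))
# {'c': 3, 'd': 'y', 'e': 'z'}
print(get_merge({'c': None}, {'c': 'x'}))
# {'c': None}
```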
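This loop should be more efficient than the original, because it no longer iterates over the whole of old.items(). It also makes the intent clearer, since no values get overwritten:
def merge_preserving_old_values_and_new_keys(old, new):
    out = {}
    for k, v in new.items():
        # `k in old` is a hash lookup; in Python 2, `k in old.keys()` builds and scans a list
        if k in old:
            out[k] = old[k]
        else:
            out[k] = v
    return out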
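The sketch below makes the efficiency argument concrete. It adds a step counter to the loop above and to the original two-loop version; the _counted functions are hypothetical and only for illustration. The single loop visits only the entries of new, while the original also walks every entry of old:

```python
def loop_merge_counted(old, new):
    # Same logic as the loop above, plus a count of loop iterations
    out, steps = {}, 0
    for k, v in new.items():
        steps += 1
        out[k] = old[k] if k in old else v
    return out, steps


def orig_merge_counted(old, new):
    # Same logic as the question's first attempt, plus a count of loop iterations
    out, steps = {}, 0
    for k, v in new.items():
        steps += 1
        out[k] = v
    for k, v in old.items():
        steps += 1
        if k in out:
            out[k] = v
    return out, steps


big_old = {i: 'old' for i in range(1000)}
small_new = {0: 'new', 5000: 'new'}
loop_out, loop_steps = loop_merge_counted(big_old, small_new)
orig_out, orig_steps = orig_merge_counted(big_old, small_new)
print(loop_steps, orig_steps)  # 2 1002
```

Both functions return the same dictionary. Only the amount of iteration differs, and that gap grows with the size of old.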
old = {
    'a': 1,
    'b': 2,
    'c': 3
}
new = {
    'c': 'x',
    'd': 'y',
    'e': 'z'
}
def merge_preserving_old_values_and_new_keys(o, n):
    out = {}
    for k in n:
        if k in o:
            out[k] = o[k]
        else:
            out[k] = n[k]
    return out
print(merge_preserving_old_values_and_new_keys(old, new))
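Whatever their cost, all of these variants should produce the same dictionary. The sketch below runs a quick randomized cross-check. It reuses the names loop_merge, set_merge, comp_merge and orig_merge from the question's EDIT 2 and uses Python 3's keys() in set_merge. The random key and value ranges are arbitrary choices:

```python
import random


def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        out[k] = old[k] if k in old else v
    return out


def set_merge(old, new):
    out = new.copy()
    out.update((k, old[k]) for k in old.keys() & new.keys())
    return out


def comp_merge(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}


def orig_merge(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out


rng = random.Random(0)  # fixed seed so any failure is reproducible
checked = 0
for _ in range(1000):
    # a small key range makes old and new overlap often
    old = {rng.randrange(20): rng.random() for _ in range(rng.randrange(10))}
    new = {rng.randrange(20): rng.random() for _ in range(rng.randrange(10))}
    expected = {k: (old[k] if k in old else v) for k, v in new.items()}
    for f in (loop_merge, set_merge, comp_merge, orig_merge):
        assert f(old, new) == expected, f.__name__
    checked += 1
print('all four agree on', checked, 'random cases')
```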
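This isn't quite the ideal way to add this information to the discussion, so feel free to edit or redistribute it if needed.
Below are timing results for all the methods discussed here.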
from timeit import timeit
def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        if k in old:
            out[k] = old[k]
        else:
            out[k] = v
    return out
def set_merge(old, new):
    out = new.copy()
    out.update((k, old[k]) for k in old.keys() & new.keys())
    return out
def comp_merge(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}
def orig_merge(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = {'c': 3, 'd': 'y', 'e': 'z'}
assert loop_merge(old, new) == out
assert set_merge(old, new) == out
assert comp_merge(old, new) == out
assert orig_merge(old, new) == out
setup = """
from __main__ import old, new, loop_merge, set_merge, comp_merge, orig_merge
"""
for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(old, new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))
size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size//2, size)}
setup = """
from __main__ import large_old, large_new, loop_merge, set_merge, comp_merge, orig_merge
"""
for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(large_old, large_new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))
For small dictionaries, the improved loop method wins!
$ python3 merge.py
loop: 0.7791572390015062 # small dictionaries
set: 3.1920828100010112
comp: 1.1180207730030816
orig: 1.1681104259987478
loop: 927.2149353210007 # large dictionaries
set: 1696.8342713210004
comp: 902.039078668
orig: 1373.0389542560006
I'm disappointed, because the dictionary view / set operation approach is so much cooler.
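For larger dictionaries (10^4 entries), the dict comprehension approach edges ahead of the improved loop method. Both are well ahead of the original method, and the set-based version is the slowest of the four.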
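With timeit's default number=1000000, each large-dictionary run above takes roughly 15 to 30 minutes per method. The sketch below keeps the run short in three ways. It passes an explicit number, it repeats the measurement and keeps the best run, and it uses globals= (Python 3.5+) in place of the from __main__ import setup string. The choice of 100 calls and 5 repeats is arbitrary:

```python
from timeit import repeat


def comp_merge(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}


size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size // 2, size)}

calls = 100
# five independent runs of `calls` calls each; globals= makes the names visible to timeit
times = repeat('comp_merge(large_old, large_new)',
               globals=globals(), repeat=5, number=calls)
best = min(times) / calls  # seconds per call; the minimum is least affected by background noise
print('comp: {:.3f} ms per call'.format(best * 1e3))
```

Per-call figures like this are easier to compare across machines and dictionary sizes than raw totals for a million calls.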