Understanding the behavior of removing elements from a set while iterating over it



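I know I shouldn't modify a set while iterating over it, but this behavior really surprised me. Can someone explain why the loop stops, and what is actually happening inside CPython? I tried disassembling the foo function, but that didn't give me any answers. This is what I'm talking about: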

def foo(first):
    d = {0}
    # Mutate the set while enumerate() is still iterating over it.
    for n, k in enumerate(d, first):
        d.remove(k)
        d.add(n)
    print(d)

foo(1)  # prints {8}
foo(6)  # prints {8}
foo(8)  # prints {10} on my macOS machine and {8} on another user's machine
foo(9)  # prints {16}

To see what was going on, I added some print statements:

def foo(first):
    d = {0}
    for n, k in enumerate(d, first):
        # Show the set before and after each mutation on a single line.
        print("before update : dict =", d, "pos =", n, "ele =", k, end='\t')
        d.remove(k)
        d.add(n)
        print("after update : dict =", d, "pos =", n, "ele =", k)
    print(d)

foo(1)
print("===========")
foo(6)
print("===========")
foo(8)
print("===========")
foo(9)

Output:

before update : dict = {0} pos = 1 ele = 0  after update : dict = {1} pos = 1 ele = 0
before update : dict = {1} pos = 2 ele = 1  after update : dict = {2} pos = 2 ele = 1
before update : dict = {2} pos = 3 ele = 2  after update : dict = {3} pos = 3 ele = 2
before update : dict = {3} pos = 4 ele = 3  after update : dict = {4} pos = 4 ele = 3
before update : dict = {4} pos = 5 ele = 4  after update : dict = {5} pos = 5 ele = 4
before update : dict = {5} pos = 6 ele = 5  after update : dict = {6} pos = 6 ele = 5
before update : dict = {6} pos = 7 ele = 6  after update : dict = {7} pos = 7 ele = 6
before update : dict = {7} pos = 8 ele = 7  after update : dict = {8} pos = 8 ele = 7
{8}
===========
before update : dict = {0} pos = 6 ele = 0  after update : dict = {6} pos = 6 ele = 0
before update : dict = {6} pos = 7 ele = 6  after update : dict = {7} pos = 7 ele = 6
before update : dict = {7} pos = 8 ele = 7  after update : dict = {8} pos = 8 ele = 7
{8}
===========
before update : dict = {0} pos = 8 ele = 0  after update : dict = {8} pos = 8 ele = 0
{8}
===========
before update : dict = {0} pos = 9 ele = 0  after update : dict = {9} pos = 9 ele = 0
before update : dict = {9} pos = 10 ele = 9 after update : dict = {10} pos = 10 ele = 9
before update : dict = {10} pos = 11 ele = 10   after update : dict = {11} pos = 11 ele = 10
before update : dict = {11} pos = 12 ele = 11   after update : dict = {12} pos = 12 ele = 11
before update : dict = {12} pos = 13 ele = 12   after update : dict = {13} pos = 13 ele = 12
before update : dict = {13} pos = 14 ele = 13   after update : dict = {14} pos = 14 ele = 13
before update : dict = {14} pos = 15 ele = 14   after update : dict = {15} pos = 15 ele = 14
before update : dict = {15} pos = 16 ele = 15   after update : dict = {16} pos = 16 ele = 15
{16}

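The iterator over a set must keep some kind of pointer into the set's internal storage, and the iteration stops as soon as the newly added element ends up behind that pointer, because there is then nothing left in front of the pointer to visit.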
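To make the pointer idea concrete, here is a toy model of it. It is only a sketch under assumptions that are mine and not taken from CPython's source: the set is modeled as a fixed array of 8 slots, an element lives in slot hash(x) % 8, and the iterator is a plain index that only moves forward. The name simulate is made up for this illustration.

```python
def simulate(first, size=8):
    """Toy model of foo(first): a slot array scanned by a forward-only index."""
    slots = [None] * size          # None marks an empty slot
    slots[hash(0) % size] = 0      # the initial set {0}
    pos = 0                        # the iterator's "pointer" into the table
    n = first
    while True:
        # Advance the pointer to the next occupied slot, if there is one.
        while pos < size and slots[pos] is None:
            pos += 1
        if pos == size:
            break                  # ran off the end of the table: iteration stops
        k = slots[pos]
        slots[pos] = None          # models d.remove(k)
        pos += 1
        slots[hash(n) % size] = n  # models d.add(n); the table is empty here, so no collision
        n += 1
    return {x for x in slots if x is not None}


for start in (1, 6, 8, 9):
    print(start, simulate(start))
```

This model gives {8}, {8}, {8} and {16} for the four starting values, which agrees with the traced output above. It does not explain the {10} reported on macOS for foo(8), so the real table clearly has details this sketch leaves out.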

Playing with small sets on my computer (Python 3.7 on Windows), I can get a feel for the order in which elements are hashed into the set.

>>> {6,7}
{6, 7}
>>> {7,8}
{8, 7}
>>> {8,9}
{8, 9}

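Notice how 8 is printed before 7, while the other pairs are still in increasing order. In fact, the loop for i in range(64): print({i, i+1}) shows me that, at least on my computer, multiples of 8 are always placed before both their predecessor and their successor, while every other pair of consecutive ints stays in increasing order.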
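One way to account for that pattern, assuming a small set uses a table of 8 slots and a small non-negative int is stored at hash(x) % 8 (both are assumptions for this sketch, and slot is a hypothetical helper), is to compute the slot each value would land in and sort each pair by it:

```python
TABLE_SIZE = 8  # assumed slot count for a small set


def slot(value, size=TABLE_SIZE):
    """Slot a value would occupy in the assumed table, ignoring collisions."""
    return hash(value) % size


for pair in ([6, 7], [7, 8], [8, 9], [15, 16]):
    # Sorting by slot mimics a front-to-back scan over the table.
    print(pair, "->", sorted(pair, key=slot))
```

Under these assumptions a multiple of 8 always lands in slot 0, so it comes out ahead of both neighbours, while any other consecutive pair keeps its natural order.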

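Of course, as has already been pointed out, all of this depends on implementation-specific details, but I think the principle I suggested should hold in general, simply because of how iteration over a set can be implemented efficiently.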
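If the goal is behavior that does not depend on those details, the usual approach is to iterate over a snapshot of the set instead of the set itself. A minimal sketch, where foo_snapshot is just a made-up variant of the function from the question that returns the set instead of printing it:

```python
def foo_snapshot(first):
    d = {0}
    # list(d) copies the elements up front, so mutating d cannot affect the loop.
    for n, k in enumerate(list(d), first):
        d.remove(k)
        d.add(n)
    return d


print(foo_snapshot(1))
print(foo_snapshot(9))
```

The loop body now runs exactly once, because the snapshot holds a single element, so the result is simply a set containing first.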
