图形缩减/巧妙的空间泄漏示例



我想知道我在这一页上读到的一个空间泄漏的例子(遗憾的是,那里没有解释):https://en.wikibooks.org/wiki/Haskell/Graph_reduction

狡猾的空间泄漏示例:

(\xs->头xs+尾xs)[1.n]

(\xs->最后一个xs+头xs)[1.n]

第一个版本在O(1)空间上运行。O(n)中的第二个。

我不确定我是否理解正确(希望你能帮忙)。据我所知,懒惰评价意味着最左边的评价。因此,在这些例子中,我们会减少这些redex,比如:

(\xs->头xs+最后xs)[1..200]

=>([1..200]->头xs+最后xs)

=>头[1.200]+最后[1.200]

=>1+最后[1..200]

=>1+最后[2..200]

=>。。

=>1+最后[199..200]

=>1+最后[200]

=>1+200

=>201

(\xs->最后一个xs+头xs)[1..200]

([1..200]->最后xs+头xs)

最后[1.200]+头[1.200]

最后[2.200]+头[1.200]

最后[199.200]+头[1.200]

最后[200]+头[1.200]

200+头[1..200]

200+1

201

也许我用错误的方式减少了它(如果我错了,请纠正我),但在这里我看不到可能的空间泄漏。因此,我首先用ghci:测试了运行时(而不是空间)

1>> (xs -> head xs + last xs) [1..50000000]
50000001
(10.05 secs, 4,003,086,632 bytes)
2>> (xs -> last xs + head xs) [1..50000000]
50000001
(2.26 secs, 3,927,748,176 bytes)

根据wikibooks的说法,第二个版本中应该存在空间泄漏,但运行时要快得多(这可能是可能的,这里没有什么奇怪的)。

我有以下源代码:

module Main where
main = do
--  let a = (xs -> head xs + last xs) [1..20000000]    -- space leak
let b = (xs -> last xs + head xs) [1..20000000]    -- no space leak
--  putStrLn ("result for a - (\xs -> head xs + last xs): " ++ show a)
putStrLn ("result for b - (\xs -> last xs + head xs): " ++ show b)

我在没有优化的情况下编译了它,然后我调用了这个程序:

$ ghc -O0 --make -rtsopts -fforce-recomp Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
$ ./Main +RTS -sstderr
result for b - (xs -> last xs + head xs): 20000001
1,600,057,352 bytes allocated in the heap
211,000 bytes copied during GC
44,312 bytes maximum residency (2 sample(s))
21,224 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed)  Avg pause  Max pause
Gen  0      3062 colls,     0 par    0.012s   0.012s     0.0000s    0.0001s
Gen  1         2 colls,     0 par    0.000s   0.000s     0.0002s    0.0002s
INIT    time    0.000s  (  0.000s elapsed)
MUT     time    0.518s  (  0.519s elapsed)
GC      time    0.012s  (  0.012s elapsed)
EXIT    time    0.000s  (  0.000s elapsed)
Total   time    0.534s  (  0.532s elapsed)
%GC     time       2.3%  (2.3% elapsed)
Alloc rate    3,088,101,743 bytes per MUT second
Productivity  97.6% of total user, 98.0% of total elapsed

这是一个不错的结果,我们有2.3%的垃圾回收,使用的内存大约为1MB。然后我编译了另一个没有优化的案例,得到了以下结果:

module Main where
main = do
let a = (xs -> head xs + last xs) [1..20000000]    -- space leak
--  let b = (xs -> last xs + head xs) [1..20000000]    -- no space leak
putStrLn ("result for a - (\xs -> head xs + last xs): " ++ show a)
--  putStrLn ("result for b - (\xs -> last xs + head xs): " ++ show b)

$ ghc -O0 --make -rtsopts -fforce-recomp Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
$ ./Main +RTS -sstderr
result for a - (xs -> head xs + last xs): 20000001
1,600,057,352 bytes allocated in the heap
2,088,615,552 bytes copied during GC
540,017,504 bytes maximum residency (13 sample(s))
135,620,768 bytes maximum slop
1225 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed)  Avg pause  Max pause
Gen  0      3051 colls,     0 par    0.911s   0.915s     0.0003s    0.0016s
Gen  1        13 colls,     0 par    2.357s   2.375s     0.1827s    0.9545s
INIT    time    0.000s  (  0.000s elapsed)
MUT     time    0.434s  (  0.430s elapsed)
GC      time    3.268s  (  3.290s elapsed)
EXIT    time    0.094s  (  0.099s elapsed)
Total   time    3.799s  (  3.820s elapsed)
%GC     time      86.0%  (86.1% elapsed)
Alloc rate    3,687,222,801 bytes per MUT second
Productivity  14.0% of total user, 13.9% of total elapsed

更糟糕的是,有很多垃圾收集正在进行,内存使用率也高得多。

这里到底发生了什么?我不明白为什么会有太空泄漏。


p.S.如果你用完全优化(O2)编译这两种情况,那么两个程序的效率几乎相等。

xs是相同的。当你把[1..200]写两次时,你就把自己弄糊涂了。在表达式的评估过程中,最好明确命名所有临时实体:

(xs -> head xs + last xs) [1,2,3]
head xs + last xs        { xs = [1,2,3] }
(+) (head xs) (last xs)
(+) 1         (last xs)  { xs = 1:xs1 ; xs1 = [2,3] }
(last xs1) { xs1 = 2:xs2 ; xs2 = [3] }
(last xs2)
3
4

在这里,我们可以看到,随着我们的前进,xs(和xs1)的绑定可以被安全地忘记,因为任何地方都不再引用xs

所以你确实正确地减少了它,除了第二个[1..200]和第一个相同,所以我们必须抓住它,同时首先找到它的last(在第二个变体中),因此发生了泄漏。

当然,编译器可以对此进行优化,因为引用透明原则,可以用equals代替equals,并执行两次枚举[1..200],因此也可以在O(1)空间中运行第二个变体。

所以最后,它是一个编译器的东西。空间泄漏可能发生(而不是应该)。

最新更新