How do I make sure Box::new() really performs a heap allocation?

I'm trying to measure the performance of Box::new():

use std::time::Instant;

fn main() {
    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..100000 {
        sum += 42;
    }
    println!("Simple sum: {:?}", start.elapsed());

    let start2 = Instant::now();
    for _ in 0..100000 {
        let b = Box::new(42);
        Box::leak(b);
    }
    println!("Many heap calls: {:?}", start2.elapsed());
}

I get:

Simple sum: 1.413291ms
Many heap calls: 6.9935ms

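Clearly, these numbers look wrong. Box::new() must be far more than 5 times heavier than +=. Where does the optimization kick in, and how do I disable it?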

Make sure to run benchmark code with --release; otherwise the results are almost meaningless.
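One way to catch an accidental debug-build benchmark is to check `cfg!(debug_assertions)` at startup. This is only a sketch: it assumes the default Cargo profiles, where debug assertions are enabled for a plain build and disabled with --release, and a custom profile can change that. `build_kind` is a name made up for this example.

```rust
/// Hypothetical helper: reports which kind of build this is, judged only by
/// whether debug assertions are enabled (the default for non-release builds).
fn build_kind() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

fn main() {
    if build_kind() == "debug" {
        // Timings from an unoptimized build say little about real performance.
        eprintln!("warning: debug build, timings will not be representative");
    }
    println!("build kind: {}", build_kind());
}
```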

In your case, if I run it with --release, I get:

Simple sum: 100ns
Many heap calls: 100ns

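This means the compiler optimized everything away completely, because your loops have no side effects. If an operation has no observable effect (the time it takes does not count), the compiler is allowed to remove it.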

Note that the compiler even warns about it:

warning: variable `sum` is assigned to, but never used
 --> src\main.rs:5:13
  |
5 |     let mut sum = 0;
  |             ^^^
  |
  = note: consider using `_sum` instead
  = note: `#[warn(unused_variables)]` on by default

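That said, there are cases where you want to keep an operation even though it has no side effects, benchmarking being one of them. For this, Rust provides std::hint::black_box, a function that returns exactly what you give it, but looks to the compiler as if some arbitrary computation happened, so the compiler can no longer prove that the output equals the input. This prevents the compiler from optimizing away the call and everything that feeds into it.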

In your case, here is an example of how to prevent Rust from optimizing away your loops:

use std::time::Instant;

fn main() {
    let start = Instant::now();
    let mut sum = 0;
    for _ in 0..100000 {
        sum += 42;
        std::hint::black_box(sum);
    }
    println!("Simple sum: {:?}", start.elapsed());

    let start2 = Instant::now();
    for _ in 0..100000 {
        let b = std::hint::black_box(Box::new(42));
        std::hint::black_box(Box::leak(b));
    }
    println!("Many heap calls: {:?}", start2.elapsed());
}

Simple sum: 27.2µs
Many heap calls: 2.8956ms

Now these numbers make a lot more sense.
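The same pattern can be wrapped in a small helper so that every measured closure has its result passed through black_box. This is a minimal sketch, not a replacement for a proper benchmarking harness: `time_it` is a hypothetical name, it takes a single measurement with no warm-up, and black_box itself is documented as best-effort.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of std): runs `f` `iters` times and returns
/// the total elapsed time. Each result goes through `black_box` so the
/// optimizer has to assume it is used.
fn time_it<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed()
}

fn main() {
    let mut sum = 0u32;
    let add = time_it(100_000, || {
        sum += 42;
        sum
    });
    // Leaking on purpose, as in the question, so that only allocation is timed.
    let alloc = time_it(100_000, || Box::leak(Box::new(black_box(42))));

    println!("Simple sum: {:?}", add);
    println!("Many heap calls: {:?}", alloc);
}
```

Returning the value from the closure, rather than calling black_box inside it, keeps the measured code free of benchmarking details.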

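To be 100% sure that nothing important got optimized away, you can always check the disassembly. Because the asm around the println!() calls is hard to read, it makes sense to extract the loops into their own functions.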

Make sure to mark these functions pub so that they show up in the final assembly; otherwise they may disappear due to inlining.

Like this:

use std::time::Instant;

pub fn simple_sum() {
    let mut sum = 0;
    for _ in 0..100000 {
        sum += 42;
        std::hint::black_box(sum);
    }
}

pub fn many_heap_calls() {
    for _ in 0..100000 {
        let b = std::hint::black_box(Box::new(42));
        std::hint::black_box(Box::leak(b));
    }
}

fn main() {
    let start = Instant::now();
    simple_sum();
    println!("Simple sum: {:?}", start.elapsed());

    let start2 = Instant::now();
    many_heap_calls();
    println!("Many heap calls: {:?}", start2.elapsed());
}

example::simple_sum:
    sub     rsp, 4
    mov     eax, 42
    mov     rcx, rsp
.LBB0_1:
    mov     dword ptr [rsp], eax
    add     eax, 42
    cmp     eax, 4200042
    jne     .LBB0_1
    add     rsp, 4
    ret

example::many_heap_calls:
    push    r15
    push    r14
    push    rbx
    sub     rsp, 16
    mov     ebx, 100000
    mov     r14, qword ptr [rip + __rust_alloc@GOTPCREL]
    lea     r15, [rsp + 8]
.LBB1_1:
    mov     edi, 4
    mov     esi, 4
    call    r14
    test    rax, rax
    je      .LBB1_4
    mov     dword ptr [rax], 42
    mov     qword ptr [rsp + 8], rax
    mov     rax, qword ptr [rsp + 8]
    mov     qword ptr [rsp + 8], rax
    dec     ebx
    jne     .LBB1_1
    add     rsp, 16
    pop     rbx
    pop     r14
    pop     r15
    ret
.LBB1_4:
    mov     edi, 4
    mov     esi, 4
    call    qword ptr [rip + alloc::alloc::handle_alloc_error@GOTPCREL]
    ud2

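The important parts to note here are the labels .LBB0_1: and .LBB1_1: together with jne .LBB0_1 and jne .LBB1_1, which form the two for loops. This shows that the loops did not get optimized away.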

Also note mov r14, qword ptr [rip + __rust_alloc@GOTPCREL] and call r14, which is the actual call that performs the heap allocation. So that did not get optimized away either.
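If reading assembly is not an option, the allocation calls can also be observed at runtime by installing a counting allocator. The following is a sketch under assumptions: `CountingAlloc`, `ALLOC_CALLS` and `count_allocs` are names invented for this example, the wrapper simply forwards to std's `System` allocator, and the counter is global, so allocations made by other threads during the measured closure are counted too.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper around the system allocator that counts `alloc` calls.
struct CountingAlloc;

static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        // Forward to the real allocator; this wrapper only observes the call.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns how many allocations were requested while `f` ran.
fn count_allocs(f: impl FnOnce()) -> usize {
    let before = ALLOC_CALLS.load(Ordering::Relaxed);
    f();
    ALLOC_CALLS.load(Ordering::Relaxed) - before
}

fn main() {
    let n = count_allocs(|| {
        for _ in 0..100_000 {
            let b = black_box(Box::new(42));
            black_box(Box::leak(b));
        }
    });
    println!("allocations observed: {n}");
}
```

Dropping the black_box calls from the closure and rebuilding with --release is a quick way to see whether the compiler removes the allocations for a given toolchain.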

Also, note the interesting cmp eax, 4200042. It shows that the compiler rewrote the first loop; instead of:

let mut sum = 0;
for _ in 0..100000 {
    sum += 42;
}

it optimized it to

let mut sum = 0;
while sum != 4200042 {
    sum += 42;
}

which effectively gives the same result and reuses the sum variable as the loop counter :)
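Read literally, the assembly keeps the next value to store in eax, which is why the bound is 4200042 rather than 4200000; the while loop above is a simplification of that. The sketch below is a hand-written rendering of the listing, not compiler output, and checks that it stores the same values as the original for loop. A `Vec::push` stands in for the black_box store, and the function names are made up for this example.

```rust
/// The loop as written in the source: 100000 iterations, records each `sum`.
fn original(sink: &mut Vec<i32>) {
    let mut sum = 0;
    for _ in 0..100000 {
        sum += 42;
        sink.push(sum);
    }
}

/// A hand-written rendering of the assembly: the register holds the next
/// value to store and doubles as the loop counter.
fn rewritten(sink: &mut Vec<i32>) {
    let mut next = 42; // mov eax, 42
    loop {
        sink.push(next); // mov dword ptr [rsp], eax
        next += 42; // add eax, 42
        if next == 4200042 {
            // cmp eax, 4200042 falls through instead of taking jne .LBB0_1
            break;
        }
    }
}

fn main() {
    let (mut a, mut b) = (Vec::new(), Vec::new());
    original(&mut a);
    rewritten(&mut b);
    assert_eq!(a, b);
    println!("both loops store the same {} values", a.len());
}
```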

Now compare that with the earlier version:

use std::time::Instant;

pub fn simple_sum() {
    let mut sum = 0;
    for _ in 0..100000 {
        sum += 42;
    }
}

pub fn many_heap_calls() {
    for _ in 0..100000 {
        let b = Box::new(42);
        Box::leak(b);
    }
}

fn main() {
    let start = Instant::now();
    simple_sum();
    println!("Simple sum: {:?}", start.elapsed());

    let start2 = Instant::now();
    many_heap_calls();
    println!("Many heap calls: {:?}", start2.elapsed());
}

example::simple_sum:
    ret

example::many_heap_calls:
    ret

I don't think this needs any further explanation ;)
