
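I am reading Effective C++ by Scott Meyers, where the author compares pass-by-value with pass-by-reference. He recommends pass-by-reference for user-defined types and pass-by-value for built-in types. I am looking for an example that explains the following paragraph, which states that pass-by-value can be expensive even for small user-defined objects.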

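Built-in types are small, so some people conclude that all small types are good candidates for pass-by-value, even if they're user-defined. This is shaky reasoning. Just because an object is small doesn't mean that calling its copy constructor is inexpensive. Many objects, most STL containers among them, contain little more than a pointer, but copying such objects entails copying everything they point to. That can be very expensive.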

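It depends on whether your copy is a deep copy or a shallow copy (in other words, whether the class is value-like or pointer-like). For example, A is a class with just one pointer to another object: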

struct B;
struct A
{
    B* pB;
    ~A() { delete pB; } // B must be a complete type where this destructor is defined
};



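As written, A has no user-defined copy constructor, so copying it copies only the pointer; both copies would then delete the same B, which is undefined behavior. A deep copy avoids that, but it has to copy the whole B even though A itself is only the size of a pointer. If you still want to pass by value and do not want to trigger undefined behavior, shared_ptr may be a good choice. But as @Arne Vogel pointed out, shared_ptr implementations are thread-safe, which requires atomic operations on the reference count, and that adds cost.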
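
To make the deep-copy cost concrete, here is a minimal sketch. The names Payload, DeepHandle, SharedHandle and the helper functions are hypothetical and come from neither the book nor the answer above; Payload stands in for B. DeepHandle is the size of a single pointer, yet every pass-by-value duplicates the whole payload, while SharedHandle only bumps a reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical payload type standing in for B in the snippet above.
struct Payload {
    std::vector<int> data;
};

// Value-like class: one pointer member, but copying duplicates the pointee.
class DeepHandle {
public:
    explicit DeepHandle(std::size_t n) : p_(new Payload{std::vector<int>(n, 0)}) {}
    DeepHandle(const DeepHandle& other) : p_(new Payload(*other.p_)) {
        assert(other.p_ != nullptr);
        ++copies;  // every copy allocates and fills a new Payload
    }
    DeepHandle& operator=(const DeepHandle& other) {
        if (this != &other) { *p_ = *other.p_; ++copies; }
        return *this;
    }
    ~DeepHandle() { delete p_; }
    std::size_t size() const { return p_->data.size(); }
    static int copies;  // number of deep copies made so far
private:
    Payload* p_;
};
int DeepHandle::copies = 0;

// The object itself is as small as a pointer; the copy is not cheap.
static_assert(sizeof(DeepHandle) == sizeof(void*), "DeepHandle holds one pointer");

// Pointer-like class: copying shares the pointee and bumps a reference count.
struct SharedHandle {
    std::shared_ptr<Payload> p;
};

std::size_t by_value(DeepHandle h) { return h.size(); }       // deep copy per call
std::size_t by_ref(const DeepHandle& h) { return h.size(); }  // no copy
long use_count_in_callee(SharedHandle h) { return h.p.use_count(); }
```

Calling by_value(h) on a DeepHandle that holds 1000 ints increments DeepHandle::copies once and allocates a second 1000-element vector; by_ref(h) leaves the counter untouched. Passing a SharedHandle by value copies no payload, but the callee sees a use count one higher than the caller's.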
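
The "cost" is simply wasted CPU cycles. In the program below, passing by value runs the copy constructor and then the destructor of the copy, while passing by reference runs neither: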


#include <iostream>

class simple {
public:
    simple() { std::cout << "constructor" << std::endl; }
    simple(const simple& copy) { std::cout << "copied" << std::endl; }
    ~simple() { std::cout << "destructor" << std::endl; }
    void addr() const { std::cout << this << std::endl; } // print this object's address
};

void simple_ref(const simple& ref) { ref.addr(); }
void simple_val(simple val) { val.addr(); }

int main(int argc, char* argv[])
{
    simple val;      // output: 'constructor'
    simple_ref(val); // output: address of val
    simple_val(val); // output: 'copied', address of copy made, 'destructor' (the destructor of the copy made)
    return 0;
    // output: 'destructor' (destructor of 'val')
}




(This is the content of a blog post on copy vs. reference by Thiago Macieira, https://www.macieira.org/blog/2012/02/the-value-of-passing-by-value/)


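Before we get into the ABI documentation and try compiling code, we need to define the problem we are trying to solve. In general terms, I am trying to find the best way to pass small C++ structures: when is it better to pass them by value than by constant reference? And does any of this matter for the qreal discussion?

A small structure like QLatin1String, which contains only a pointer as a member, would benefit from being passed by value. What other types of structures should we consider?

  • Structures with more than one pointer
  • Structures with 32-bit integers on 64-bit architectures
  • Structures with floating-point members (single and double precision)
  • Mixed-type and specialized structures found in Qt

I will investigate the x86-64, ARMv7 hard-float, MIPS hard-float (o32) and IA-64 ABIs, because those are the ones I have compilers for. All of them support passing parameters in registers and use at least 4 integer registers for parameter passing. Except for MIPS, they also have at least 4 floating-point registers for parameter passing. For more information, see my previous blog post with the ABI details.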


struct Pointers2 {
    void *p1, *p2;
};
struct Pointers4 {
    void *p1, *p2, *p3, *p4;
};
struct Integers2 { // like QSize and QPoint
    int i1, i2;
};
struct Integers4 { // like QRect
    int i1, i2, i3, i4;
};
template <typename F> struct Floats2 { // like QSizeF, QPointF, QVector2D
    F f1, f2;
};
template <typename F> struct Floats3 { // like QVector3D
    F f1, f2, f3;
};
template <typename F> struct Floats4 { // like QRectF, QVector4D
    F f1, f2, f3, f4;
};
template <typename F> struct Matrix4x4 { // like QGenericMatrix<4, 4>
    F m[4][4];
};
struct QChar {
    unsigned short ucs;
};
struct QLatin1String {
    const char *str;
    int len;
};
template <typename F> struct QMatrix {
    F _m11, _m12, _m21, _m22, _dx, _dy;
};
template <typename F> struct QMatrix4x4 { // like QMatrix4x4
    F m[4][4];
    int f;
};


template <typename T> void externalFunction(T);
template <typename T> void passOne() { externalFunction<T>(T()); }
template <typename T> T externalReturningFunction();
template <typename T> void returnOne() { externalReturningFunction<T>(); }
// C++11 explicit template instantiation
template void passOne<Pointers2>();
template void passOne<Pointers4>();
template void passOne<Integers2>();
template void passOne<Integers4>();
template void passOne<Floats2<float> >();
template void passOne<Floats2<double> >();
template void passOne<Floats3<float> >();
template void passOne<Floats3<double> >();
template void passOne<Floats4<float> >();
template void passOne<Floats4<double> >();
template void passOne<Matrix4x4<float> >();
template void passOne<Matrix4x4<double> >();
template void passOne<QChar>();
template void passOne<QLatin1String>();
template void passOne<QMatrix<float> >();
template void passOne<QMatrix<double> >();
template void passOne<QMatrix4x4<float> >();
template void passOne<QMatrix4x4<double> >();
template void returnOne<Pointers2>();
template void returnOne<Pointers4>();
template void returnOne<Integers2>();
template void returnOne<Integers4>();
template void returnOne<Floats2<float> >();
template void returnOne<Floats2<double> >();
template void returnOne<Floats3<float> >();
template void returnOne<Floats3<double> >();
template void returnOne<Floats4<float> >();
template void returnOne<Floats4<double> >();
template void returnOne<Matrix4x4<float> >();
template void returnOne<Matrix4x4<double> >();
template void returnOne<QChar>();
template void returnOne<QLatin1String>();
template void returnOne<QMatrix<float> >();
template void returnOne<QMatrix<double> >();
template void returnOne<QMatrix4x4<float> >();
template void returnOne<QMatrix4x4<double> >();


void passFloat()
{
    void externalFloat(float, float, float, float);
    externalFloat(1.0f, 2.0f, 3.0f, 4.0f);
}
void passDouble()
{
    void externalDouble(double, double, double, double);
    externalDouble(1.0, 2.0, 3.0, 4.0);
}
float returnFloat()
{
    return 1.0f;
}
double returnDouble()
{
    return 1.0;
}

Analysis of the output

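You may have noticed that I skipped old-style 32-bit x86. That was intentional, since that platform does not support passing in registers anyway. The only conclusions we could draw from it would be:

  • whether the structures are stored on the stack in the place of the argument, or whether they are stored elsewhere and passed by pointer;
  • whether single-precision floating-point is promoted to double-precision.

Moreover, I intentionally ignored it because I want people to start thinking about the new ILP32 ABI for x86-64, enabled by GCC 4.7's -mx32 switch, which follows the same ABI as described below (except that pointers are 32-bit).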


x86-64: parameter passing

  • Pointers2 is passed in registers;
  • Pointers4 is passed in memory;
  • Integers2 is passed in a single register (two 32-bit values per 64-bit register);
  • Integers4 is passed in two registers only (two 32-bit values per 64-bit register);
  • Floats2<float> is passed packed into a single SSE register, no promotion to double;
  • Floats3<float> is passed packed into two SSE registers, no promotion to double;
  • Floats4<float> is passed packed into two SSE registers, no promotion to double;
  • Floats2<double> is passed in two SSE registers, one value per register;
  • Floats3<double> and Floats4<double> are passed in memory;
  • Matrix4x4 and QMatrix4x4 are passed in memory regardless of the underlying type;
  • QChar is passed in a register;
  • QLatin1String is passed in registers;
  • Individual floating-point parameters are passed one per register, without float promotion to double.


x86-64: general rules

  • Single-precision floating-point types are not promoted to double;
  • Single-precision floating-point types in a structure are packed into SSE registers if any are still available;
  • Structures bigger than 16 bytes are passed in memory, with an exception for __m256, the type corresponding to one AVX 256-bit register.
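
The size rule can be checked with a little arithmetic. The sketch below reuses some of the test structures and counts 8-byte chunks (the x86-64 psABI calls them "eightbytes"). The helper names eightbytes and within_16_byte_limit are hypothetical, and the check models only the size limit, not the full classification algorithm. It assumes 4-byte int and float and 8-byte double.

```cpp
#include <cstddef>

// A subset of the test structures from the post.
struct Integers2 { int i1, i2; };
struct Integers4 { int i1, i2, i3, i4; };
template <typename F> struct Floats2 { F f1, f2; };
template <typename F> struct Floats3 { F f1, f2, f3; };
template <typename F> struct Floats4 { F f1, f2, f3, f4; };

// Number of 8-byte chunks needed to hold a T.
template <typename T>
constexpr std::size_t eightbytes() { return (sizeof(T) + 7) / 8; }

// Size rule only: anything above 16 bytes (two eightbytes) goes to memory.
template <typename T>
constexpr bool within_16_byte_limit() { return sizeof(T) <= 16; }

// These match the observations listed above.
static_assert(eightbytes<Integers2>() == 1, "Integers2: one register");
static_assert(eightbytes<Integers4>() == 2, "Integers4: two registers");
static_assert(eightbytes<Floats3<float> >() == 2, "Floats3<float>: two SSE registers");
static_assert(within_16_byte_limit<Floats2<double> >(), "Floats2<double>: registers");
static_assert(!within_16_byte_limit<Floats3<double> >(), "Floats3<double>: memory");
```

Floats3<float> is 12 bytes, so it rounds up to two eightbytes, which is why it occupies two SSE registers just like the 16-byte Floats4<float>.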



IA-64: parameter passing

  • Both Pointers structures are passed in registers, one pointer per register;
  • Both Integers structures are passed in registers, packed like x86-64 (two ints per register);
  • All of the Floats structures are passed in registers, one value per register (unpacked);
  • Matrix4x4<float> is passed entirely in registers: half of it (the first 8 floats) is in floating-point registers, one value per register (unpacked); the other half is passed in integer registers out4 to out7 as the memory representation (packed);
  • Matrix4x4<double> is passed partly in registers: half of it (the first 8 doubles) is in floating-point registers, one value per register (unpacked); the other half is passed in memory;
  • QChar and QLatin1String are passed in registers;
  • Both QMatrix types are passed entirely in registers, one value per register (unpacked);
  • QMatrix4x4 is passed like Matrix4x4, except that the integer is always in memory (the structure is larger than 8*8 bytes);
  • Individual floating-point parameters are passed one per register; type promotion happens internally in the register.


IA-64: return values

  • Floating-point structures with up to 8 floating-point members are returned in registers;
  • Integer structures of up to 32 bytes are returned in registers;
  • Everything else is returned in memory supplied by the caller.


IA-64: general rules

  • Type promotion happens in hardware, as IA-64 does not have specific registers for single or double precision (its FP registers hold only extended-precision data);
  • Homogeneous structures of floating-point types are passed in registers, up to 8 values; the rest goes to the integer registers if some are still available, or to memory;
  • All other structures are passed in the integer registers, up to 64 bytes;
  • Integer registers are allocated for passing any and all types, even if they aren't used (the ABI says they should be used in the case of C without prototypes).


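I only compiled the code for ARMv7, with floating-point parameters passed in VFP registers. If you are reading this blog, you are probably interested in performance, so you must be using the "hard-float" model for ARM. I will not concern myself with the slower "soft-float" mode. Also note that this covers ARMv7 only: the rules for 64-bit ARMv8 (AArch64) are slightly different, but no compiler is available for it.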


ARM: parameter passing

  • Pointers2, Pointers4, Integers2, and Integers4 are passed in registers (note that the Pointers and Integers structures are the same in 32-bit mode);
  • All of the Floats structures are passed in registers, one value per register, without promotion of floats to doubles; the values are also stored in memory, but I can't tell whether this is required or just GCC being dumb;
  • All types of Matrix4x4, QMatrix and QMatrix4x4 are passed in both memory and registers, with the registers containing the first 16 bytes;
  • QChar and QLatin1String are passed in registers;
  • [...] are passed in memory regardless of the underlying type;
  • Individual floating-point parameters are passed one per register, without float promotion to double.


ARM: return values

  • All of the Floats structures are returned in registers, and GCC then stores them all to memory even if they are never used afterwards;
  • QChar is returned in a register;
  • Everything else is returned in memory.

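Note that returning values is one of the places where the 32-bit AAPCS differs from the 64-bit one: there, if a type is passed in registers to a function where it is the first parameter, it is returned in those same registers. The 32-bit AAPCS restricts returning in registers to structures of 4 bytes or less.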


ARM: general rules

  • Single-precision floating-point types are not promoted to double;
  • Homogeneous structures (that is, structures containing one single type) of a floating-point type are passed in floating-point registers if the structure has 4 members or fewer.


I tried a MIPS 32-bit build (using GCC's default o32 ABI) and a MIPS 64-bit build (using -mabi=o64 -mlong64). Unless otherwise noted, the results were the same for both architectures.


MIPS: parameter passing

  • Both types of Integers and Pointers structures are passed in registers; on 64-bit, two 32-bit integers are packed into a single 64-bit register like x86-64;
  • Floats2<float>, Floats3<float>, and Floats4<float> are passed in integer registers, not in the floating-point registers; on 64-bit, two floats are packed into a single 64-bit register;
  • Floats2<double> is passed in integer registers; on 32-bit, two 32-bit registers are required to store each double;
  • On 32-bit, the first two doubles of Floats3<double> and Floats4<double> are passed in integer registers, the rest are passed in memory;
  • On 64-bit, Floats3<double> and Floats4<double> are passed entirely in integer registers;
  • Matrix4x4, QMatrix, and QMatrix4x4 are passed in integer registers (the portion that fits) and in memory (the rest);
  • QChar is passed in a register (on MIPS big-endian, it is passed in bits 16-31);
  • QLatin1String is passed in two registers;
  • Individual floating-point parameters are passed one per register, without float promotion to double.

For return values, MIPS is simple: everything is returned in memory, even QChar.


MIPS: general rules

  • No float is promoted to double;
  • No structure is ever passed in floating-point registers;
  • No structure is ever returned in registers.


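There are few overall conclusions we can draw. One of them is that single-precision floating-point values are not explicitly promoted to double precision when a formal parameter is present. Automatic promotion probably happens only for floating-point values passed through an ellipsis (...), but our problem statement was about calling functions whose parameters are known. The only slight deviation from the rule is IA-64, but that does not matter, since the hardware (like x87) operates in only one mode.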
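
The ellipsis case can be illustrated with a short sketch (the function names are hypothetical). A float passed through "..." undergoes the default argument promotion to double, so the callee has to read it with va_arg(args, double); with a declared float parameter, no such promotion applies at the language level.

```cpp
#include <cassert>
#include <cstdarg>

// Returns the first variadic argument, read as a double.
// Reading it as float would be undefined behavior, because a float
// passed through "..." has already been promoted to double by the caller.
double first_variadic(int count, ...) {
    assert(count >= 1);
    va_list args;
    va_start(args, count);
    double value = va_arg(args, double);
    va_end(args);
    return value;
}

// With a formal parameter of type float, the argument stays a float.
float formal(float x) { return x; }
```

For example, first_variadic(1, 1.5f) yields 1.5 as a double even though the call site wrote a float literal, whereas formal(1.5f) receives and returns a float.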



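To continue drawing conclusions, we need to exclude MIPS, since it passes everything in integer registers and returns everything in memory. If we do that, we can see that all the ABIs provide an optimization for structures containing only one floating-point type. The ABI documents give these slightly different names, all meaning a homogeneous floating-point structure. The optimization means that such a structure is passed in floating-point registers under certain conditions.

The first to break down is actually x86-64: the upper limit is 16 bytes, confined to two SSE registers. The rationale seems to be passing one double-precision complex value, which takes 16 bytes. That we are able to pass four single-precision values is an unexpected benefit.

The remaining architectures (ARM and IA-64) can pass more values in registers, always one value per register (no packing). IA-64 has more registers dedicated to parameter passing, so it can pass more values than ARM.

Recommendations for code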

  • Structures of up to 16 bytes containing integers and pointers should be passed by value;
  • Homogeneous structures of up to 16 bytes containing floating-point values should be passed by value (2 doubles or 4 floats);
  • Mixed-type structures should be avoided; if they exist, passing by value is still a good idea.
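
One way to apply these recommendations in generic code is to pick the parameter type at compile time. This is a sketch under assumptions, not something from the post: pass_type and pass_t are hypothetical names, the 16-byte threshold is taken from the x86-64 results, and the trivially-copyable and trivially-destructible checks keep any type with nontrivial copying or destruction on the const-reference path.

```cpp
#include <type_traits>

// Hypothetical helper: selects T for small trivial types, const T& otherwise.
template <typename T>
struct pass_type {
    static constexpr bool by_value =
        std::is_trivially_copyable<T>::value &&
        std::is_trivially_destructible<T>::value &&
        sizeof(T) <= 16;  // assumed threshold, from the x86-64 results
    using type = typename std::conditional<by_value, T, const T&>::type;
};

template <typename T>
using pass_t = typename pass_type<T>::type;

struct Integers4 { int i1, i2, i3, i4; };           // 16 bytes with 4-byte int
struct Matrix4x4f { float m[4][4]; };               // 64 bytes
struct Owner { int* p; ~Owner() { delete p; } };    // nontrivial destructor

// Example use: Integers4 is taken by value here.
constexpr int span(pass_t<Integers4> r) { return r.i3 - r.i1; }

static_assert(std::is_same<pass_t<Integers4>, Integers4>::value,
              "small trivial struct: by value");
static_assert(std::is_same<pass_t<Matrix4x4f>, const Matrix4x4f&>::value,
              "large struct: by const reference");
static_assert(std::is_same<pass_t<Owner>, const Owner&>::value,
              "nontrivial struct: by const reference");
```

A design caveat: pass_t<T> is a non-deduced context, so a function template declared as f(pass_t<T>) cannot deduce T from its argument and the caller has to spell out the template argument.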

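These recommendations apply only to structures that are trivially copyable and trivially destructible. All C structures (POD in C++) meet these criteria.

Final note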

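I should note that the recommendations above do not always produce more efficient code. Even though the values can be passed in registers, every compiler I tested (GCC 4.6, Clang 3.0, ICC 12.1) still performs a lot of memory operations in some cases. It is quite common for the compiler to write the structure to memory and then load it into registers. When it does that, passing by constant reference would be more efficient, since it would replace the memory loads with arithmetic on the stack pointer.

However, these are just matters of further optimization work for the compiler teams. The three compilers I tested for x86-64 optimize differently, and in almost all cases at least one of them managed to do without memory access. Interestingly, the behavior also changes when we replace the padding space with zeros.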


