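Example to support the statement that passing by value is not good practice, even for small user-defined types

I am reading Effective C++ by Scott Meyers, where the author compares passing by value with passing by reference. Pass-by-reference is recommended for user-defined types and pass-by-value for built-in types. I am looking for an example that illustrates the following paragraph, which states that passing by value can be expensive even for small user-defined objects.

Built-in types are small, so some people conclude that all small types are good candidates for pass-by-value, even if they are user-defined. This is shaky reasoning. Just because an object is small doesn't mean that calling its copy constructor is inexpensive. Many objects (most STL containers among them) contain little more than a pointer, but copying such objects entails copying everything they point to. That can be very expensive.

It depends on whether your copy is a deep copy or a shallow copy (that is, whether the class is value-like or pointer-like). For example, A is a class that holds just one pointer to another object: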

struct B { /* ... */ }; // placeholder; a real B may be expensive to copy
struct A
{
    B* pB;
    ~A() { delete pB; }
} a1, a2;

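If you copy A by value, as in a1 = a2, the implicitly generated copy assignment is called. It copies only the pointer, so it costs very little, but it leaves a1.pB and a2.pB pointing to the same heap memory. The destructor ~A() then deletes that memory twice, which is undefined behavior.

So we have to do this instead: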

struct A
{
    B* pB;
    A& operator=(const A& rhs)
    {
        if (this != &rhs)
        {
            delete pB;
            pB = new B;
            *pB = *rhs.pB; // deep copy; assumes rhs.pB is not null
        }
        return *this;
    }
    // the copy/move constructors and the move assignment should also be redefined
    ~A() { delete pB; }
} a1, a2;

The snippet above calls B's copy assignment, which may be very expensive.
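The same effect can be observed with a standard container. The sketch below is only an illustration: the function names are made up for this example, and the element count is arbitrary. It checks that the vector object itself is small, that a by-value parameter owns a different buffer from the caller's, and that a by-reference parameter sees the caller's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// By value: the argument is copy-constructed, so a new buffer is allocated
// and every element is copied into it.
bool sharesBufferByValue(std::vector<int> v, const int* callerData)
{
    return v.data() == callerData;
}

// By const reference: no copy is made; v is the caller's own object.
bool sharesBufferByRef(const std::vector<int>& v, const int* callerData)
{
    return v.data() == callerData;
}

// Usage sketch (hypothetical helper, call it from main).
void demoVectorCopy()
{
    std::vector<int> big(1000000, 42);
    // The vector object is tiny compared with the data it owns.
    assert(sizeof(big) < big.size() * sizeof(int));
    // The by-value parameter is a deep copy with its own buffer.
    assert(!sharesBufferByValue(big, big.data()));
    // The by-reference parameter refers to the caller's buffer.
    assert(sharesBufferByRef(big, big.data()));
}
```

Calling demoVectorCopy() from main runs the three checks; the by-value call is the one that pays for copying a million ints.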

To sum up: if your class is trivially copyable, copying a small user-defined object, or passing it by value, does not cost much; otherwise, it depends.
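Whether a class is trivially copyable can be checked at compile time with the standard type traits. The following is a minimal sketch; Point and Owner are hypothetical types invented for this illustration and do not come from the question.

```cpp
#include <type_traits>

// Hypothetical small value type: two ints, compiler-generated copy.
struct Point { int x; int y; };

// Hypothetical owning type: as small as a pointer, but its copy
// operations run user code that allocates and deep-copies.
struct Owner {
    int* p;
    Owner() : p(new int(0)) {}
    Owner(const Owner& other) : p(new int(*other.p)) {}
    Owner& operator=(const Owner& other) { *p = *other.p; return *this; }
    ~Owner() { delete p; }
};

static_assert(std::is_trivially_copyable<Point>::value,
              "Point can be copied like raw bytes");
static_assert(!std::is_trivially_copyable<Owner>::value,
              "copying Owner runs user-provided code");
static_assert(sizeof(Owner) == sizeof(int*),
              "Owner is small, yet not cheap to copy");
```

Both types are small, but only Point satisfies the trait; size alone says nothing about the cost of the copy.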

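If you still want to pass by value without triggering undefined behavior, shared_ptr may be a good choice for you. But as @Arne Vogel pointed out, the implementation of shared_ptr is thread-safe, which requires atomic operations on the reference count, and that adds to the cost.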
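As a rough sketch of what passing a shared_ptr by value does: the pointee is not copied, only the handle is, and the reference count goes up for the duration of the call. The struct B below is a stand-in with an arbitrary payload, and the function names are invented for this example.

```cpp
#include <cassert>
#include <memory>

struct B { int payload[256]; }; // stand-in for an expensive-to-copy object

// By value: copies the handle only. The copy shares ownership of the same
// B, so the reference count is incremented on entry and decremented on exit.
long useCountByValue(std::shared_ptr<B> p) { return p.use_count(); }

// By const reference: no new owner is created, the count is untouched.
long useCountByRef(const std::shared_ptr<B>& p) { return p.use_count(); }

// Usage sketch (hypothetical helper, call it from main).
void demoSharedPtr()
{
    std::shared_ptr<B> b = std::make_shared<B>();
    assert(b.use_count() == 1);
    assert(useCountByValue(b) == 2); // the parameter is a second owner
    assert(useCountByRef(b) == 1);   // no extra owner
    assert(b.use_count() == 1);      // the by-value copy is gone again
}
```

The increment and decrement around the by-value call are the reference-count updates mentioned above.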

The "cost" is simply wasted CPU cycles.

Take a simple example:

#include <iostream>
class simple {
public:
    simple() { std::cout << "constructor" << std::endl; }
    simple(const simple& copy) { std::cout << "copied" << std::endl; }
    ~simple() { std::cout << "destructor" << std::endl; }
    void addr() const { std::cout << &(*this) << std::endl; }
};
void simple_ref(const simple& ref) { ref.addr(); }
void simple_val(simple val) { val.addr(); }
int main(int argc, char* argv[])
{
    simple val;      // output: 'constructor'
    simple_ref(val); // output: address of val
    simple_val(val); // output: 'copied', address of copy made, 'destructor' (the destructor of the copy made)
    return 0;
    // output: 'destructor' (destructor of 'val')
}

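Here there is no member data, so on my machine sizeof(simple) gives 1. Even so, calling the function that takes its argument by value rather than by reference invokes the copy constructor, even for something as simple as printing the address of a variable.

This is a design consideration, because a copy may be exactly what you want. But copying like this costs cycles and may be completely unnecessary, especially in an example like the one above.

I hope this helps.

(What follows is the content of a blog post on copy vs. ref by Thiago Macieira, https://www.macieira.org/blog/2012/02/the-value-of-passing-by-value/)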

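Problem statement

Before we dig into the ABI documents and try to compile code, we need to define the problem we are trying to solve. In general terms, I am trying to find the best way to pass small C++ structures: when is it better to pass by value than by constant reference? And does the qreal discussion have any bearing on this?

Small structures like QLatin1String, which contains just one pointer as a member, will benefit from being passed by value. What other types of structures should we consider?

Structures with more than one pointer
Structures with 32-bit integers on 64-bit architectures
Structures with floating-point members (single and double precision)
Mixed-type and special-purpose structures found in Qt

I will investigate the x86-64, ARMv7 hard-float, MIPS hard-float (o32) and IA-64 ABIs, because they are the ones for which I have access to compilers. All of them support passing parameters in registers, and all use at least 4 integer registers for parameter passing. Except for MIPS, they also have at least 4 floating-point registers for parameter passing. See my previous blog post on ABI details for more information.

So we will investigate what happens when you pass the following structures by value: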

struct Pointers2
{
    void *p1, *p2;
};
struct Pointers4
{
    void *p1, *p2, *p3, *p4;
};
struct Integers2 // like QSize and QPoint
{
    int i1, i2;
};
struct Integers4 // like QRect
{
    int i1, i2, i3, i4;
};
template <typename F> struct Floats2 // like QSizeF, QPointF, QVector2D
{
    F f1, f2;
};
template <typename F> struct Floats3 // like QVector3D
{
    F f1, f2, f3;
};
template <typename F> struct Floats4 // like QRectF, QVector4D
{
    F f1, f2, f3, f4;
};
template <typename F> struct Matrix4x4 // like QGenericMatrix<4, 4>
{
    F m[4][4];
};
struct QChar
{
    unsigned short ucs;
};
struct QLatin1String
{
    const char *str;
    int len;
};
template <typename F> struct QMatrix
{
    F _m11, _m12, _m21, _m22, _dx, _dy;
};
template <typename F> struct QMatrix4x4 // like QMatrix4x4
{
    F m[4][4];
    int f;
};

We will analyze the assembly generated for the following program:

template <typename T> void externalFunction(T);
template <typename T> void passOne()
{
    externalFunction(T());
}
template <typename T> T externalReturningFunction();
template <typename T> void returnOne()
{
    externalReturningFunction<T>();
}
// C++11 explicit template instantiation
template void passOne<Pointers2>();
template void passOne<Pointers4>();
template void passOne<Integers2>();
template void passOne<Integers4>();
template void passOne<Floats2<float> >();
template void passOne<Floats2<double> >();
template void passOne<Floats3<float> >();
template void passOne<Floats3<double> >();
template void passOne<Floats4<float> >();
template void passOne<Floats4<double> >();
template void passOne<Matrix4x4<float> >();
template void passOne<Matrix4x4<double> >();
template void passOne<QChar>();
template void passOne<QLatin1String>();
template void passOne<QMatrix<float> >();
template void passOne<QMatrix<double> >();
template void passOne<QMatrix4x4<float> >();
template void passOne<QMatrix4x4<double> >();
template void returnOne<Pointers2>();
template void returnOne<Pointers4>();
template void returnOne<Integers2>();
template void returnOne<Integers4>();
template void returnOne<Floats2<float> >();
template void returnOne<Floats2<double> >();
template void returnOne<Floats3<float> >();
template void returnOne<Floats3<double> >();
template void returnOne<Floats4<float> >();
template void returnOne<Floats4<double> >();
template void returnOne<Matrix4x4<float> >();
template void returnOne<Matrix4x4<double> >();
template void returnOne<QChar>();
template void returnOne<QLatin1String>();
template void returnOne<QMatrix<float> >();
template void returnOne<QMatrix<double> >();
template void returnOne<QMatrix4x4<float> >();
template void returnOne<QMatrix4x4<double> >();

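In addition, we are interested in what happens to floating-point parameters that are not inside a structure: are they promoted or not? So we will also test the following: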

void passFloat()
{
    void externalFloat(float, float, float, float);
    externalFloat(1.0f, 2.0f, 3.0f, 4.0f);
}
void passDouble()
{
    void externalDouble(double, double, double, double);
    externalDouble(1.0f, 2.0f, 3.0f, 4.0f);
}
float returnFloat()
{
    return 1.0f;
}
double returnDouble()
{
    return 1.0;
}

Analysis of the output

x86-64

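You may have noticed that I skipped old-style 32-bit x86. That was intentional, since that platform does not support passing parameters in registers anyway. The only conclusions we could draw from it would be: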

whether the structures are stored in the stack in the place of the argument, or whether they’re stored elsewhere and it’s passed by pointer
whether single-precision floating-point is promoted to double-precision

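I am also deliberately ignoring it because I want people to start thinking about the new ILP32 ABI for x86-64, enabled by GCC 4.7's -mx32 switch, which follows the same ABI described below (except that pointers are 32 bits).

So let's take a look at the assembly results. For parameter passing, we find that: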

Pointers2 is passed in registers;
Pointers4 is passed in memory;
Integers2 is passed in a single register (two 32-bit values per 64-bit register);
Integers4 is passed in two registers only (two 32-bit values per 64-bit register);
Floats2<float> is passed packed into a single SSE register, no promotion to double
Floats3<float> is passed packed into two SSE registers, no promotion to double;
Floats4<float> is passed packed into two SSE registers, no promotion to double;
Floats2<double> is passed in two SSE registers, one value per register
Floats3<double> and Floats4<double> are passed in memory;
Matrix4x4 and QMatrix4x4 are passed in memory regardless of the underlying type;
QChar is passed in a register;
QLatin1String is passed in registers.
The floating point parameters are passed one per register, without float promotion to double.

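For return values, the conclusion is the same as above: if a value is passed in registers, it is also returned in registers; if it is passed in memory, it is returned in memory. This leads us to the following conclusions, supported by a careful reading of the ABI document: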

Single-precision floating-point types are not promoted to double;
Single-precision floating-point types in a structure are packed into SSE registers if they are still available
Structures bigger than 16 bytes are passed in memory, with an exception for __m256, the type corresponding to one AVX 256-bit register.

IA-64

Here are the results for parameter passing:

Both Pointers structures are passed in registers, one pointer per register;
Both Integers structures are passed in registers, packed like x86-64 (two ints per register);
All of the Floats structures are passed in registers, one value per register (unpacked);
Matrix4x4<float> is passed entirely in registers: half of it (the first 8 floats) is in floating-point registers, one value per register (unpacked); the other half is passed in integer registers out4 to out7 as the memory representations (packed);
Matrix4x4<double> is passed partly in registers: half of it (the first 8 doubles) is in floating-point registers, one value per register (unpacked); the other half is passed in memory;
QChar and QLatin1String are passed in registers;
Both QMatrix are passed entirely in registers, one value per register (unpacked);
QMatrix4x4 is passed like Matrix4x4, except that the integer is always in memory (the structure is larger than 8*8 bytes);
Individual floating-point parameters are passed one per register; type promotion happens internally in the register.

For return values, we have:

The floating-point structures with up to 8 floating-point members are returned in registers;
The integer structures of up to 32 bytes are returned in registers;
All the rest is returned in memory supplied by the caller.

The conclusions are:

Type promotion happens in hardware, as IA-64 does not have specific registers for single or double precision (its FP registers hold only extended precision data);
Homogeneous structures of floating-point types are passed in registers, up to 8 values; the rest goes to the integer registers if there are some still available or in memory;
All other structures are passed in the integer registers, up to 64 bytes;
Integer registers are allocated for passing any and all types, even if they aren't used (the ABI says they should be used in the case of C without prototypes).

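ARM

I only compiled the code for ARMv7, with floating-point parameters passed in VFP registers. If you are reading this blog, you are probably interested in performance, so you should be using the "hard-float" model on ARM. I will not concern myself with the slower "soft-float" mode. Also note that this is ARMv7 only: the rules for ARMv8 64-bit (AArch64) are slightly different, but no compiler is available.

Here are the results for parameter passing: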

Pointers2, Pointers4, Integers2, and Integers4 are passed in registers (note that the Pointers and Integers structures are the same in 32-bit mode);
All of the Float types are passed in registers, one value per register, without promotion of floats to doubles; the values are also stored in memory but I can't tell if this is required or just GCC being dumb;
All types of Matrix4x4, QMatrix and QMatrix4x4 are passed in both memory and registers, with the registers containing the first 16 bytes;
QChar and QLatin1String are passed in registers;
are passed in memory regardless of the underlying type.
The floating point parameters are passed one per register, without float promotion to double.

For returning those types, we have:

All of the Float types are returned in registers and GCC then stores them all to memory even if they are never used afterwards;
QChar is returned in a register;
Everything else is returned in memory.

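Note that returning types is one of the places where the 32-bit AAPCS differs from the 64-bit AAPCS: in the 64-bit one, if a type is passed in registers to a function where it is the first parameter, it is returned in those same registers. The 32-bit AAPCS restricts return in registers to structures of 4 bytes or less.

My conclusions are: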

Single-precision floating-point types are not promoted to double;
Homogeneous structures (that is, structures containing one single type) of a floating-point type are passed in floating-point registers if the structure has 4 members or fewer;

MIPS

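I tried both a MIPS 32-bit build (using the GCC-default o32 ABI) and a MIPS 64-bit build (using -mabi=o64 -mlong64). Unless otherwise noted, the results are the same for both architectures.

For passing parameters, they are: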

Both types of Integers and Pointers structures are passed in registers; on 64-bit, two 32-bit integers are packed into a single 64-bit register like x86-64;
Floats2<float>, Floats3<float>, and Floats4<float> are passed in integer registers, not in the floating-point registers; on 64-bit, two floats are packed into a single 64-bit register;
Floats2<double> is passed in integer registers; on 32-bit, two 32-bit registers are required to store each double;
On 32-bit, the first two doubles of Floats3<double> and Floats4<double> are passed in integer registers, the rest are passed in memory;
On 64-bit, Floats3<double> and Floats4<double> are passed entirely in integer registers;
Matrix4x4, QMatrix, and QMatrix4x4 are passed in integer registers (the portion that fits) and in memory (the rest);
QChar is passed in a register (on MIPS big-endian, it's passed on bits 16-31);
QLatin1String is passed on two registers;
The floating point parameters are passed one per register, without float promotion to double.

For return values, MIPS is simple: everything is returned in memory, even QChar.

The conclusions are even simpler:

No float is promoted to double;
No structure is ever passed in floating-point registers;
No structure is ever returned in registers.

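General conclusions

There are only a few overall conclusions we can draw. One of them is that single-precision floating-point values are not explicitly promoted to double precision when formal parameters are present. Automatic promotion probably happens only for floating-point values passed through the ellipsis (...), but our problem statement was about calling functions whose parameters are known. The only slight deviation from the rule is IA-64, but that does not matter, since the hardware (like x87) operates in only one mode.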
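The language-level side of this can be illustrated with a variadic function. In C and C++, arguments passed through the ellipsis undergo the default argument promotions, so a float is converted to double before the call, while a prototyped float parameter keeps its type. This is a minimal sketch with invented function names; it shows the language rule, not the register assignment of any particular ABI.

```cpp
#include <cassert>
#include <cstdarg>

// Variadic callee: every float argument arrives as a double and has to be
// read back as a double.
double sumVariadic(int count, ...)
{
    va_list args;
    va_start(args, count);
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, double); // reading as float would be undefined behavior
    va_end(args);
    return total;
}

// Prototyped callee: each parameter keeps its declared type, no promotion.
float sumFixed(float a, float b) { return a + b; }

// Usage sketch (hypothetical helper, call it from main).
void demoPromotion()
{
    // The float literals are promoted to double at the call site.
    assert(sumVariadic(2, 1.5f, 2.25f) == 3.75);
    // Here they stay single precision all the way through.
    assert(sumFixed(1.5f, 2.25f) == 3.75f);
}
```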

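For structures containing integer members (including pointers), there is nothing further to optimize: they are loaded into registers exactly as they appear in memory. That means the part of a register that corresponds to padding may contain uninitialized or garbage data, and things can get really odd on MIPS in big-endian mode. It also means that, on all architectures, types smaller than a register do not occupy a whole register, so they may be packed together with other members.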
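The memory layout that gets loaded into registers can be inspected with sizeof and offsetof. The sketch below reuses two of the test structures and assumes a 32-bit int, which holds on all the platforms discussed here; the constant kTailPadding is a name made up for this example.

```cpp
#include <cstddef>

struct Integers2 { int i1, i2; };
struct QLatin1String { const char *str; int len; };

// Two ints sit back to back, so on a 64-bit target both fit into a single
// 64-bit register image with no padding in between.
static_assert(sizeof(Integers2) == 2 * sizeof(int), "no padding expected");
static_assert(offsetof(Integers2, i2) == sizeof(int), "i2 directly follows i1");

// A pointer followed by an int: with 8-byte pointers the structure is
// rounded up to 16 bytes, leaving tail padding whose contents are
// unspecified when the structure is loaded into registers.
constexpr std::size_t kTailPadding =
    sizeof(QLatin1String) - (sizeof(const char *) + sizeof(int));
```

With 64-bit pointers kTailPadding comes out as 4, and with 32-bit pointers as 0, which matches the observation that the Pointers and Integers structures look the same in 32-bit mode.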

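Another conclusion is quite obvious: structures containing floats are smaller than structures containing doubles, so they use less memory or fewer registers when passed.

To continue drawing conclusions, we need to exclude MIPS, since it passes everything in integer registers and returns everything through memory. If we do that, we can see that all the ABIs provide an optimization for structures containing only one floating-point type. The ABI documents give it slightly different names, all of which mean homogeneous floating-point structure. These optimizations mean that the structure is passed in floating-point registers under certain conditions.

The first to break down is actually x86-64: the upper limit is 16 bytes, held in at most two SSE registers. The rationale seems to be passing a double-precision complex value, which takes 16 bytes. Being able to pass four single-precision values is an unexpected benefit.

The remaining architectures (ARM and IA-64) can pass more values in registers, and always one value per register (no packing). IA-64 has more registers dedicated to parameter passing, so it can pass more values than ARM.

Recommendations for code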

Structures of up to 16 bytes containing integers and pointers should be passed by value;
Homogeneous structures of up to 16 bytes containing floating-point should be passed by value (2 doubles or 4 floats);
Mixed-type structures should be avoided; if they exist, passing by value is still a good idea;
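These rules of thumb could be encoded in a compile-time helper. The following is only a sketch under stated assumptions: IsCheapToPassByValue and ParamType are hypothetical names that appear neither in the blog post nor in Qt, the 16-byte limit is taken from the list above, and the trait additionally requires the type to be trivially copyable and trivially destructible.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical trait: true when T is at most 16 bytes and can be copied
// and destroyed without running any user code.
template <typename T>
struct IsCheapToPassByValue
    : std::integral_constant<bool,
          sizeof(T) <= 16 &&
          std::is_trivially_copyable<T>::value &&
          std::is_trivially_destructible<T>::value> {};

// Hypothetical alias: T itself for cheap types, const T& for the rest.
template <typename T>
using ParamType =
    typename std::conditional<IsCheapToPassByValue<T>::value, T, const T &>::type;

struct Integers4 { int i1, i2, i3, i4; };  // 16 bytes with a 32-bit int
struct Matrix4x4f { float m[4][4]; };      // 64 bytes with a 32-bit float

static_assert(std::is_same<ParamType<Integers4>, Integers4>::value,
              "small and trivial: pass by value");
static_assert(std::is_same<ParamType<Matrix4x4f>, const Matrix4x4f &>::value,
              "too large: pass by const reference");
```

A function could then be declared as void draw(ParamType<Integers4> rect). Note that an alias like this blocks template argument deduction, so it suits functions whose parameter types are spelled out explicitly.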

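The above is only valid for structures that are trivially copyable and trivially destructible. All C structures (POD in C++) meet those criteria.

Final note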

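I should point out that the recommendations above do not always produce more efficient code. Even though the values can be passed in registers, every compiler I tested (GCC 4.6, Clang 3.0, ICC 12.1) still performs a lot of memory operations under some circumstances. It is quite common for the compiler to write the structure to memory and then load it into registers. When it does that, passing by constant reference would be more efficient, since it would replace the memory loads with arithmetic on the stack pointer.

However, those are just matters of further optimization work by the compiler teams. The three compilers I tested for x86-64 optimize differently and, in almost all cases, at least one of them managed to do without memory accesses. Interestingly, the behavior also changes when we replace the padding space with zeroes.

OK, copying data is expensive and may be unnecessary.

But on the other hand, a function that takes a variable by reference is not thread-safe. Unless the operation is atomic, it is sometimes preferable to copy the variable so that concurrent threads cannot mutate it while you work on it.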
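One common way to apply this is to take a snapshot of the shared data while holding a lock and then work on the private copy. The sketch below is an assumption-laden illustration: SharedSettings, snapshotName and nameLength are hypothetical names, and the demo stays single-threaded so that its result is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>

// Hypothetical shared state that other threads may modify at any time.
class SharedSettings {
public:
    void setName(const std::string& name)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        name_ = name;
    }
    // Returns a copy made while the lock is held. The caller owns the copy
    // and can read it without further synchronization.
    std::string snapshotName() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return name_;
    }
private:
    mutable std::mutex mutex_;
    std::string name_;
};

// Takes its argument by value: later writes to the shared state cannot
// change what this function sees.
std::size_t nameLength(std::string name) { return name.size(); }

// Usage sketch (hypothetical helper, call it from main).
void demoSnapshot()
{
    SharedSettings settings;
    settings.setName("alpha");
    std::string copy = settings.snapshotName();
    settings.setName("a much longer name"); // stands in for a concurrent writer
    assert(copy == "alpha");                // the snapshot is unaffected
    assert(nameLength(copy) == 5);
}
```

The price is the copy itself, which is the trade-off this answer describes: extra cycles in exchange for not having to reason about concurrent mutation.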
