C++ atomic operations on a contiguous block of memory

Is it possible to use atomic operations, possibly with the std::atomic library, when assigning values in a contiguous block of memory?

If I have this code:

uint16_t* data = (uint16_t*) calloc(num_values, size);

What can I do to make an operation like this atomic:

data[i] = 5;

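I will have multiple threads assigning to data simultaneously, possibly at the same index. The order in which these threads modify the value at a particular index doesn't matter to me, as long as the modifications are atomic, so that no data corruption is possible.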

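Edit: So, per @user4581301, I'm providing some context for my problem here. I'm writing a program to align a depth video data frame with a color video data frame. The camera sensors used for depth and color have different focal characteristics, so they do not line up exactly. The general algorithm involves projecting a pixel in depth space to a region in color space, then overwriting all values in the depth frame that span that region with the single pixel. I am parallelizing this algorithm. These projected regions may overlap, so when parallelized, writes to an index may occur concurrently.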

The pseudocode looks like this:

for x in depth_video_width:
    for y in depth_video_height:
        pixel = get_pixel(x, y)
        x_min, x_max, y_min, y_max = project_depth_pixel(x, y)
        // iterate over projected region
        for x` in [x_min, x_max]:
            for y` in [y_min, y_max]:
                // possible concurrent modification here
                data[x`, y`] = pixel

The outer loop, or the outermost two loops, are parallelized.

You won't be able to do what you want like this.

An atomic array doesn't make much sense, and isn't what you want (you want the individual writes to be atomic).

You can have an array of atomics:

#include <array>
#include <atomic>
#include <cstdint>

int main()
{
    std::array<std::atomic<std::uint16_t>, 5> data{};
    data[1] = 5;  // atomic store through std::atomic's operator=
}

…but now you can't just access a contiguous block of uint16_ts, which it's implied you want to do.
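As a rough illustration of the array-of-atomics option with several writers, here is a minimal sketch. The helper name `fill_concurrently` and the choice of `std::memory_order_relaxed` are assumptions made for the example (relaxed ordering fits the question's "order doesn't matter" requirement); they are not taken from the answer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: every thread stores its own value (1..num_threads)
// into every slot. Returns the final contents copied into plain values.
std::vector<std::uint16_t> fill_concurrently(std::size_t num_values,
                                             unsigned num_threads)
{
    assert(num_threads > 0);
    std::vector<std::atomic<std::uint16_t>> data(num_values);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&data, t] {
            const auto value = static_cast<std::uint16_t>(t + 1);
            for (auto& slot : data) {
                // Relaxed ordering: only per-element atomicity is needed.
                slot.store(value, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    // Copying out is required: the atomics are not a plain uint16_t block.
    std::vector<std::uint16_t> result;
    result.reserve(num_values);
    for (auto& slot : data) {
        result.push_back(slot.load(std::memory_order_relaxed));
    }
    return result;
}
```

Each slot ends up holding the value of whichever thread stored last, never a torn mix. The copy at the end shows the cost mentioned above: code that expects a raw uint16_t buffer cannot use the atomic storage directly.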

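If you don't mind platform-specific things, you can keep your array of uint16_ts and make sure you only ever use atomic operations on each element (e.g. GCC's __atomic intrinsics).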
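A sketch of what that could look like with the GCC/Clang builtins `__atomic_store_n` and `__atomic_load_n` on a plain calloc'd buffer. The wrapper names, the buffer size, and the index used are made up for the example, and the code is not portable to compilers that lack these builtins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical wrappers around the GCC/Clang __atomic builtins.
void gcc_atomic_write(std::uint16_t* data, std::size_t i, std::uint16_t value)
{
    __atomic_store_n(&data[i], value, __ATOMIC_RELAXED);
}

std::uint16_t gcc_atomic_read(std::uint16_t* data, std::size_t i)
{
    return __atomic_load_n(&data[i], __ATOMIC_RELAXED);
}

// Several threads write different values (1..num_threads) to the same index
// of an ordinary contiguous uint16_t buffer.
std::uint16_t race_on_one_index(unsigned num_threads)
{
    assert(num_threads > 0);
    const std::size_t num_values = 16;
    auto* data = static_cast<std::uint16_t*>(
        std::calloc(num_values, sizeof(std::uint16_t)));
    if (data == nullptr) {
        std::abort();
    }

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([data, t] {
            gcc_atomic_write(data, 3, static_cast<std::uint16_t>(t + 1));
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    const std::uint16_t result = gcc_atomic_read(data, 3);
    std::free(data);
    return result;
}
```

The buffer stays a normal contiguous uint16_t block, but the approach only holds up if every concurrent access goes through the wrappers. A single stray `data[i] = 5;` from another thread reintroduces the data race.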

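But, generally, I think you'll want to keep it simple and just lock a mutex around accesses to a normal array. Measure to be sure, but you may be surprised at how little performance you lose.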
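The mutex approach might look roughly like the following. `LockedBuffer` is a hypothetical wrapper invented for this sketch, and it uses one mutex for the whole buffer; finer-grained locking is possible but is not something the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: a plain contiguous buffer guarded by a single mutex.
class LockedBuffer {
public:
    explicit LockedBuffer(std::size_t n) : data_(n, 0) {}

    void write(std::size_t i, std::uint16_t value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        data_[i] = value;
    }

    std::uint16_t read(std::size_t i)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_[i];
    }

private:
    std::mutex mutex_;
    std::vector<std::uint16_t> data_;
};

// Several threads write different values (1..num_threads) to the same index.
std::uint16_t locked_write_same_index(unsigned num_threads)
{
    assert(num_threads > 0);
    LockedBuffer buffer(16);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&buffer, t] {
            buffer.write(3, static_cast<std::uint16_t>(t + 1));
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return buffer.read(3);
}
```

In the depth-alignment scenario from the question, it may be cheaper to take the lock once per projected region rather than once per pixel. That is a guess to be checked by measurement, not a recommendation from the answer.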

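If you're desperate for atomics, and desperate for an underlying array of uint16_t, and desperate for a standard solution, you could wait for C++20 and keep a std::atomic_ref (which is like a non-owning std::atomic) for each element, then access the elements through those. But then you still have to be cautious about any operation that accesses the elements directly, possibly by using a lock, or at least by being very careful about what is doing what, and when. At this point your code is much more complex: be sure it's worthwhile.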

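To add to the last answer, I would strongly advise against using an array of atomics, since any read or write of an atomic locks an entire cache line (at least on x86). In practice, this means that when you access element i of your array (whether to read it or to write it), you lock the cache line around that element, so other threads cannot access that particular cache line.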

The solution to your problem is a mutex, as mentioned in the other answer.

As for the largest supported atomic operation, it currently seems to be 64 bits (see https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3a-part-1-manual.html)

The Intel-64 memory ordering model guarantees that, for each of the following 
memory-access instructions, the constituent memory operation appears to execute 
as a single memory access:
• Instructions that read or write a single byte.
• Instructions that read or write a word (2 bytes) whose address is aligned on a 2
byte boundary.
• Instructions that read or write a doubleword (4 bytes) whose address is aligned
on a 4 byte boundary.
• Instructions that read or write a quadword (8 bytes) whose address is aligned on
an 8 byte boundary.
Any locked instruction (either the XCHG instruction or another read-modify-write
instruction with a LOCK prefix) appears to execute as an indivisible and 
uninterruptible sequence of load(s) followed by store(s) regardless of alignment.

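In other words, your processor doesn't know how to perform atomic operations wider than 64 bits. And I'm not even mentioning here the STL implementation of atomic, which may use locks (see https://en.cppreference.com/w/cpp/atomic/atomic/is_lock_free).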
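Whether a given std::atomic specialization falls back to locks can be checked at compile time with `is_always_lock_free` (C++17). The sketch below is one way to do that; the `Wide` struct and the `LockFreeReport` type are hypothetical and exist only to show an object larger than 64 bits. The result for `Wide` depends on the platform and standard library.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical 24-byte payload, wider than any single 64-bit access.
struct Wide {
    std::uint64_t a;
    std::uint64_t b;
    std::uint64_t c;
};

// Hypothetical summary of what the implementation reports.
struct LockFreeReport {
    bool u16;   // std::atomic<std::uint16_t>
    bool u64;   // std::atomic<std::uint64_t>
    bool wide;  // std::atomic<Wide>
};

LockFreeReport query_lock_freedom()
{
    LockFreeReport report{};
    // is_always_lock_free is a compile-time constant (C++17).
    report.u16 = std::atomic<std::uint16_t>::is_always_lock_free;
    report.u64 = std::atomic<std::uint64_t>::is_always_lock_free;
    report.wide = std::atomic<Wide>::is_always_lock_free;
    return report;
}
```

For the uint16_t elements in the question, mainstream x86-64 and AArch64 toolchains report `true`, so lock-based fallback is not a concern at that size. `std::atomic<Wide>` is the kind of type for which the library may resort to a lock.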
