Deadlock with CUDA cooperative groups

In the CUDA Programming Guide, the section on cooperative groups has an example of grid-wide synchronization:

grid_group grid = this_grid();
grid.sync();

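Unfortunately, I could not find a precise definition of the behavior of grid.sync(). Is it correct to take the following definition of __syncthreads and extend it to the grid level?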

void __syncthreads();
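waits until all threads in the thread block have reached this point and all global and shared memory accesses made by these threads prior to __syncthreads() are visible to all threads in the block.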

So, my question is whether the following is correct:

this_grid().sync();
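waits until all threads in the grid have reached this point and all global and shared memory accesses made by these threads prior to this_grid().sync() are visible to all threads in the grid.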

I doubt that this is correct because, in the CUDA Programming Guide, a couple of lines below grid.sync(); there is the following statement:

To guarantee co-residency of the thread blocks on the GPU, the number of blocks launched needs to be carefully considered.

Does this mean that if I use so many threads that the thread blocks are not co-resident, I can end up in a situation where threads deadlock?

The same question arises when I try to use coalesced_threads().sync(). Is the following correct?

coalesced_threads().sync();
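waits until all active threads in the warp have reached this point and all global and shared memory accesses made by these threads prior to coalesced_threads().sync() are visible to all threads in the list of active threads of the warp.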

Does the following example exit the while loops?

auto ct = coalesced_threads();
assert(ct.size() == 2);
b = 0; // shared between all threads
if (ct.thread_rank() == 0)
while (b == 0) {
// what if only rank 0 thread is always taken due to thread divergence?
ct.sync(); // does it guarantee that rank 0 will wait for rank 1?
}
if (ct.thread_rank() == 1)
while (b == 0) {
// what if a thread with rank 1 never executed?
b = 1; 
ct.sync(); // does it guarantee that rank 0 will wait for rank 1?
}

To make the example above clear: without ct.sync() it is unsafe and can deadlock (infinite loop):

auto ct = coalesced_threads();
assert(ct.size() == 2);
b = 0; // shared between all threads
if (ct.thread_rank() == 0)
while (b == 0) {
// what if only rank 0 thread is always taken due to thread divergence?
}
if (ct.thread_rank() == 1)
while (b == 0) {
// what if a thread with rank 1 never executed?
b = 1; 
}

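So, my question is whether the following is correct:

this_grid().sync();

waits until all threads in the grid have reached this point and all global and shared memory accesses made by these threads prior to this_grid().sync() are visible to all threads in the grid.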

Yes, that is correct, assuming you have a proper cooperative launch. A proper cooperative launch implies a number of things:

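the cooperative launch property is true on the GPU you are running on
you have launched using a properly formed cooperative launch
you have met the grid sizing requirements for a cooperative launch
after the cooperative launch, cudaGetLastError() returns cudaSuccess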
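
The four requirements above can be sketched in host code roughly as follows. This is a minimal illustration, not the answerer's code: the kernel `grid_sync_kernel`, the wrapper `run_grid_sync_demo`, the block size of 128, and the use of the occupancy API to size the grid are all assumptions made for the example. Older toolkits may also require compiling with relocatable device code for grid.sync().

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical kernel: each block publishes a value, the grid synchronizes,
// then each block reads the value published by its neighbor.
__global__ void grid_sync_kernel(int *data, int *out)
{
    cg::grid_group grid = cg::this_grid();
    if (threadIdx.x == 0) data[blockIdx.x] = blockIdx.x + 1;
    grid.sync();  // reached unconditionally by every thread in the grid
    if (threadIdx.x == 0) out[blockIdx.x] = data[(blockIdx.x + 1) % gridDim.x];
}

// Returns 0 on success, 1 if cooperative launch is unavailable, -1 on error.
int run_grid_sync_demo()
{
    int dev = 0, supported = 0, num_sms = 0, blocks_per_sm = 0;
    if (cudaGetDevice(&dev) != cudaSuccess) return 1;

    // Requirement 1: the device must report the cooperative launch attribute.
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, dev);
    if (!supported) return 1;

    // Requirement 3: size the grid so that all blocks can be co-resident.
    const int threads = 128;  // assumed block size for this sketch
    cudaDeviceGetAttribute(&num_sms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm,
                                                  grid_sync_kernel, threads, 0);
    const int blocks = blocks_per_sm * num_sms;
    if (blocks < 1) return -1;

    int *data = nullptr, *out = nullptr;
    if (cudaMalloc(&data, blocks * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&out, blocks * sizeof(int)) != cudaSuccess) {
        cudaFree(data);
        return -1;
    }

    // Requirement 2: use the cooperative launch API instead of <<<...>>>.
    void *args[] = { &data, &out };
    cudaError_t err = cudaLaunchCooperativeKernel((void *)grid_sync_kernel,
                                                  dim3(blocks), dim3(threads),
                                                  args, 0, 0);

    // Requirement 4: the launch must report success before results are trusted.
    if (err == cudaSuccess) err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    std::vector<int> host(blocks, 0);
    if (err == cudaSuccess)
        err = cudaMemcpy(host.data(), out, blocks * sizeof(int),
                         cudaMemcpyDeviceToHost);
    cudaFree(data);
    cudaFree(out);
    if (err != cudaSuccess) return -1;

    // Each block should see the value its neighbor wrote before grid.sync().
    for (int i = 0; i < blocks; ++i)
        if (host[i] != (i + 1) % blocks + 1) return -1;
    return 0;
}
```

Calling `run_grid_sync_demo()` from `main` is enough to exercise the sketch; a return value of 1 only means the device (or environment) does not offer cooperative launch.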

Does this mean that if I use so many threads that the thread blocks are not co-resident

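If you violate the requirements for a cooperative launch, you are exploring undefined behavior. There is no point in trying to answer such questions definitively, except to say that the behavior is undefined.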

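Regarding your statements about coalesced threads, they are correct, although the wording must be understood carefully. The active threads for a particular instruction are the same as the coalesced threads.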

In your example, you are creating an illegal case:

auto ct = coalesced_threads();
assert(ct.size() == 2); //there are exactly 2 threads in group ct
b = 0; // shared between all threads
if (ct.thread_rank() == 0) // this means that only thread whose rank is zero can participate in the next instruction - by definition you have excluded 1 thread
while (b == 0) {  
// what if only rank 0 thread is always taken due to thread divergence?
// it is illegal to request a synchronization of a group of threads when your conditional code prevents one or more threads in the group from participating
ct.sync(); // does it guarantee that rank 0 will wait for rank 1?
}

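Two different .sync() statements at different places in the code cannot satisfy the requirements of a single synchronization barrier. Each of them represents a separate barrier, whose requirements must be properly met.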

Because of the illegal coding, this example also has undefined behavior; the same comments apply.
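
For contrast, here is one way the hand-off could be written so that every member of the group reaches the same ct.sync() call. It is only a sketch under stated assumptions: the kernel `coalesced_handoff` and the wrapper `run_coalesced_handoff` are hypothetical names, the spin loops are replaced by a single write followed by a barrier, and the code assumes both threads are converged when coalesced_threads() is called, which is typical but not guaranteed.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

__global__ void coalesced_handoff(int *result)
{
    __shared__ int b;
    if (threadIdx.x < 2) {
        // Assumption: both threads are active here, so the group has size 2.
        cg::coalesced_group ct = cg::coalesced_threads();

        if (ct.thread_rank() == 0) b = 0;
        ct.sync();  // barrier 1: executed by every member of ct

        if (ct.thread_rank() == 1) b = 1;
        ct.sync();  // barrier 2: a single call site, outside any rank check

        // After barrier 2, rank 0 can observe the write made by rank 1.
        if (ct.thread_rank() == 0) *result = b;
    }
}

// Returns the value read by rank 0, or -1 if no usable device is present.
int run_coalesced_handoff()
{
    int *d = nullptr;
    int h = -1;
    if (cudaMalloc(&d, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d, 0xFF, sizeof(int));  // initialize the result to -1
    coalesced_handoff<<<1, 32>>>(d);
    cudaError_t err = cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess ? h : -1;
}
```

The design point is that the rank checks guard only the memory accesses, never the barrier itself, so no member of the group is excluded from a synchronization it is expected to join.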

Does this mean that if I use so many threads that the thread blocks are not co-resident

In this case, execution fails with the following error message.

too many blocks in cooperative launch
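
A sketch of how that message might be surfaced: the host code below deliberately requests one block more than the occupancy API reports as co-resident and prints the error string returned by the launch. The names `noop_grid_sync` and `launch_oversized` and the block size are assumptions for illustration. On a device that supports cooperative launch, the returned code is expected to be `cudaErrorCooperativeLaunchTooLarge`, but the sketch does not depend on that.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical kernel that does nothing except synchronize the grid.
__global__ void noop_grid_sync(int *unused)
{
    (void)unused;
    cg::this_grid().sync();
}

// Attempts a cooperative launch with too many blocks and returns the error.
cudaError_t launch_oversized()
{
    int dev = 0, num_sms = 0, blocks_per_sm = 0;
    cudaError_t err = cudaGetDevice(&dev);
    if (err != cudaSuccess) return err;

    const int threads = 128;  // assumed block size for this sketch
    cudaDeviceGetAttribute(&num_sms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm,
                                                  noop_grid_sync, threads, 0);

    // One block more than can be resident on the device at the same time.
    const int blocks = blocks_per_sm * num_sms + 1;

    int *unused = nullptr;
    void *args[] = { &unused };
    err = cudaLaunchCooperativeKernel((void *)noop_grid_sync,
                                      dim3(blocks), dim3(threads), args, 0, 0);

    // The launch is rejected up front, so the kernel never runs.
    printf("%s\n", cudaGetErrorString(err));
    return err;
}
```

Because the runtime rejects the launch instead of starting it, checking the return value of the launch call (or cudaGetLastError()) is what distinguishes this case from a hang.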
