How do I sum the results of the work items in a work group in OpenCL?

I call the kernel below with GlobalWorkSize 64 4 1 and WorkGroupSize 1 4 1. The output argument is initialized to zero.

__kernel void kernelB(__global unsigned int * output) 
{
  uint gid0 = get_global_id(0);
  uint gid1 = get_global_id(1);
  output[gid0] += gid1;
}

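I expected 6 6 6 6 ... as the sum of gid1 (0 + 1 + 2 + 3). Instead I get 3 3 3 3 ... Is there a way to get this behavior? In general, I need the sum of the results of every work item in a work group.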

Edit: I should add that I want to solve this without atomics.

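You need to use local memory to store the output from all of the work items. Once the work items have finished their computation, an accumulation step sums the results.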

__kernel void kernelB(__global unsigned int * output) 
{
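  //assumes a 1-D launch with 4 work items per group; with the question's 1x4x1 work groups, use get_local_id(1) instead
  uint item_id = get_local_id(0);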
  uint group_id = get_group_id(0);
  //memory size is hard-coded to the expected work group size for this example
  local unsigned int result[4];
  //the computation
  result[item_id] = item_id % 3;
  //wait for all items to write to result
  barrier(CLK_LOCAL_MEM_FENCE);
  //simple O(n) reduction using the first work item in the group
  if(item_id == 0){
    for(int i=1;i<4;i++){
      result[0] += result[i];
    }
    output[group_id] = result[0];
  }
}
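
For the exact launch in the question (global size 64 4 1, work group size 1 4 1), the four items of a group differ in dimension 1, not dimension 0. A version adapted to that layout might look like the sketch below. The kernel name kernelB_sum, the GROUP_SIZE macro and the tree-shaped reduction loop are my own additions for illustration, and the loop assumes the group size is a power of two.

```c
// Sketch only: assumes a work group size of (1, 4, 1), as in the question.
// GROUP_SIZE is a hypothetical constant; it must match the local size in
// dimension 1 and must be a power of two for the loop below to work.
#define GROUP_SIZE 4

__kernel void kernelB_sum(__global unsigned int *output)
{
  uint gid0 = get_global_id(0);  // 0..63, one work group per value
  uint gid1 = get_global_id(1);  // 0..3, the value to be summed
  uint lid1 = get_local_id(1);   // position inside the work group, 0..3

  __local unsigned int partial[GROUP_SIZE];

  // each work item stores its own contribution in its own slot
  partial[lid1] = gid1;
  barrier(CLK_LOCAL_MEM_FENCE);

  // tree reduction: the number of active work items halves each step
  for (uint stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
    if (lid1 < stride) {
      partial[lid1] += partial[lid1 + stride];
    }
    // every work item reaches this barrier, active or not
    barrier(CLK_LOCAL_MEM_FENCE);
  }

  // a single work item writes the group's total to global memory
  if (lid1 == 0) {
    output[gid0] = partial[0];
  }
}
```

With that launch, each output[gid0] should end up as 0 + 1 + 2 + 3 = 6. No atomics are involved, because only one work item per group ever writes to a given element of output.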

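Multiple work items access the same element of global memory at the same time, so the result is undefined. You need to either use atomic operations or have each work item write to its own unique location.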
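
Since the question rules out atomics, here is a rough sketch of the "unique location" option, split into two passes. The kernel names write_partials and sum_partials, the extra partials buffer and the height argument are all hypothetical; the partials buffer is assumed to hold 64 * 4 elements for the launch in the question.

```c
// Pass 1: launched with the question's 2-D global size (64, 4, 1).
// Every work item writes to its own element, so no two work items
// ever touch the same address.
__kernel void write_partials(__global unsigned int *partials)
{
  uint gid0 = get_global_id(0);
  uint gid1 = get_global_id(1);
  uint height = get_global_size(1);  // 4 for the launch in the question

  partials[gid0 * height + gid1] = gid1;
}

// Pass 2: assumed to be launched with a 1-D global size of 64.
// One work item sums the `height` partial values that belong to it.
__kernel void sum_partials(__global const unsigned int *partials,
                           __global unsigned int *output,
                           uint height)
{
  uint gid0 = get_global_id(0);
  unsigned int sum = 0;

  for (uint i = 0; i < height; i++) {
    sum += partials[gid0 * height + i];
  }
  output[gid0] = sum;
}
```

This costs an extra buffer and a second kernel launch, but it does not depend on the work group size at all, which can be handy when the values to be summed are spread over more than one work group.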
