Metal kernel writes random numbers into an MTLBuffer if it is indexed with a variable (data hazard)

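I am trying to implement a high dynamic range algorithm on the GPU, and for that I need to compute histograms. So far, the Metal code looks like this: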

kernel void
hist(texture2d_array<half, access::read> inArray [[texture(0)]],
        device float *t [[buffer(0)]], // ignore this
        volatile device uint *histogram [[buffer(1)]],
        uint2 gid [[thread_position_in_grid]]){
int4 Y_ldr;
uint redChannel;
for(uint i = 0; i < inArray.get_array_size(); i++){
    Y_ldr = int4(inArray.read(gid, i, 0) * 255);
    redChannel = Y_ldr.r;
    histogram[redChannel]++;
}

}

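The kernel fills one half of the histogram (256 entries) with huge numbers and leaves the other half empty (at its initial value). In contrast, when I write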

histogram[0] = 1; // just a number
histogram[0] = redChannel; // OR this

I get the correct number at position 0 in both cases. Using atomic_uint did not help, and neither did a threadgroup barrier. Trying

  histogram[0]++;

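reveals that Metal does not handle data hazards automatically, but the numbers stay small, around 12000. So what is causing the trouble, namely that

I get unreasonable numbers
exactly half of the array is left out?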

If you need to know how the pipeline state is set up, see here:

var threadGroupCount = MTLSizeMake(8, 8, 1)
var threadgroups = MTLSizeMake(pictures!.width/threadGroupCount.width, pictures!.height/threadGroupCount.height, 1)
computeCommandEncoder.setComputePipelineState(hist!)
computeCommandEncoder.setTexture(pictures, atIndex: 0)
computeCommandEncoder.setBuffer(exposure_times, offset: 0, atIndex: 0)
computeCommandEncoder.setBuffer(histogram, offset: 0, atIndex: 1) // <-- this is important!!!
computeCommandEncoder.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadGroupCount)

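The iPad CPU interprets UInt as a 64-bit number (8 bytes). On the GPU, an int (and likewise the kernel's uint) is 32 bits (4 bytes). When the GPU writes to position [1], the CPU interprets those bytes as the upper 4 bytes of array entry [0]. That is what produced the astonishingly huge numbers. It also explains the empty half: 256 four-byte GPU entries fill only the first 128 eight-byte CPU entries. To fix it, I had to declare the histogram as [UInt32] on the CPU side.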
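The width mismatch can be reproduced on the CPU alone, without Metal. The following is only a minimal sketch, not the original project code: the names `gpuBins`, `wrongView` and `rightView` are hypothetical, the bin counts are made-up illustration values, and `UInt64` stands in for `UInt` on a 64-bit, little-endian device.

```swift
// Simulate what the GPU wrote: 256 histogram bins, each a 32-bit uint.
// The counts are made-up illustration values, not data from the question.
let binCount = 256
let gpuBins: [UInt32] = (0..<binCount).map { UInt32($0 + 1) }

// Wrong host-side view: 8-byte elements (what UInt is on a 64-bit CPU).
// UInt64 is used explicitly so the sketch behaves the same everywhere.
var wrongView = [UInt64](repeating: 0, count: binCount)
gpuBins.withUnsafeBytes { src in
    wrongView.withUnsafeMutableBytes { dst in
        // Copies the 1024 bytes the "GPU" produced into the 2048-byte array.
        dst.copyMemory(from: src)
    }
}

// Correct host-side view: 4-byte elements matching the kernel's uint.
var rightView = [UInt32](repeating: 0, count: binCount)
gpuBins.withUnsafeBytes { src in
    rightView.withUnsafeMutableBytes { dst in
        dst.copyMemory(from: src)
    }
}

// Entry 0 of the wrong view fuses GPU bins 0 and 1 (bin 1 in the upper 4 bytes).
print("wrong[0]   =", wrongView[0])
// Everything from entry 128 onwards was never touched and keeps its initial value.
print("wrong[128] =", wrongView[128])
// The 32-bit view reads back exactly what was written.
print("right[0], right[1] =", rightView[0], rightView[1])
print("element strides:", MemoryLayout<UInt64>.stride, "vs", MemoryLayout<UInt32>.stride)
```

With 4-byte elements, the histogram buffer is `binCount * MemoryLayout<UInt32>.stride` bytes long (1024 bytes for 256 bins), which matches what the kernel indexes.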
