How do I use custom compute shaders with Metal and get very smooth performance?

I am trying to apply live camera filters through Metal, using both the default MPSKernel filters provided by Apple and my own custom compute shaders.

After the custom compute shader, I do an in-place encode with MPSImageGaussianBlur. The code is here:

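func encode(to commandBuffer: MTLCommandBuffer, sourceTexture: MTLTexture, destinationTexture: MTLTexture, cropRect: MTLRegion = MPSRectNoClip, offset: CGPoint) {
    let blur = MPSImageGaussianBlur(device: device, sigma: 0)
    // MPSRectNoClip filters the whole texture; an empty MTLRegion() would clip everything
    blur.clipRect = cropRect
    blur.offset = MPSOffset(x: Int(offset.x), y: Int(offset.y), z: 0)

    // Round up so partial threadgroups at the edges are dispatched too
    // (the kernel must then skip threads that fall outside the texture)
    let threadsPerThreadgroup = MTLSizeMake(4, 4, 1)
    let threadgroupsPerGrid = MTLSizeMake((sourceTexture.width + threadsPerThreadgroup.width - 1) / threadsPerThreadgroup.width,
                                          (sourceTexture.height + threadsPerThreadgroup.height - 1) / threadsPerThreadgroup.height, 1)

    // Custom compute shader: sourceTexture -> destinationTexture
    guard let commandEncoder = commandBuffer.makeComputeCommandEncoder() else { return }
    commandEncoder.setComputePipelineState(pipelineState!)
    commandEncoder.setTexture(sourceTexture, index: 0)
    commandEncoder.setTexture(destinationTexture, index: 1)
    commandEncoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
    commandEncoder.endEncoding()

    // Blur destinationTexture in place. With a nil fallbackCopyAllocator this
    // returns false and encodes nothing when MPS cannot work in place.
    autoreleasepool {
        var inPlaceTexture = destinationTexture
        let blurred = blur.encode(commandBuffer: commandBuffer, inPlaceTexture: &inPlaceTexture, fallbackCopyAllocator: nil)
        if !blurred { print("In-place blur was not encoded for this frame") }
    }
}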

But sometimes the in-place texture encode fails, and the result is a jerky, stuttering effect on screen.

So it would be very helpful if someone could suggest a solution that avoids the in-place texture, explain how to use fallbackCopyAllocator, or show a different way to use compute shaders.

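I have done a fair amount of coding in this area (applying compute shaders to the video stream from the camera), and the most common problem you run into is the "pixel buffer reuse" problem.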

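A Metal texture created from a sample buffer is backed by a pixel buffer. That pixel buffer is managed by the video session and can be reused for subsequent video frames unless you retain a reference to the sample buffer (retaining a reference to the Metal texture is not enough).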

Feel free to look at my code at https://github.com/snakajima/vs-metal, which applies various compute shaders to a live video stream.

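The VSContext:set() method takes an optional sampleBuffer parameter in addition to the texture parameter, and keeps a reference to the sampleBuffer until the compute shader's work has completed (in the VSRuntime:encode() method).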
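The same idea can be sketched without vs-metal. This is a minimal sketch, not the vs-metal implementation: `encodeFrame` and its `encodeFilters` callback are hypothetical names, and it assumes the capture output delivers BGRA frames (kCVPixelFormatType_32BGRA) and that a CVMetalTextureCache was created for the same device as the command queue. The point is that the command buffer's completion handler captures the sample buffer, so the session cannot recycle the pixel buffer while the GPU is still reading it.

```swift
import AVFoundation
import CoreMedia
import CoreVideo
import Metal

// Hypothetical helper: wraps a camera frame as an MTLTexture and keeps the frame
// alive until the GPU has finished every pass that reads it.
// Assumes BGRA frames from the capture output.
func encodeFrame(_ sampleBuffer: CMSampleBuffer,
                 textureCache: CVMetalTextureCache,
                 commandQueue: MTLCommandQueue,
                 encodeFilters: (MTLCommandBuffer, MTLTexture) -> Void) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    var cvTextureOut: CVMetalTexture?
    let status = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, textureCache, pixelBuffer, nil, .bgra8Unorm,
        CVPixelBufferGetWidth(pixelBuffer), CVPixelBufferGetHeight(pixelBuffer),
        0, &cvTextureOut)
    guard status == kCVReturnSuccess,
          let cvTexture = cvTextureOut,
          let texture = CVMetalTextureGetTexture(cvTexture),
          let commandBuffer = commandQueue.makeCommandBuffer() else { return }

    // Hypothetical callback: encode the custom shader and MPS passes here
    encodeFilters(commandBuffer, texture)

    // The handler captures sampleBuffer and cvTexture, so neither is released
    // (and the pixel buffer cannot be recycled) until the command buffer
    // completes. Holding only `texture` would not be enough.
    commandBuffer.addCompletedHandler { _ in
        _ = sampleBuffer
        _ = cvTexture
    }
    commandBuffer.commit()
}
```

Usage would be to call `encodeFrame` from `captureOutput(_:didOutput:from:)` and do all filtering inside the callback.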

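Whether the in-place operation succeeds can be hit or miss, depending on what the underlying filter is doing. If it is a single-pass filter for some parameters, then for those cases you will end up running out of place, which needs a fallbackCopyAllocator.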
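Since the question asks how to use fallbackCopyAllocator, here is a minimal sketch; `outOfPlaceCopyAllocator` and `encodeBlurInPlace` are hypothetical names. The allocator has the MPSCopyAllocator shape `(MPSKernel, MTLCommandBuffer, MTLTexture) -> MTLTexture`. When MPS cannot work in place, it calls the allocator, writes the result into the returned texture, and replaces the caller's texture reference with it. So whatever draws the frame has to use the updated reference, not the original `destinationTexture`.

```swift
import Metal
import MetalPerformanceShaders

// Hypothetical allocator: returns a fresh GPU-only texture matching the source,
// used by MPS only when the filter cannot run in place.
let outOfPlaceCopyAllocator: MPSCopyAllocator = { _, commandBuffer, sourceTexture in
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: sourceTexture.pixelFormat,
        width: sourceTexture.width,
        height: sourceTexture.height,
        mipmapped: false)
    // MPS writes the filtered result here; later passes read it
    descriptor.usage = [.shaderRead, .shaderWrite]
    descriptor.storageMode = .private
    // Force-unwrap keeps the sketch short; real code should handle nil
    return commandBuffer.device.makeTexture(descriptor: descriptor)!
}

// Hypothetical wrapper: after this returns true, `texture` may point to the
// allocator's new texture, so callers must display `texture`, not the old one.
func encodeBlurInPlace(commandBuffer: MTLCommandBuffer,
                       blur: MPSImageGaussianBlur,
                       texture: inout MTLTexture) -> Bool {
    return blur.encode(commandBuffer: commandBuffer,
                       inPlaceTexture: &texture,
                       fallbackCopyAllocator: outOfPlaceCopyAllocator)
}
```

The extra texture is only allocated on frames where MPS actually needs it, so this trades an occasional allocation for never silently dropping the blur.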

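Since that method was added, MPS has added an underlying MTLHeap to manage memory for you more transparently. If your MPSImage never needs to be seen by the CPU and only lives on the GPU for a short time, it is recommended that you use MPSTemporaryImage instead. When its readCount reaches 0, the backing store is recycled through the MPS heap and becomes available to other MPSTemporaryImages and other temporary resources used downstream. Likewise, its backing store is not actually allocated from the heap until absolutely necessary (e.g., when the texture is written to, or when .texture is called). A separate heap is allocated for each command buffer.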
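A minimal sketch of the out-of-place alternative with an MPSTemporaryImage as the intermediate. `encodeBlurThenSobel` is a hypothetical name, and MPSImageSobel is only an illustrative second pass; a custom compute shader could read `intermediate.texture` the same way. As we understand the MPS documentation, encode calls that take plain textures do not decrement readCount, so the sketch sets it to 0 by hand after the last read has been encoded.

```swift
import Metal
import MetalPerformanceShaders

// Hypothetical helper: blur into a heap-backed temporary image, then run a
// second filter from it into the destination. No in-place encode is needed.
func encodeBlurThenSobel(commandBuffer: MTLCommandBuffer,
                         device: MTLDevice,
                         source: MTLTexture,
                         destination: MTLTexture) {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: source.pixelFormat,
        width: source.width,
        height: source.height,
        mipmapped: false)
    descriptor.usage = [.shaderRead, .shaderWrite]
    descriptor.storageMode = .private  // temporary images live only on the GPU

    // Backing memory comes from the MPS heap attached to this command buffer
    let intermediate = MPSTemporaryImage(commandBuffer: commandBuffer,
                                         textureDescriptor: descriptor)

    let blur = MPSImageGaussianBlur(device: device, sigma: 2.0)
    blur.encode(commandBuffer: commandBuffer,
                sourceTexture: source,
                destinationTexture: intermediate.texture)

    let sobel = MPSImageSobel(device: device)
    sobel.encode(commandBuffer: commandBuffer,
                 sourceTexture: intermediate.texture,
                 destinationTexture: destination)

    // Last read is encoded: return the backing store to the heap
    intermediate.readCount = 0
}
```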

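Using temporary images should help reduce memory usage. For example, in an Inception v3 neural network graph with over a hundred passes, the heap was able to automatically reduce the graph to just four allocations.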
