Simple GLSL convolution shader is atrociously slow

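I'm trying to implement a 2D outline shader in OpenGL ES 2.0 for iOS. It is insanely slow, as in 5 fps slow. I've tracked it down to the texture2D() calls. However, without those, no convolution shader can be done at all. I've tried using lowp instead of mediump, but with that everything is just black. It does gain another 5 fps, but it's still unusable.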

Here is my fragment shader.

    varying mediump vec4 colorVarying;
    varying mediump vec2 texCoord;
    uniform bool enableTexture;
    uniform sampler2D texture;
    uniform mediump float k;

    void main() {
        const mediump float step_w = 3.0/128.0;
        const mediump float step_h = 3.0/128.0;
        const mediump vec4 b = vec4(0.0, 0.0, 0.0, 1.0);
        const mediump vec4 one = vec4(1.0, 1.0, 1.0, 1.0);

        mediump vec2 offset[9];
        mediump float kernel[9];
        offset[0] = vec2(-step_w, step_h);
        offset[1] = vec2(-step_w, 0.0);
        offset[2] = vec2(-step_w, -step_h);
        offset[3] = vec2(0.0, step_h);
        offset[4] = vec2(0.0, 0.0);
        offset[5] = vec2(0.0, -step_h);
        offset[6] = vec2(step_w, step_h);
        offset[7] = vec2(step_w, 0.0);
        offset[8] = vec2(step_w, -step_h);

        kernel[0] = kernel[2] = kernel[6] = kernel[8] = 1.0/k;
        kernel[1] = kernel[3] = kernel[5] = kernel[7] = 2.0/k;
        kernel[4] = -16.0/k;

        if (enableTexture) {
            mediump vec4 sum = vec4(0.0);
            for (int i=0;i<9;i++) {
                mediump vec4 tmp = texture2D(texture, texCoord + offset[i]);
                sum += tmp * kernel[i];
            }
            gl_FragColor = (sum * b) + ((one-sum) * texture2D(texture, texCoord));
        } else {
            gl_FragColor = colorVarying;
        }
    }

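This is unoptimized and not finalized, but I need to bring up performance before continuing. I've tried replacing the texture2D() call in the loop with just a solid vec4, and it runs without a problem despite everything else going on.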

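How can I optimize this? I know it's possible, because I've seen far more involved effects running in 3D without a problem. I can't see why this is causing any trouble at all.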

I've done this exact thing myself, and I see several things that could be optimized here.

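First off, I'd remove the enableTexture conditional and instead split your shader into two programs, one for the true state of this and one for the false state. Conditionals are very expensive in iOS fragment shaders, particularly ones that have texture reads within them.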

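Second, you have nine dependent texture reads here. These are texture reads where the texture coordinates are calculated within the fragment shader. Dependent texture reads are very expensive on the PowerVR GPUs within iOS devices, because they prevent that hardware from optimizing texture reads using caching and similar techniques. Because you are sampling at fixed offsets for the 8 surrounding pixels and the one central pixel, these calculations should be moved up into the vertex shader. This also means that these calculations won't have to be performed for each pixel, just once for each vertex, and then hardware interpolation will handle the rest.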

Third, for() loops haven't been handled all that well by the iOS shader compiler to date, so I tend to avoid those where I can.
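One possible way to get the two programs from the first suggestion without duplicating source is to compile the same fragment shader twice and let the preprocessor pick the variant. This is only a sketch: ENABLE_TEXTURE is a made-up macro name, and it assumes the host code prepends a "#define ENABLE_TEXTURE" line (for example as an extra string passed to glShaderSource) when building the textured program. The single texture2D() call stands in for the full convolution.

```glsl
// Built twice by the host: once as-is (flat-color program) and once with
// "#define ENABLE_TEXTURE" prepended (textured program).
// ENABLE_TEXTURE is a hypothetical macro name, not part of the question.
varying mediump vec4 colorVarying;
varying mediump vec2 texCoord;
uniform sampler2D texture;

void main() {
#ifdef ENABLE_TEXTURE
    // Placeholder for the convolution; no runtime branch is left around it.
    gl_FragColor = texture2D(texture, texCoord);
#else
    gl_FragColor = colorVarying;
#endif
}
```

The application then binds whichever program matches the draw call instead of setting the enableTexture uniform.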

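As I mentioned, I've done convolution shaders like this in my open source iOS GPUImage framework. For a generic convolution filter, I use the following vertex shader: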

    attribute vec4 position;
    attribute vec4 inputTextureCoordinate;

    uniform highp float texelWidth;
    uniform highp float texelHeight;

    varying vec2 textureCoordinate;
    varying vec2 leftTextureCoordinate;
    varying vec2 rightTextureCoordinate;

    varying vec2 topTextureCoordinate;
    varying vec2 topLeftTextureCoordinate;
    varying vec2 topRightTextureCoordinate;

    varying vec2 bottomTextureCoordinate;
    varying vec2 bottomLeftTextureCoordinate;
    varying vec2 bottomRightTextureCoordinate;

    void main()
    {
        gl_Position = position;

        vec2 widthStep = vec2(texelWidth, 0.0);
        vec2 heightStep = vec2(0.0, texelHeight);
        vec2 widthHeightStep = vec2(texelWidth, texelHeight);
        vec2 widthNegativeHeightStep = vec2(texelWidth, -texelHeight);

        textureCoordinate = inputTextureCoordinate.xy;
        leftTextureCoordinate = inputTextureCoordinate.xy - widthStep;
        rightTextureCoordinate = inputTextureCoordinate.xy + widthStep;

        topTextureCoordinate = inputTextureCoordinate.xy - heightStep;
        topLeftTextureCoordinate = inputTextureCoordinate.xy - widthHeightStep;
        topRightTextureCoordinate = inputTextureCoordinate.xy + widthNegativeHeightStep;

        bottomTextureCoordinate = inputTextureCoordinate.xy + heightStep;
        bottomLeftTextureCoordinate = inputTextureCoordinate.xy - widthNegativeHeightStep;
        bottomRightTextureCoordinate = inputTextureCoordinate.xy + widthHeightStep;
    }

and the following fragment shader:

    precision highp float;

    uniform sampler2D inputImageTexture;
    uniform mediump mat3 convolutionMatrix;

    varying vec2 textureCoordinate;
    varying vec2 leftTextureCoordinate;
    varying vec2 rightTextureCoordinate;

    varying vec2 topTextureCoordinate;
    varying vec2 topLeftTextureCoordinate;
    varying vec2 topRightTextureCoordinate;

    varying vec2 bottomTextureCoordinate;
    varying vec2 bottomLeftTextureCoordinate;
    varying vec2 bottomRightTextureCoordinate;

    void main()
    {
        mediump vec4 bottomColor = texture2D(inputImageTexture, bottomTextureCoordinate);
        mediump vec4 bottomLeftColor = texture2D(inputImageTexture, bottomLeftTextureCoordinate);
        mediump vec4 bottomRightColor = texture2D(inputImageTexture, bottomRightTextureCoordinate);
        mediump vec4 centerColor = texture2D(inputImageTexture, textureCoordinate);
        mediump vec4 leftColor = texture2D(inputImageTexture, leftTextureCoordinate);
        mediump vec4 rightColor = texture2D(inputImageTexture, rightTextureCoordinate);
        mediump vec4 topColor = texture2D(inputImageTexture, topTextureCoordinate);
        mediump vec4 topRightColor = texture2D(inputImageTexture, topRightTextureCoordinate);
        mediump vec4 topLeftColor = texture2D(inputImageTexture, topLeftTextureCoordinate);

        mediump vec4 resultColor = topLeftColor * convolutionMatrix[0][0] + topColor * convolutionMatrix[0][1] + topRightColor * convolutionMatrix[0][2];
        resultColor += leftColor * convolutionMatrix[1][0] + centerColor * convolutionMatrix[1][1] + rightColor * convolutionMatrix[1][2];
        resultColor += bottomLeftColor * convolutionMatrix[2][0] + bottomColor * convolutionMatrix[2][1] + bottomRightColor * convolutionMatrix[2][2];

        gl_FragColor = resultColor;
    }

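The texelWidth and texelHeight uniforms are the inverse of the width and height of the input image, and the convolutionMatrix uniform specifies the weights for the various samples in your convolution.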
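As an illustration only, assume a 640x480 input and the 1 / 2 / -16 weights from the question's kernel, with k the same normalization factor as in the question. Under those assumptions the three uniforms would take roughly these values:

```latex
\[
\text{texelWidth} = \frac{1}{640}, \qquad
\text{texelHeight} = \frac{1}{480}, \qquad
\text{convolutionMatrix} = \frac{1}{k}
\begin{pmatrix}
1 & 2 & 1 \\
2 & -16 & 2 \\
1 & 2 & 1
\end{pmatrix}
\]
```

Note that GLSL indexes a mat3 column first, so convolutionMatrix[0][1] is column 0, row 1. The fragment shader above reads convolutionMatrix[0][0..2] for the top row of samples, which means an asymmetric kernel has to be stored with each kernel row as a matrix column. For a symmetric kernel like this one the distinction makes no difference.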

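On an iPhone 4, this runs in 4-8 ms for a 640x480 frame of camera video, which is good enough for 60 FPS rendering at that image size. If you just need to do something like edge detection, you can simplify the above: convert the image to luminance in a pre-pass, then sample from only one color channel. That's even faster, at about 2 ms per frame on the same device.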
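The luminance pre-pass could look roughly like the fragment shader below. It is a sketch, not the framework's actual source: it assumes the pass is drawn with a vertex shader that provides textureCoordinate, and the Rec. 709 luma weights are an assumed choice.

```glsl
// Luminance pre-pass sketch: writes the same luma value to R, G and B so a
// later convolution pass can read a single channel (for example .r).
precision mediump float;

uniform sampler2D inputImageTexture;
varying highp vec2 textureCoordinate;

// Rec. 709 luma weights (an assumption; other weightings also work).
const vec3 luminanceWeights = vec3(0.2126, 0.7152, 0.0722);

void main()
{
    vec4 color = texture2D(inputImageTexture, textureCoordinate);
    float luminance = dot(color.rgb, luminanceWeights);
    gl_FragColor = vec4(vec3(luminance), color.a);
}
```

The convolution pass that follows can then replace each mediump vec4 sample with a mediump float taken from the .r component, which cuts the arithmetic per tap to a quarter.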

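The only way I know of to reduce the time taken in this shader is to reduce the number of texture fetches. Since your shader samples the texture at equally spaced points around the center pixel and linearly combines them, you could reduce the number of fetches by making use of the GL_LINEAR mode available for texture sampling.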

Basically, instead of sampling at every texel, sample in between a pair of texels to directly get a linearly weighted sum.

Let us call the samples at offsets (-stepw, -steph) and (-stepw, 0) x0 and x1 respectively. Then your sum is

sum = x0*k0 + x1*k1

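Now instead, if you sample in between these two texels, at a distance of k1/(k0+k1) from x0 and therefore k0/(k0+k1) from x1 (that is, closer to the texel with the larger weight), then the GPU will perform the linear weighting during the fetch and give you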

y = x1*k1/(k0+k1) + x0*k0/(k1+k0)

Thus the sum can be calculated as

sum = y*(k0 + k1)

from just one fetch!
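Spelled out as a short derivation, with d the sampling position measured from x0 as a fraction of the distance to x1. This assumes the two texels are adjacent, GL_LINEAR filtering is enabled, and k0 and k1 have the same sign so that d falls between 0 and 1:

```latex
\begin{align*}
y(d) &= (1-d)\,x_0 + d\,x_1
      && \text{(what linear filtering returns)} \\
d &= \frac{k_1}{k_0+k_1}
  \;\Longrightarrow\;
  y = \frac{k_0\,x_0 + k_1\,x_1}{k_0+k_1} \\
\text{sum} &= (k_0+k_1)\,y = k_0\,x_0 + k_1\,x_1
\end{align*}
```

With the question's weights, k0 = 1/k for the corner texel and k1 = 2/k for the edge texel, so d = 2/3: the sample sits two thirds of the way from the corner toward the edge, and the fetched value is multiplied by 3/k.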

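If you repeat this for the other adjacent pixels, you end up doing 4 texture fetches for the eight neighboring offsets, plus one extra texture fetch for the center pixel.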
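A fragment shader along these lines might look like the sketch below. Everything in it rests on assumptions that are not in the original posts: the texture uses GL_LINEAR for both min and mag filtering, the taps are exactly one texel apart (the question's 3.0/128.0 step would not qualify if the texture is 128 texels wide), textureCoordinate lands on texel centers, and the kernel has one cornerWeight, one edgeWeight, and one centerWeight, with the first two sharing a sign. The uniform and varying names are hypothetical.

```glsl
// 3x3 symmetric kernel with 5 fetches instead of 9 (sketch under the
// assumptions listed in the text above; names are hypothetical).
//
// The pair coordinates are assumed to be computed in the vertex shader, with
//   t = cornerWeight / (cornerWeight + edgeWeight)
//   topPairCoordinate    = uv + vec2(-t * texelWidth, -texelHeight); // top + top-left
//   rightPairCoordinate  = uv + vec2( texelWidth, -t * texelHeight); // right + top-right
//   bottomPairCoordinate = uv + vec2( t * texelWidth,  texelHeight); // bottom + bottom-right
//   leftPairCoordinate   = uv + vec2(-texelWidth,  t * texelHeight); // left + bottom-left
precision highp float;

uniform sampler2D inputImageTexture;   // must be sampled with GL_LINEAR
uniform mediump float cornerWeight;    // weight of the four diagonal taps
uniform mediump float edgeWeight;      // weight of the four axis-aligned taps
uniform mediump float centerWeight;    // weight of the center tap

varying vec2 textureCoordinate;
varying vec2 topPairCoordinate;
varying vec2 rightPairCoordinate;
varying vec2 bottomPairCoordinate;
varying vec2 leftPairCoordinate;

void main()
{
    mediump vec4 centerColor = texture2D(inputImageTexture, textureCoordinate);

    // Each fetch returns (edgeWeight * edge + cornerWeight * corner)
    // divided by (cornerWeight + edgeWeight), thanks to linear filtering.
    mediump vec4 pairSum = texture2D(inputImageTexture, topPairCoordinate)
                         + texture2D(inputImageTexture, rightPairCoordinate)
                         + texture2D(inputImageTexture, bottomPairCoordinate)
                         + texture2D(inputImageTexture, leftPairCoordinate);

    // Undo the normalization once for all four pairs, then add the center tap.
    gl_FragColor = pairSum * (cornerWeight + edgeWeight) + centerColor * centerWeight;
}
```

Keeping the pair coordinates in varyings rather than computing them in the fragment shader avoids reintroducing dependent texture reads.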

The link explains this much better.
