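When I compile and link my GLSL shaders on an Alder Lake GT1 integrated GPU, I get the warning:
SIMD32 shader inefficient
This warning is reported through the glDebugMessageCallbackARB mechanism.
I would like to investigate whether I can avoid this inefficiency, but I am not sure how to get more information about this warning.
The full driver output for this shader: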
WRN [Shader Compiler][Other]{Notification}: VS SIMD8 shader: 11 inst, 0 loops, 40 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 176 to 112 bytes.
WRN [API][Performance]{Notification}: SIMD32 shader inefficient
WRN [Shader Compiler][Other]{Notification}: FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
WRN [Shader Compiler][Other]{Notification}: FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
Incidentally, the message is emitted while the fragment shader is being compiled.
My vertex shader:
#version 150
in mediump vec2 position;
out lowp vec4 clr;
uniform mediump vec2 rotx;
uniform mediump vec2 roty;
uniform mediump vec2 translation;
uniform lowp vec4 colour;
void main()
{
    gl_Position.x = dot( position, rotx ) + translation.x;
    gl_Position.y = dot( position, roty ) + translation.y;
    gl_Position.z = 1.0;
    gl_Position.w = 1.0;
    clr = colour;
}
My fragment shader:
#version 150
in lowp vec4 clr;
out lowp vec4 fragColor;
void main()
{
    fragColor = clr;
}
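That said, I doubt the warning is specific to this shader, because it seems to be reported for every shader I use on this platform.
GL renderer: Mesa Intel(R) Graphics (ADL-S GT1)
OS: Ubuntu 22.04
GPU: AlderLake-S GT1
API: OpenGL 3.2 Core Profile
GLSL version: 150
This appears to come from the Intel fragment shader compiler, which is part of Mesa.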
brw_fs.cpp
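Looking at that code, the compiler appears to have three options: SIMD8, SIMD16 or SIMD32. The number refers to the width in lanes, not bits, so SIMD32 is 32-wide SIMD. The compiler uses a heuristic to decide whether a SIMD32 version would be efficient, and if not, it skips that option.
Of course, this heuristic can get it wrong, so there is an option to force the BRW compiler to try SIMD32 regardless.
Setting the environment variable INTEL_DEBUG=do32
tells the compiler to try SIMD32 as well.
When I tested this on my system, the driver did indeed report three different results: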
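For anyone launching the application from a script, here is a minimal Python sketch of setting that variable without clobbering debug flags that are already set. It assumes INTEL_DEBUG is a comma-separated list of flags; `with_do32` and `./my_gl_app` are hypothetical names for illustration only.

```python
import os


def with_do32(env):
    """Return a copy of env with 'do32' added to INTEL_DEBUG.

    Assumes INTEL_DEBUG holds a comma-separated list of flags.
    """
    new_env = dict(env)
    flags = [f for f in new_env.get("INTEL_DEBUG", "").split(",") if f]
    if "do32" not in flags:
        flags.append("do32")
    new_env["INTEL_DEBUG"] = ",".join(flags)
    return new_env


if __name__ == "__main__":
    env = with_do32(os.environ)
    print("INTEL_DEBUG=" + env["INTEL_DEBUG"])
    # To launch a program with this environment (placeholder path):
    #   import subprocess
    #   subprocess.run(["./my_gl_app"], env=env)
```

From an interactive shell, prefixing the command with the variable assignment achieves the same thing for a single run.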
WRN [Shader Compiler][Other]{Notification}: FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
WRN [Shader Compiler][Other]{Notification}: FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
WRN [Shader Compiler][Other]{Notification}: FS SIMD32 shader: 10 inst, 0 loops, 928 cycles, 0:0 spills:fills, 2 sends, scheduled with mode top-down, Promoted 0 constants, compacted 160 to 96 bytes.
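Observe that in this case the heuristic is clearly right: the SIMD32 version is estimated at 928 cycles against 20 for SIMD8, roughly 46 times as many.
Fun fact: BRW stands for Broadwater, the gen4 graphics. But the gen12 Intel GPUs still use this compiler.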
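To put numbers on that comparison, here is a small Python sketch that pulls the instruction and cycle counts out of the driver messages quoted above. The regular expression and the `parse_stats` helper are my own, not part of Mesa, and dividing cycles by the SIMD width is only a rough normalisation of the compiler's estimate, not a measured cost.

```python
import re

# Matches the statistics lines quoted above, e.g.
# "FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, ..."
STATS = re.compile(
    r"(VS|FS) SIMD(\d+) shader: (\d+) inst, \d+ loops, (\d+) cycles"
)

LOG = """\
FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, 0:0 spills:fills, 1 sends
FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, 0:0 spills:fills, 1 sends
FS SIMD32 shader: 10 inst, 0 loops, 928 cycles, 0:0 spills:fills, 2 sends
"""


def parse_stats(log):
    """Map (stage, width) to the instruction and cycle counts in log."""
    stats = {}
    for stage, width, inst, cycles in STATS.findall(log):
        stats[(stage, int(width))] = {"inst": int(inst), "cycles": int(cycles)}
    return stats


stats = parse_stats(LOG)
baseline = stats[("FS", 8)]["cycles"]
for (stage, width), s in sorted(stats.items()):
    ratio = s["cycles"] / baseline   # relative to the SIMD8 variant
    per_lane = s["cycles"] / width   # rough normalisation by SIMD width
    print(f"{stage} SIMD{width}: {s['cycles']} cycles, "
          f"{ratio:.1f}x SIMD8, {per_lane:.2f} cycles per lane")
```

Even after dividing by the width, the SIMD32 variant comes out far worse than SIMD8 or SIMD16 for this shader, which is consistent with the driver skipping it.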