Shader compiler on Alder Lake GT1: SIMD32 shader inefficient



When I compile and link my GLSL shaders on an Alder Lake GT1 integrated GPU, I get the warning:

SIMD32 shader inefficient

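This warning is reported through the glDebugMessageCallbackARB mechanism.

I would like to investigate whether I can avoid this inefficiency, but I am not sure how to get more information about the warning.

The full driver output for this shader: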

WRN [Shader Compiler][Other]{Notification}: VS SIMD8 shader: 11 inst, 0 loops, 40 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 176 to 112 bytes.
WRN [API][Performance]{Notification}: SIMD32 shader inefficient
WRN [Shader Compiler][Other]{Notification}: FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
WRN [Shader Compiler][Other]{Notification}: FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.

The message is generated during compilation of the fragment shader, by the way.

Vertex shader:

#version 150
in mediump vec2 position;
out lowp vec4 clr;
uniform mediump vec2 rotx;
uniform mediump vec2 roty;
uniform mediump vec2 translation;
uniform lowp vec4 colour;
void main()
{
    gl_Position.x = dot( position, rotx ) + translation.x;
    gl_Position.y = dot( position, roty ) + translation.y;
    gl_Position.z = 1.0;
    gl_Position.w = 1.0;
    clr = colour;
}

My fragment shader:

#version 150
in  lowp vec4 clr;
out lowp vec4 fragColor;
void main()
{
    fragColor = clr;
}

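That said, I doubt it is specific to this shader, because it seems to be reported for every shader I use on this platform.

GL renderer: Mesa Intel(R) Graphics (ADL-S GT1)

OS: Ubuntu 22.04

GPU: AlderLake-S GT1

API: OpenGL 3.2 Core Profile

GLSL version: 150

This appears to come from the Intel fragment shader compiler, which is part of Mesa.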

brw_fs.cpp

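Looking at this code, the compiler appears to have three options: SIMD8, SIMD16, or SIMD32. The number refers to the width in lanes, not bits, so SIMD32 is 32-wide SIMD.

The compiler uses a heuristic to decide whether a SIMD32 version would be efficient and, if not, skips that option.

Of course, this heuristic can get it wrong, so there is an option to force the BRW compiler to try SIMD32 regardless.

Setting the environment variable INTEL_DEBUG=do32 tells the compiler to also try SIMD32.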
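As a rough sketch, the variable can be set for a single run of a program without touching the rest of the shell session. The helper names env_with_do32 and run_with_do32 below are hypothetical, and the sketch assumes INTEL_DEBUG takes a comma-separated list of flags, so any flags already set are kept.

```python
import os
import subprocess


def env_with_do32(base_env=None):
    """Return a copy of the environment with do32 added to INTEL_DEBUG.

    Assumption: INTEL_DEBUG holds a comma-separated list of flags.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = [f for f in env.get("INTEL_DEBUG", "").split(",") if f]
    if "do32" not in flags:
        flags.append("do32")
    env["INTEL_DEBUG"] = ",".join(flags)
    return env


def run_with_do32(argv):
    """Run a command (your own GL program) with do32 enabled."""
    return subprocess.run(argv, env=env_with_do32(), check=False)


# Hypothetical usage; "./my_gl_app" stands in for your own binary:
# run_with_do32(["./my_gl_app"])
```

The shader statistics still arrive through the debug message callback, so the program itself has to print them.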

When I tested this on my system, I did observe that the driver now reports three different results:

WRN [Shader Compiler][Other]{Notification}: FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
WRN [Shader Compiler][Other]{Notification}: FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
WRN [Shader Compiler][Other]{Notification}: FS SIMD32 shader: 10 inst, 0 loops, 928 cycles, 0:0 spills:fills, 2 sends, scheduled with mode top-down, Promoted 0 constants, compacted 160 to 96 bytes.

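Note that in this case the heuristic is clearly right: the SIMD32 variant is estimated at 928 cycles against 20 for SIMD8, roughly 46 times as many.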
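To make the comparison concrete, here is a small sketch that pulls the cycle estimates out of the three log lines quoted above and compares them. The regular expression is only fitted to those lines, not to any documented format. The per-lane figure assumes the reported cycle count covers one thread processing all of its lanes, which the log itself does not state.

```python
import re

# The three fragment shader lines from the driver output above.
LOG = """\
WRN [Shader Compiler][Other]{Notification}: FS SIMD8 shader: 5 inst, 0 loops, 20 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
WRN [Shader Compiler][Other]{Notification}: FS SIMD16 shader: 5 inst, 0 loops, 28 cycles, 0:0 spills:fills, 1 sends, scheduled with mode top-down, Promoted 0 constants, compacted 80 to 48 bytes.
WRN [Shader Compiler][Other]{Notification}: FS SIMD32 shader: 10 inst, 0 loops, 928 cycles, 0:0 spills:fills, 2 sends, scheduled with mode top-down, Promoted 0 constants, compacted 160 to 96 bytes.
"""

# Pattern fitted to the lines above (an assumption, not a documented format).
PATTERN = re.compile(
    r"(?P<stage>[A-Z]{2}) SIMD(?P<width>\d+) shader: "
    r"\d+ inst, \d+ loops, (?P<cycles>\d+) cycles"
)


def parse_stats(text):
    """Map (stage, SIMD width) to the reported cycle estimate."""
    return {
        (m["stage"], int(m["width"])): int(m["cycles"])
        for m in PATTERN.finditer(text)
    }


stats = parse_stats(LOG)
base = stats[("FS", 8)]
for (stage, width), cycles in sorted(stats.items()):
    print(
        f"{stage} SIMD{width}: {cycles} cycles, "
        f"{cycles / base:.1f}x SIMD8, {cycles / width:.2f} cycles per lane"
    )
```

Under that per-lane assumption, SIMD16 comes out cheaper per lane than SIMD8 (1.75 against 2.50), while SIMD32 is far more expensive (29.00).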

Fun fact: BRW stands for Broadwater, the Gen4 graphics. Yet Gen12 Intel GPUs still use this compiler.
