Clang's __restrict is inconsistent?



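I am working on highly "vectorizable" code and noticed that, with respect to the C++ __restrict keyword/extension, Clang's behavior differs from GCC's and is impractical even in simple cases.

For the compiler-generated code, the slowdown is about 15x (in my specific case, not in the example below).

Here is the code (also available at https://godbolt.org/z/sdGd43x75):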

struct Param {
    int *x;
};

int foo(int *a, int *b) {
    *a = 5;
    *b = 6;
    // No significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

int foo(Param a, Param b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, as expected (for clang/gcc)
    return *a.x + *b.x;
}

/////////////////////

struct ParamR {
    // "Restricted pointers assert that members point to disjoint storage"
    // https://en.cppreference.com/w/c/language/restrict
    // Can C's interpretation of restrict be applied in C++ (and to __restrict too)?
    int *__restrict x;
};

int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    // Significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a.x + *b.x;
}

int rfoo(ParamR *__restrict a, ParamR *__restrict b) {
    *a->x = 5;
    *b->x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a->x + *b->x;
}

This happens for both C++ (__restrict) and C code (using the standard restrict).
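For context, the "as expected" comments on the non-restrict versions come down to aliasing: if both pointers may refer to the same int, the result is not always 11, so the compiler has to reload *a after the store through *b. A minimal sketch of that, where store_and_sum is a hypothetical name with the same body as foo(int *, int *):

```cpp
// Hypothetical helper (not from the original code); same body as foo(int *, int *).
// Without __restrict, a and b are allowed to point to the same object.
int store_and_sum(int *a, int *b) {
    *a = 5;
    *b = 6;          // if a == b, this overwrites the 5 stored above
    return *a + *b;  // 11 for distinct objects, 12 when a == b
}
```

Calling it with two distinct ints gives 11, while passing the same address twice gives 12. Promising disjoint storage through __restrict is what lets the compiler fold the return value to the constant 11.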

How can I make Clang understand that the pointers will always point to disjoint storage?

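This appears to be a bug in Clang. I am not sure I should call it a bug, since the generated code still behaves correctly; let's say it is a missed opportunity in the optimizer.

I tried a few workarounds, and the only one that worked was to always pass the pointers as restrict parameters. Like this: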

int rfoo(int *__restrict a, int *__restrict b) {
    *a = 5;
    *b = 6;
    // Significant optimization here, as expected (for clang/gcc)
    return *a + *b;
}

// change this:
int rfoo(ParamR a, ParamR b) {
    *a.x = 5;
    *b.x = 6;
    // No significant optimization here, NOT expected (clang fails?, gcc optimizes)
    return *a.x + *b.x;
}

// to this:
int rfoo2(ParamR a, ParamR b) {
    return rfoo(a.x, b.x);
}

Output from clang 12.0.0:

rfoo(ParamR, ParamR):                       # @rfoo(ParamR, ParamR)
        mov     dword ptr [rdi], 5
        mov     dword ptr [rsi], 6
        mov     eax, dword ptr [rdi]
        add     eax, 6
        ret
rfoo2(ParamR, ParamR):                      # @rfoo2(ParamR, ParamR)
        mov     dword ptr [rdi], 5
        mov     dword ptr [rsi], 6
        mov     eax, 11
        ret

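Now, this is terribly inconvenient, especially for more complex code, but if the performance difference is that large and important and you cannot switch to GCC, it may be something to consider.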
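For loop-heavy code, the same idea might look like the sketch below: keep the hot loop in a helper that takes the pointers as direct __restrict parameters, and have the struct-based function only forward its members. The names add_kernel and add_arrays are hypothetical and not from the question, and whether Clang actually vectorizes this particular loop is an assumption that would need checking on godbolt.

```cpp
#include <cstddef>

struct ParamR {
    int *__restrict x;
};

// Hypothetical kernel: the restrict qualifiers sit directly on the parameters,
// which is the form Clang handled well in the rfoo(int *, int *) case above.
static inline void add_kernel(int *__restrict dst, const int *__restrict src,
                              std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}

// Thin wrapper: unpack the struct members and forward them to the kernel.
// The caller must guarantee that dst.x and src.x do not overlap.
void add_arrays(ParamR dst, ParamR src, std::size_t n) {
    add_kernel(dst.x, src.x, n);
}
```

The wrapper adds one forwarding function per kernel, which is the inconvenience mentioned above, but it keeps the struct-based interface unchanged for callers.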
