Why does using a global variable make a C++ pthread program 100% slower than using a pointer?

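I have a fairly simple test that compares the performance of 2 similar programs, both of which use 2 threads to do a calculation. The core difference is that one uses a global variable and the other uses a "new"-allocated object, as shown below: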

#include <pthread.h>
#include <stdlib.h>

struct M {
    long a;
    long b;
} obj;  // global object, zero-initialized

size_t count = 2000000000;

// Each thread repeatedly multiplies the long it was handed.
void* addx(void* args) {
    long* pl = (long*)args;
    for (size_t i = 0; i < count; ++i)
        (*pl) *= i;
    return NULL;
}

int main(int argc, char* argv[]) {
    pthread_t tid[2];
    pthread_create(&tid[0], NULL, addx, &obj.a);
    pthread_create(&tid[1], NULL, addx, &obj.b);
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    return 0;
}
clang++ test03_threads.cpp -o test03_threads -lpthread -O2 && time ./test03_threads
real    0m3.626s
user    0m6.595s
sys     0m0.009s

That is quite slow. I then changed obj to be created dynamically (I expected this to be even slower):

#include <pthread.h>
#include <stdlib.h>

struct M {
    long a;
    long b;
} *obj;  // difference 1

size_t count = 2000000000;

// Each thread repeatedly multiplies the long it was handed.
void* addx(void* args) {
    long* pl = (long*)args;
    for (size_t i = 0; i < count; ++i)
        (*pl) *= i;
    return NULL;
}

int main(int argc, char* argv[]) {
    obj = new M();  // difference 2 (value-initialized so a and b start at 0, like the global)
    pthread_t tid[2];
    pthread_create(&tid[0], NULL, addx, &obj->a);  // difference 3
    pthread_create(&tid[1], NULL, addx, &obj->b);  // difference 4
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    delete obj;  // difference 5
    return 0;
}
clang++ test03_threads_new.cpp -o test03_threads_new -lpthread -O2 && time ./test03_threads_new
real    0m1.880s
user    0m3.745s
sys     0m0.007s

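Astonishingly, it is about twice as fast as the previous version (1.880s versus 3.626s of real time). I also tried g++ on Linux, with the same result. How can this be explained? I know obj is a global variable, but *obj is still reached through a global; it is just created dynamically. What is the core difference?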

I think this is indeed caused by false sharing, as Unimportant suggested.

So why is there a difference, you may ask?

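Because of the count variable! It is a non-const variable, and on your platform the underlying type of size_t happens to be unsigned long, which a long* is allowed to alias. The compiler therefore cannot optimize the load away, because pl might point to count. If count were an int, the strict aliasing rules would let the compiler optimize it (or count could simply be declared const size_t).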

As a result, the generated code has to read count from memory on every iteration of the loop.
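One way to see the effect of that reload is to take it out of the loop. The following is only a minimal sketch, assuming it is acceptable to snapshot count once before the loop starts; the name addx_local is hypothetical and does not appear in the question.

```cpp
#include <cstddef>

// Same global as in the question: a non-const size_t.
std::size_t count = 2000000000;

// Hypothetical variant of addx. The loop bound is copied into a local
// variable whose address is never taken, so a store through pl cannot
// change it and the compiler no longer has to reload it each iteration.
void* addx_local(void* args) {
    long* pl = static_cast<long*>(args);
    const std::size_t n = count;  // read the global exactly once
    for (std::size_t i = 0; i < n; ++i)
        (*pl) *= i;
    return nullptr;
}
```

It has the same signature as addx, so it could be passed to pthread_create in its place. Declaring count as const size_t, as mentioned above, should have a similar effect.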

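In the first example, count and obj are both global variables, so they are placed close to each other and the linker will very likely put them in the same cache line. Writing to obj.a or obj.b then invalidates the cache line that holds count, so the CPU has to synchronize every read of count.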

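In the second example, obj is allocated on the heap, so its address is far enough from count that they do not occupy the same cache line. Reads of count need no synchronization.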
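If the object has to stay global, the same separation can be approximated with alignment. This is a rough sketch rather than a guaranteed fix: it assumes a 64-byte cache line (typical on x86-64, but not checked here), and the names kCacheLine, PaddedM and padded_obj are made up for illustration.

```cpp
#include <cstddef>

// Assumed cache-line size; 64 bytes is common on x86-64 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Each member starts on its own cache-line boundary, and the struct is
// padded to a whole number of lines, so no other global (such as count)
// can end up in the same line as a or b.
struct PaddedM {
    alignas(kCacheLine) long a;
    alignas(kCacheLine) long b;
};

PaddedM padded_obj;  // global object, zero-initialized

static_assert(alignof(PaddedM) == kCacheLine, "struct starts on a line boundary");
static_assert(sizeof(PaddedM) == 2 * kCacheLine, "struct fills two whole lines");
static_assert(offsetof(PaddedM, b) - offsetof(PaddedM, a) == kCacheLine,
              "a and b sit in different lines");
```

The threads would then be started with &padded_obj.a and &padded_obj.b. The cost is 128 bytes for two longs, which is the usual trade-off when padding against false sharing.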
