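The size of a type on a CUDA device could, in theory, differ from its size on the host platform. So what is the idiomatic way to express "sizeof(T) on my CUDA device" in code?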
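Nothing you are asking about is needed on any currently supported CUDA platform. One reason the CUDA toolchain is so tightly integrated with the host compiler and the host C++ runtime library is that the sizes of fundamental types are guaranteed to match between host and device. No idiomatic translation of sizes is required: the result of sizeof is always the same on the host and on the device. Note that the sizes of fundamental types can vary from platform to platform (Windows is an LLP64/IL32P64 platform, Linux and OS X are LP64/I32LP64 platforms), but this makes no difference to the GPU, which follows whatever the host uses.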
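If the same structure must also have an identical layout across Linux and Windows builds (not just between host and device within one build), one option is to avoid `long` and use fixed-width types. The following is a minimal sketch in standard C++, which should also compile unchanged inside a `.cu` file. The names `Sample` and `sample_bytes` are hypothetical and do not come from the original answer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record shared between host and device code.
// Fixed-width members keep the layout identical on LP64 and LLP64 platforms,
// whereas a `long` member would be 8 bytes on Linux and 4 bytes on Windows.
struct Sample {
    std::int64_t timestamp;  // always 8 bytes
    std::int32_t channel;    // always 4 bytes
    std::int32_t value;      // always 4 bytes
};

// Fail the build if the layout ever differs from what the code assumes.
static_assert(sizeof(Sample) == 16, "unexpected size for Sample");

// Number of bytes needed for n records, e.g. as the size argument
// of a device allocation or a host-to-device copy.
constexpr std::size_t sample_bytes(std::size_t n)
{
    return n * sizeof(Sample);
}
```

Because sizeof agrees on host and device, a value such as `sample_bytes(n)` computed on the host can be used directly when sizing a device buffer.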
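Also note that the GPU can impose alignment requirements on composite types, which may mean that the compiled size differs from what you expect. The applicable conditions are discussed in detail in the documentation.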
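The effect of an alignment requirement on sizeof can be sketched as follows. This example uses the standard C++ `alignas` specifier as a stand-in; CUDA code often uses the `__align__(n)` specifier for the same purpose. The type names `Packed3` and `Aligned3` and the helper `padding_bytes` are hypothetical, and the sketch assumes a 4-byte `float`.

```cpp
#include <cstddef>

static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");

// Three floats with natural (4-byte) alignment: no padding is added.
struct Packed3 {
    float x, y, z;
};

// The same members with a 16-byte alignment requirement: the compiler
// pads the struct so that its size is a multiple of its alignment.
struct alignas(16) Aligned3 {
    float x, y, z;
};

static_assert(sizeof(Packed3) == 12, "three floats without padding");
static_assert(sizeof(Aligned3) == 16, "padded up to the 16-byte alignment");

// Extra bytes introduced purely by the alignment requirement.
constexpr std::size_t padding_bytes()
{
    return sizeof(Aligned3) - sizeof(Packed3);
}
```

The padded size is still the same on host and device; the point is only that it may be larger than the sum of the member sizes.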
To see that host and device sizes match, consider the following trivial example code:
#include <cstdio>

// Prints the sizes of several fundamental types.
// Compiled for both host and device so the same code runs in each place.
__device__ __host__ __noinline__ void printsizes(const char* title)
{
    printf("%s\n", title);
    printf("sizeof(void*) = %lu\n", (unsigned long)sizeof(void*));
    printf("sizeof(char) = %lu\n", (unsigned long)sizeof(char));
    printf("sizeof(bool) = %lu\n", (unsigned long)sizeof(bool));
    printf("sizeof(short) = %lu\n", (unsigned long)sizeof(short));
    printf("sizeof(int) = %lu\n", (unsigned long)sizeof(int));
    printf("sizeof(long) = %lu\n", (unsigned long)sizeof(long));
    printf("sizeof(long long) = %lu\n", (unsigned long)sizeof(long long));
}

__global__ void printkernel()
{
    printsizes("On the device:");
}

int main()
{
    printsizes("On the host:");
    printkernel<<<1,1>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its output is flushed
    cudaDeviceReset();
    return 0;
}
Compiling and running it on a 64-bit Linux platform produces the following:
$ nvcc -arch=sm_52 -m64 -o sizeof64 sizeof.cu
$ ./sizeof64
On the host:
sizeof(void*) = 8
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
sizeof(long long) = 8
On the device:
sizeof(void*) = 8
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
sizeof(long long) = 8
Built on a 64-bit Windows platform, it produces the following:
>nvcc -arch=sm_21 -m64 sizes.cu
sizes.cu
Creating library a.lib and object a.exp
>a.exe
On the host:
sizeof(void*) = 8
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 4
sizeof(long long) = 8
On the device:
sizeof(void*) = 8
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 4
sizeof(long long) = 8
Built on a 32-bit Windows platform, it produces the following:
>nvcc -arch=sm_21 -m32 sizes.cu
sizes.cu
Creating library a.lib and object a.exp
C:\Users\david\Documents>a.exe
On the host:
sizeof(void*) = 4
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 4
sizeof(long long) = 8
On the device:
sizeof(void*) = 4
sizeof(char) = 1
sizeof(bool) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 4
sizeof(long long) = 8
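Note that the sizes of both void * and long vary between platforms. In every case, however, the sizes on the GPU match those on the host. This is a fundamental design principle of the CUDA driver and GPU runtime.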