PyTorch custom CUDA extension build fails with torch 1.6.0 or higher



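I have a custom CUDA extension for PyTorch (https://pytorch.org/tutorials/advanced/cpp_extension.html) that used to work with PyTorch 1.4, CUDA 10.1, and a Titan Xp GPU. We recently moved our system to a new A40 GPU and CUDA 11.1. When I try to build the extension with CUDA 11.1, PyTorch 1.8.1, gcc 9.3.0, and Ubuntu 20.04, I get the following error: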

$ python3 setup.py install
running install
running bdist_egg
running egg_info
creating cuda_test.egg-info
writing cuda_test.egg-info/PKG-INFO
writing dependency_links to cuda_test.egg-info/dependency_links.txt
writing top-level names to cuda_test.egg-info/top_level.txt
writing manifest file 'cuda_test.egg-info/SOURCES.txt'
reading manifest file 'cuda_test.egg-info/SOURCES.txt'
writing manifest file 'cuda_test.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'cuda_test' extension
creating /path/to/code/cuda/test/build
creating /path/to/code/cuda/test/build/temp.linux-x86_64-3.7
Emitting ninja build file /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /cm/shared/apps/cuda11.1/toolkit/11.1.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o.d -I/path/to/code/venv/lib/python3.7/site-packages/torch/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/TH -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/THC -I/cm/shared/apps/cuda11.1/toolkit/11.1.1/include -I/path/to/code/venv/include/python3.7m -c -c /path/to/code/cuda/test/test_cuda.cu -o /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cuda_test -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o
/cm/shared/apps/cuda11.1/toolkit/11.1.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o.d -I/path/to/code/venv/lib/python3.7/site-packages/torch/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/TH -I/path/to/code/venv/lib/python3.7/site-packages/torch/include/THC -I/cm/shared/apps/cuda11.1/toolkit/11.1.1/include -I/path/to/code/venv/include/python3.7m -c -c /path/to/code/cuda/test/test_cuda.cu -o /path/to/code/cuda/test/build/temp.linux-x86_64-3.7/test_cuda.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=cuda_test -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/arithmetic.h(256): error: identifier "FLT_MIN" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/arithmetic.h(274): error: identifier "DBL_MIN" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(190): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(228): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(243): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(293): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(406): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(498): error: identifier "DBL_MAX" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(562): error: identifier "DBL_MAX_EXP" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(565): error: identifier "DBL_MANT_DIG" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrig.h(630): error: identifier "DBL_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(119): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(137): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(147): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(170): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(249): error: identifier "FLT_EPSILON" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(327): error: identifier "FLT_MAX" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(375): error: identifier "FLT_MAX_EXP" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(377): error: identifier "FLT_MANT_DIG" is undefined
/cm/shared/apps/cuda11.1/toolkit/11.1.1/include/thrust/detail/complex/catrigf.h(420): error: identifier "FLT_EPSILON" is undefined

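I also wrote a simple test program to confirm that my larger C++/CUDA code is not the culprit; it produces the same error messages. I also checked whether arithmetic.h and catrig.h include <cfloat>, which should provide the {FLT,DBL}_{MIN,MAX,EPSILON,MANT_DIG} definitions, and they do, which is what I would expect from standard NVIDIA code. If anyone has run into a similar problem or knows a solution, please let me know.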
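To confirm that every failure in a log like the one above really comes from the same family of missing macros, the log can be summarized with a short script. This is only a sketch: the function names `summarize_undefined` and `all_from_float_h` and the `FLOAT_H_MACROS` set are made up for illustration, and the regular expression assumes the `file(line): error: identifier "NAME" is undefined` format that nvcc printed in this post.

```python
import re
from collections import Counter

# Macros that <cfloat> / <float.h> are expected to define.
# Only the ones seen in the log above are listed (illustrative, not exhaustive).
FLOAT_H_MACROS = {
    "FLT_MIN", "FLT_MAX", "FLT_EPSILON", "FLT_MANT_DIG", "FLT_MAX_EXP",
    "DBL_MIN", "DBL_MAX", "DBL_EPSILON", "DBL_MANT_DIG", "DBL_MAX_EXP",
}

# Matches lines such as:
#   /path/arithmetic.h(256): error: identifier "FLT_MIN" is undefined
_UNDEFINED = re.compile(
    r'^(?P<file>.+?)\((?P<line>\d+)\): error: identifier "(?P<name>\w+)" is undefined'
)


def summarize_undefined(log_text):
    """Count how often each undefined identifier appears in the build log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = _UNDEFINED.match(line.strip())
        if match:
            counts[match.group("name")] += 1
    return counts


def all_from_float_h(counts):
    """True if at least one identifier was found and all are float.h macros."""
    return bool(counts) and set(counts) <= FLOAT_H_MACROS


if __name__ == "__main__":
    sample = (
        '/cuda/include/thrust/detail/complex/arithmetic.h(256): '
        'error: identifier "FLT_MIN" is undefined\n'
        '/cuda/include/thrust/detail/complex/catrig.h(190): '
        'error: identifier "DBL_EPSILON" is undefined\n'
    )
    counts = summarize_undefined(sample)
    print(dict(counts), all_from_float_h(counts))
```

If `all_from_float_h` returns true for the full log, that suggests the headers themselves are fine and that something in the build environment is preventing the standard float.h definitions from being picked up.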

---- Update ----

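Here are a few more things I tried:

  1. With CUDA 10.1, PyTorch 1.4.0, gcc 9.3.0, and Ubuntu 20.04, the CUDA code compiles.
  2. With PyTorch 1.5.1 I get the following error: /usr/include/c++/9/bits/stl_function.h(437): error: identifier "__builtin_is_constant_evaluated" is undefined. This can be resolved by downgrading gcc to version 7.5.
  3. With PyTorch 1.6.0 or higher I always get the error reported at the top, even when using gcc-7.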

I found the problem: the "Intel MKL" module was not loaded correctly. After fixing that, compilation with CUDA 11.1 and PyTorch 1.8.1 works fine as well!
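The post does not say how the incorrectly loaded module broke the build. One possible mechanism, and this is purely an assumption, is that an environment module adds a directory to the compiler's include search path that contains its own float.h or cfloat, which then shadows the standard header. The sketch below checks the GCC include-path environment variables for such files; the function name `find_shadowing_headers` is hypothetical.

```python
import os
from pathlib import Path

# Environment variables that GCC consults for extra include directories.
INCLUDE_ENV_VARS = ("CPATH", "C_INCLUDE_PATH", "CPLUS_INCLUDE_PATH")

# Header names that, if found in one of those directories, could shadow
# the standard definitions of FLT_MIN, DBL_EPSILON, and friends.
HEADER_NAMES = ("float.h", "cfloat")


def find_shadowing_headers(environ=None):
    """Return (variable, path) pairs for float.h/cfloat files found in
    directories listed in the include-path environment variables."""
    env = os.environ if environ is None else environ
    hits = []
    for var in INCLUDE_ENV_VARS:
        for entry in env.get(var, "").split(os.pathsep):
            if not entry:
                # Empty entries are skipped here (GCC itself treats them
                # as the current working directory).
                continue
            for name in HEADER_NAMES:
                candidate = Path(entry) / name
                if candidate.is_file():
                    hits.append((var, str(candidate)))
    return hits


if __name__ == "__main__":
    hits = find_shadowing_headers()
    if not hits:
        print("No float.h/cfloat found via CPATH-style variables.")
    for var, path in hits:
        print(f"{var}: {path}")
```

Running this once with the module loaded and once without it, and comparing the output, would show whether the module changes which float.h the compiler can see. An empty result does not rule out other causes, such as include directories passed directly on the compiler command line.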
