OpenCV DNN YOLO V4 with CUDA in Python is 5x faster than the same code in C++



I am trying to compare the performance of the following code:

frames = ...  # four preloaded frames (loading code elided)
for i in range(2000):
    frame = frames[i % 4]
    model.detect(frame, .2, .4)  # confThreshold=.2, nmsThreshold=.4

and its C++ counterpart:

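frames = ...  // four preloaded cv::Mat frames (loading code elided)
std::vector<int> classIds;
std::vector<float> confidences;
std::vector<cv::Rect> boxes;
for (int i = 0; i < 2000; ++i) {
    const cv::Mat & frame = frames[i % 4];
    model.detect(frame, classIds, confidences, boxes, .2, .4);  // conf .2, NMS .4
}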

In C++, the CUDA backend is set with:

net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA_FP16);

In Python, the CUDA backend is set with:

net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)

Surprisingly, the Python program turned out to be faster: it runs at 300 FPS, while the equivalent C++ program runs at 60 FPS. CUDA was enabled in both runs.
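The post does not show how the FPS figures were measured. As a hedged sketch, assuming a simple wall-clock measurement, the hypothetical helper `measure_fps` below times a fixed number of calls while cycling through the frames, and runs a few untimed warm-up calls first, because the first forward pass on the CUDA backend may include one-time network setup. `detect` stands in for something like `lambda f: model.detect(f, .2, .4)`; the warm-up count of 10 is an arbitrary assumption.

```python
import time
from typing import Any, Callable, Sequence


def measure_fps(detect: Callable[[Any], Any],
                frames: Sequence[Any],
                iterations: int = 2000,
                warmup: int = 10) -> float:
    """Return calls per second of `detect`, cycling through `frames`.

    The first `warmup` calls are executed but not timed, so one-time
    initialization cost is excluded from the measurement.
    """
    if iterations <= 0 or not frames:
        raise ValueError("need at least one iteration and one frame")
    for i in range(warmup):                  # untimed warm-up calls
        detect(frames[i % len(frames)])
    start = time.perf_counter()              # monotonic, high-resolution clock
    for i in range(iterations):
        detect(frames[i % len(frames)])
    elapsed = time.perf_counter() - start
    return iterations / elapsed


if __name__ == "__main__":
    # Stand-in workload; with OpenCV this would be something like
    # measure_fps(lambda f: model.detect(f, .2, .4), frames)
    print(f"{measure_fps(lambda f: sum(range(1000)), [None] * 4):.1f} calls/s")
```

On the C++ side the same structure can be written with `std::chrono::steady_clock` around the detect loop, which keeps the two measurements comparable.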

The code and details about my environment can be found in the repository I created for this analysis.

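Notably, running the programs again on the CPU only (with CUDA disabled) gives 60 FPS for both the C++ and the Python versions. This makes me think that OpenCV is not actually using CUDA for the cv::dnn API in the C++ program. In fact, the C++ program used 6% of the GPU, while the Python program used 67%.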
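The post does not say how the 6% and 67% GPU figures were obtained. One possible way, assumed here, is to poll `nvidia-smi` in query mode while each benchmark runs. This is a sketch: `parse_gpu_utilization` and `average_gpu_utilization` are hypothetical helper names, and the polling part requires `nvidia-smi` (shipped with the NVIDIA driver) on `PATH`.

```python
import subprocess
import time
from typing import List

# Query only the utilization percentage, as bare numbers, one GPU per line.
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"]


def parse_gpu_utilization(output: str) -> List[int]:
    """Parse the query output into one integer percentage per GPU."""
    return [int(s) for s in (raw.strip() for raw in output.splitlines()) if s]


def average_gpu_utilization(samples: int = 10, interval_s: float = 0.5) -> float:
    """Poll the first GPU `samples` times and return its mean utilization (%)."""
    readings = []
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True,
                             check=True).stdout
        readings.append(parse_gpu_utilization(out)[0])  # first GPU only
        time.sleep(interval_s)
    return sum(readings) / len(readings)
```

Calling `average_gpu_utilization()` from a second terminal while the C++ or the Python benchmark is running gives an averaged figure instead of a single snapshot.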

Has anyone run into a similar situation? Or, alternatively, what am I doing wrong on the C++ side when running the code with CUDA?

EDIT: the output of cv::getBuildInformation() is:

General configuration for OpenCV 4.5.3 =====================================
  Version control:               4.5.3
  Extra modules:
    Location (extra):            /home/doleron/opencv_build/opencv_contrib/modules
    Version control (extra):     4.5.3
  Platform:
    Timestamp:                   2022-01-16T16:15:55Z
    Host:                        Linux 5.11.0-46-generic x86_64
    CMake:                       3.16.3
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               RELEASE
  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (15 files):         + SSSE3 SSE4_1
      SSE4_2 (1 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (0 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (4 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (29 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (4 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                /usr/bin/c++  (ver 9.3.0)
    C++ flags (Release):         -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/bin/cc
    C flags (Release):           -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a   -Wl,--gc-sections -Wl,--as-needed
    Linker flags (Debug):        -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a   -Wl,--gc-sections -Wl,--as-needed
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:          m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda/lib64 -L/usr/lib/x86_64-linux-gnu
    3rdparty dependencies:
  OpenCV modules:
    To be built:                 aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 alphamat cvv hdf java julia matlab ovis python2 sfm ts viz
    Applications:                apps
    Documentation:               NO
    Non-free algorithms:         YES
  GUI:
    GTK+:                        YES (ver 3.24.20)
      GThread :                  YES (ver 2.64.6)
      GtkGlExt:                  NO
    VTK support:                 NO
  Media I/O:
    ZLib:                        /usr/lib/x86_64-linux-gnu/libz.so (ver 1.2.11)
    JPEG:                        /usr/lib/x86_64-linux-gnu/libjpeg.so (ver 80)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         /usr/lib/x86_64-linux-gnu/libpng.so (ver 1.6.37)
    TIFF:                        /usr/lib/x86_64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
    JPEG 2000:                   build (ver 2.4.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES
  Video I/O:
    DC1394:                      YES (2.2.5)
    FFMPEG:                      YES
      avcodec:                   YES (58.54.100)
      avformat:                  YES (58.29.100)
      avutil:                    YES (56.31.100)
      swscale:                   YES (5.5.100)
      avresample:                NO
    v4l/v4l2:                    YES (linux/videodev2.h)
  Parallel framework:            TBB (ver 2020.2 interface 11102)
  Trace:                         YES (with Intel ITT)
  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   /home/doleron/opencv_build/opencv/build/3rdparty/ippicv/ippicv_lnx/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                /home/doleron/opencv_build/opencv/build/3rdparty/ippicv/ippicv_lnx/iw
    VA:                          NO
    Lapack:                      NO
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.5.1)
  NVIDIA CUDA:                   YES (ver 11.6, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             75
    NVIDIA PTX archs:
  cuDNN:                         YES (ver 8.3.2)
  OpenCL:                        YES (no extra features)
    Include path:                /home/doleron/opencv_build/opencv/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load
  Python 3:
    Interpreter:                 /usr/bin/python3 (ver 3.8.10)
    Libraries:                   /usr/lib/x86_64-linux-gnu/libpython3.8.so (ver 3.8.10)
    numpy:                       /usr/lib/python3/dist-packages/numpy/core/include (ver 1.17.4)
    install path:                lib/python3.8/dist-packages/cv2/python-3.8
  Python (for build):            /usr/bin/python3
  Java:
    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO
  Install to:                    /usr/local
-----------------------------------------------------------------
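To check quickly whether a given OpenCV build reports CUDA and cuDNN support, the relevant fields can be pulled out of the build-information text. This is a hedged sketch: `cuda_build_flags` is a hypothetical helper, and it assumes the `Name:   value` line layout shown in the output above. In Python the text would come from `cv2.getBuildInformation()`; printing `cv::getBuildInformation()` from the C++ binary and checking it the same way shows which build that binary is actually linked against.

```python
import re
from typing import Dict, Optional

# Field names as they appear in OpenCV's build information output.
CUDA_FIELDS = ("NVIDIA CUDA", "NVIDIA GPU arch", "cuDNN")


def cuda_build_flags(build_info: str) -> Dict[str, Optional[str]]:
    """Return the value text of each CUDA-related field, or None if absent."""
    flags: Dict[str, Optional[str]] = {}
    for name in CUDA_FIELDS:
        # A field line looks like "  cuDNN:      YES (ver 8.3.2)";
        # the leading indentation varies with the nesting level.
        match = re.search(rf"^\s*{re.escape(name)}:[ \t]*(.*)$",
                          build_info, re.MULTILINE)
        flags[name] = match.group(1).strip() if match else None
    return flags
```

For example, `cuda_build_flags(cv2.getBuildInformation())` on the build above should report `YES (ver 11.6, ...)` for CUDA and `YES (ver 8.3.2)` for cuDNN; a `None` or `NO` there would mean that library was built without the CUDA backend.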

Hmm... comparing the data in the repo (C++ version), (1) and (2) below are very close. Somehow CUDA does not seem to be working properly; could both be running on the CPU?

(1) C++/CUDA: real 0m33.179s; user 6m14.921s; sys 0m6.942s

(2) C++/CPU: real 0m34.341s; user 6m19.379s; sys 0m7.908s
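Reading the `real` and `user` times in (1) and (2) as 33.179 s and 374.921 s, and 34.341 s and 379.379 s respectively, a short sketch converts them into FPS for the 2000-iteration loop. Both runs come out at roughly 60 FPS (about 60.3 and 58.2), matching the CPU-only figure reported above, and in both the user time is about 11 times the real time, which is consistent with multi-threaded CPU inference rather than GPU inference. `parse_time_seconds` is a hypothetical helper, and the format it accepts is assumed to be that of the bash `time` keyword.

```python
import re

FRAMES = 2000  # iterations of the benchmark loop in the question


def parse_time_seconds(value: str) -> float:
    """Convert a bash `time` field like '0m33.179s' (or '0m33,179s') to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, secs, frac = match.groups()
    return int(minutes) * 60 + float(f"{secs}.{frac}")


# (real, user) for runs (1) and (2)
RUNS = {
    "C++/CUDA": ("0m33.179s", "6m14.921s"),
    "C++/CPU": ("0m34.341s", "6m19.379s"),
}

if __name__ == "__main__":
    for label, (real, user) in RUNS.items():
        real_s, user_s = parse_time_seconds(real), parse_time_seconds(user)
        # FPS from wall-clock time; user/real roughly = average busy CPU cores
        print(f"{label}: {FRAMES / real_s:.1f} FPS, "
              f"user/real = {user_s / real_s:.1f}")
```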

Latest update