Segfault caused by a conflict between the netCDF and HDF5 libraries



Update: I have found a partial solution. See the bottom of this post.

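After several hours of debugging a program, I found that there is some kind of conflict between the netCDF and HDF5 libraries (the program reads and writes files in both formats).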

I have reduced the code to a small program that shows the problem. This program segfaults:

#include <iostream>
#include <string>
#include "H5Cpp.h"
#include <netcdf>
using namespace std;

void stupidfunction() // Note that this is never called.
{
    H5::Group grp1; // The mere potential existence of this makes netcdf segfault!
}

int main(int argn, char ** args)
{
    std::string outputFilename = "/tmp/test.nc";
    try
    {
        std::cout << "Now opening " << outputFilename << std::endl;
        netCDF::NcFile sfc;
        sfc.open(outputFilename, netCDF::NcFile::replace);
        std::cout << "closing file" << std::endl;
        sfc.close(); // The segfault happens inside this call (see the backtrace).
        return 0; // Exit status 0 on success.
    }
    catch(netCDF::exceptions::NcException& e)
    {
        std::cout << "EX: " << e.what() << std::endl;
        return 1; // Non-zero exit status on failure.
    }

    return 0;
}

(Compiled with: h5c++ test.cpp -std=gnu++11 -O0 -g3 -lnetcdf_c++4 -lnetcdf -o test)

The (relevant) packages I have installed:

libnetcdf-c++4-1                       4.3.1-2build1                         amd64        C++ interface for scientific data access to large binary data
libnetcdf-c++4-dev                     4.3.1-2build1                         amd64        creation, access, and sharing of scientific data in C++
libnetcdf-dev                          1:4.7.3-1                             amd64        creation, access, and sharing of scientific data
libnetcdf15:amd64                      1:4.7.3-1                             amd64        Interface for scientific data access to large binary data
netcdf-bin                             1:4.7.3-1                             amd64        Programs for reading and writing NetCDF files
netcdf-doc                             1:4.7.3-1                             all          Documentation for NetCDF
hdf5-helpers                           1.10.4+repack-11ubuntu1               amd64        Hierarchical Data Format 5 (HDF5) - Helper tools
hdf5-tools                             1.10.4+repack-11ubuntu1               amd64        Hierarchical Data Format 5 (HDF5) - Runtime tools
libhdf4-0                              4.2.14-1ubuntu1                       amd64        Hierarchical Data Format library (embedded NetCDF)
libhdf5-103:amd64                      1.10.4+repack-11ubuntu1               amd64        Hierarchical Data Format 5 (HDF5) - runtime files - serial version
libhdf5-cpp-103:amd64                  1.10.4+repack-11ubuntu1               amd64        Hierarchical Data Format 5 (HDF5) - C++ libraries
libhdf5-dev                            1.10.4+repack-11ubuntu1               amd64        Hierarchical Data Format 5 (HDF5) - development files - serial versio

What is causing the segfault? Is there anything I can do to avoid or work around it? Any help is appreciated!

When run under gdb:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7a47a01 in __vfprintf_internal (s=s@entry=0x7fffff7ff480, 
format=format@entry=0x7ffff77e13a8 "can't locate ID", 
ap=ap@entry=0x7fffff7ff5e0, mode_flags=mode_flags@entry=2)
at vfprintf-internal.c:1289
1289    vfprintf-internal.c: No such file or directory.

gdb backtrace:

(gdb) bt
#0  0x00007ffff7a47a01 in __vfprintf_internal (s=s@entry=0x7fffff7ff480, format=format@entry=0x7ffff77e13a8 "can't locate ID", 
ap=ap@entry=0x7fffff7ff5e0, mode_flags=mode_flags@entry=2) at vfprintf-internal.c:1289
#1  0x00007ffff7a5cd4a in __vasprintf_internal (result_ptr=0x7fffff7ff5d8, format=0x7ffff77e13a8 "can't locate ID", args=0x7fffff7ff5e0, mode_flags=2)
at vasprintf.c:57
#2  0x00007ffff75c7e56 in H5E_printf_stack () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#3  0x00007ffff76553b9 in H5I_inc_ref () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
...
many many lines repeating H5E_printf_stack, H5E__push_stack and H5I_inc_ref
...
#56139 0x00007ffff76553b9 in H5I_inc_ref () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#56140 0x00007ffff75c7c2f in H5E__push_stack () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#56141 0x00007ffff75c7e7e in H5E_printf_stack () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#56142 0x00007ffff761fc85 in H5G_loc () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#56143 0x00007ffff7546903 in H5Acreate1 () from /usr/lib/x86_64-linux-gnu/libhdf5_serial.so.103
#56144 0x00007ffff790b11b in NC4_write_provenance () from /usr/lib/x86_64-linux-gnu/libnetcdf.so.15
#56145 0x00007ffff790b5a8 in ?? () from /usr/lib/x86_64-linux-gnu/libnetcdf.so.15
#56146 0x00007ffff790b7b0 in nc4_close_hdf5_file () from /usr/lib/x86_64-linux-gnu/libnetcdf.so.15
#56147 0x00007ffff790b9ea in NC4_close () from /usr/lib/x86_64-linux-gnu/libnetcdf.so.15
#56148 0x00007ffff78ca579 in nc_close () from /usr/lib/x86_64-linux-gnu/libnetcdf.so.15
#56149 0x00007ffff7f82270 in netCDF::NcFile::close() () from /usr/lib/x86_64-linux-gnu/libnetcdf_c++4.so.1
#56150 0x00005555555a7959 in main (argn=1, args=0x7fffffffe5b8) at test.cpp:29
(gdb) 
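
Reading the backtrace: the same three frames (H5I_inc_ref, H5E_printf_stack, H5E__push_stack) repeat more than 56,000 times, and the distance between the argv address seen by main() and the buffer address in the crashing frame is just under 8 MiB, which is a common default stack limit on Linux. That pattern suggests a stack overflow from error reporting that re-enters the failing call, rather than a bad pointer in vfprintf itself. The sketch below is only a toy model of that pattern, not HDF5 code: ToyErrorStack, its members, and the depth_limit guard are hypothetical names introduced for illustration, and the two addresses are copied from the gdb output above.

```cpp
#include <cstddef>
#include <cstdint>

// Addresses copied from the gdb output: the argv pointer seen by main()
// and the buffer pointer passed to the innermost (crashing) frame.
constexpr std::uint64_t kMainFrameAddr  = 0x7fffffffe5b8ULL;
constexpr std::uint64_t kCrashFrameAddr = 0x7fffff7ff480ULL;

// Approximate amount of stack in use between main() and the crash.
constexpr std::uint64_t stack_span_bytes() {
    return kMainFrameAddr - kCrashFrameAddr;
}

// Toy model (hypothetical names, not HDF5 code) of error reporting that
// re-enters the operation that just failed.
struct ToyErrorStack {
    bool lookups_fail;        // true: every ID lookup fails ("can't locate ID")
    std::size_t depth_limit;  // guard so the demo stops instead of overflowing
    std::size_t depth;
    std::size_t max_depth;

    ToyErrorStack(bool fail, std::size_t limit)
        : lookups_fail(fail), depth_limit(limit), depth(0), max_depth(0) {}

    // Plays the role of H5I_inc_ref: on failure it reports an error.
    bool inc_ref() {
        if (!lookups_fail) return true;
        report_error();
        return false;
    }

    // Plays the role of H5E_printf_stack / H5E__push_stack: recording an
    // error takes a reference on an ID, so it calls inc_ref() again.
    void report_error() {
        if (depth >= depth_limit) return;  // the real trace shows no such guard
        ++depth;
        if (depth > max_depth) max_depth = depth;
        inc_ref();
        --depth;
    }
};
```

With lookups_fail set to true, a single inc_ref() call recurses until the guard stops it; without the guard the recursion would only end when the stack is exhausted, which matches the shape of the trace.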

Partial workaround: if I specify the netCDF file format as classic or classic64, the error does not occur. That is:

sfc.open(outputFilename, netCDF::NcFile::replace, netCDF::NcFile::classic);

sfc.open(outputFilename, netCDF::NcFile::replace, netCDF::NcFile::classic64);

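I have a C program that shows similar behaviour, and I found that adding

#include <H5public.h>
:
if (H5dont_atexit() < 0)
{
    fprintf(stderr, "failed HDF5 don't-atexit\n");
    return 1;
}

at the start of main() fixed the problem. This does mean that files you H5Fopen() will not be H5Fclose()-ed automatically, but it may be a lower-impact workaround.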
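
Since files are no longer closed automatically once the atexit cleanup is disabled, each file has to be closed explicitly on every exit path. When the C API is called from C++, one way to make that hard to forget is a small scope guard. This is a minimal sketch under that assumption: ScopeGuard and make_guard are hypothetical helpers written for illustration, not part of HDF5 or netCDF, and the sketch has not been tested against the crash described above.

```cpp
#include <utility>

// Minimal scope guard: runs the stored callable exactly once, when the
// guard goes out of scope (normal exit, early return, or exception).
template <typename F>
class ScopeGuard {
public:
    explicit ScopeGuard(F f) : f_(std::move(f)), active_(true) {}

    // Moving transfers responsibility for the cleanup to the new guard.
    ScopeGuard(ScopeGuard&& other)
        : f_(std::move(other.f_)), active_(other.active_) {
        other.active_ = false;
    }

    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

    ~ScopeGuard() {
        if (active_) f_();
    }

private:
    F f_;
    bool active_;
};

// Helper so the callable's type can be deduced from a lambda.
template <typename F>
ScopeGuard<F> make_guard(F f) {
    return ScopeGuard<F>(std::move(f));
}
```

With the HDF5 C API this could be used as `auto guard = make_guard([file_id] { H5Fclose(file_id); });` right after a successful H5Fopen(), where file_id is the returned hid_t.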
