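I wrote a C application that runs fine except on very large input datasets.
With large input, I get a segmentation fault in the final steps of the binary's work.
I ran the binary (with the test input) under valgrind: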
valgrind --tool=memcheck --leak-check=yes /foo/bar/baz inputDataset > outputAnalysis
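The job normally takes a few hours, but under valgrind it took seven days.
Unfortunately, at this point I don't know how to read the results I got from this run.
I got a lot of warnings like these: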
...
==4074== Conditional jump or move depends on uninitialised value(s)
==4074== at 0x435900: ??? (in /foo/bar/baz)
==4074== by 0x439CC5: ??? (in /foo/bar/baz)
==4074== by 0x400BF2: ??? (in /foo/bar/baz)
==4074== by 0x402086: ??? (in /foo/bar/baz)
==4074== by 0x402A0F: ??? (in /foo/bar/baz)
==4074== by 0x41684F: ??? (in /foo/bar/baz)
==4074== by 0x4001B8: ??? (in /foo/bar/baz)
==4074== by 0x7FEFFFF57: ???
==4074== Uninitialised value was created
==4074== at 0x461D3A: ??? (in /foo/bar/baz)
==4074== by 0x43F926: ??? (in /foo/bar/baz)
==4074== by 0x416B9B: ??? (in /foo/bar/baz)
==4074== by 0x416725: ??? (in /foo/bar/baz)
==4074== by 0x4001B8: ??? (in /foo/bar/baz)
==4074== by 0x7FEFFFF57: ???
...
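There is no hint about which part of the code is involved, no variable names, and so on. What can I do with this information?
At the very end I get the following error, yet valgrind finds no leaks, just as with the smaller datasets that do not crash: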
...
==4074== Process terminating with default action of signal 11 (SIGSEGV)
==4074== Access not within mapped region at address 0x7158E7F7
==4074== at 0x7158E7F7: ???
==4074== by 0x4020B8: ??? (in /foo/bar/baz)
==4074== by 0x6322203A22656D6E: ???
==4074== by 0x306C675F6E557267: ???
==4074== by 0x202C22373232302F: ???
==4074== by 0x6D616E656C696621: ???
==4074== by 0x72686322203A2264: ???
==4074== by 0x3030306C675F6E54: ???
==4074== by 0x346469702E373231: ???
==4074== by 0x646469662E34372F: ???
==4074== by 0x722E64616568656B: ???
==4074== by 0x63656D6F6C756764: ???
==4074== If you believe this happened as a result of a stack
==4074== overflow in your program's main thread (unlikely but
==4074== possible), you can try to increase the size of the
==4074== main thread stack using the --main-stacksize= flag.
==4074== The main thread stack size used in this run was 10485760.
==4074==
==4074== HEAP SUMMARY:
==4074== in use at exit: 0 bytes in 0 blocks
==4074== total heap usage: 0 allocs, 0 frees, 0 bytes allocated
==4074==
==4074== All heap blocks were freed -- no leaks are possible
==4074==
==4074== For counts of detected and suppressed errors, rerun with: -v
==4074== ERROR SUMMARY: 1603141870 errors from 86 contexts (suppressed: 0 from 0)
Segmentation fault
Everything I allocate space for gets a matching free statement, after which I set the pointer to NULL.
22 Dec 2011 - Edit
I compiled a debug build of the binary, called debug-binary, using the following compile flags:
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1 -DUSE_ZLIB -g -O0 -Wformat -Wall -pedantic -std=gnu99
When I run it under valgrind, I do not get any more information:
valgrind -v --tool=memcheck --leak-check=yes --error-limit=no --track-origins=yes debug-binary input > output
Here is a snippet of the output:
==25116== 2 errors in context 14 of 14:
==25116== Invalid read of size 4
==25116== at 0x4045E8: ??? (in /foo/bar/debug-binary)
==25116== by 0x40682F: ??? (in /foo/bar/debug-binary)
==25116== by 0x404F0C: ??? (in /foo/bar/debug-binary)
==25116== by 0x401FA4: ??? (in /foo/bar/debug-binary)
==25116== by 0x402016: ??? (in /foo/bar/debug-binary)
==25116== by 0x403B27: ??? (in /foo/bar/debug-binary)
==25116== by 0x40295E: ??? (in /foo/bar/debug-binary)
==25116== by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)
==25116== Address 0x539f188 is 24 bytes inside a block of size 48 free'd
==25116== at 0x4A05D21: free (vg_replace_malloc.c:325)
==25116== by 0x401F6B: ??? (in /foo/bar/debug-binary)
==25116== by 0x402016: ??? (in /foo/bar/debug-binary)
==25116== by 0x403B27: ??? (in /foo/bar/debug-binary)
==25116== by 0x40295E: ??? (in /foo/bar/debug-binary)
==25116== by 0x31A021D993: (below main) (in /lib64/libc-2.5.so)
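Is this a problem in my binary, or in a system library (libc) that my application depends on?
I also don't know how to interpret the ??? entries. Is there another compile flag I need in order to get valgrind to provide more information?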
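Valgrind is basically saying that there are no notable heap management problems. The program is segfaulting from a less sophisticated programming error.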
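If it were me, I would
- compile with gcc -g,
- enable core dump files (ulimit -c unlimited),
- run the program normally,
- let it fault,
- use gdb to examine the core file and see what it was doing when it faulted:
gdb (programfile) (corefile)
bt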
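If changing the shell limit is awkward (for example when the job is launched by a scheduler), the core-file limit can also be raised from inside the program. This is only a minimal sketch assuming a POSIX system; enable_core_dumps is a hypothetical helper name, not something from the question's code.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit. This has the same effect as `ulimit -c unlimited` when the hard
 * limit is unlimited. Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling enable_core_dumps() at the top of main() would be the natural place. If the hard limit is itself 0, this cannot help and the limit has to be changed outside the program.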
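I don't believe valgrind can find all errors where you overwrite values on the stack (without running off the stack itself). So you may want to try gcc's -fstack-protector-all option.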
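To make that bug class concrete: it is typically a write past the end of a local array that still lands inside the stack, so memcheck has nothing invalid to report. The sketch below shows the risky call only as a comment, next to a bounded alternative. Both copy_bounded and parse_field are hypothetical names, and the 16-byte buffer is an assumption for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into a fixed-size buffer, refusing input
 * that does not fit (including its terminating NUL). Returns 0 or -1. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (dst_size == 0 || len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1);  /* len + 1 also copies the NUL */
    return 0;
}

/* Hypothetical caller with a local buffer. Returns the field length,
 * or -1 if the field is too long for the buffer. */
int parse_field(const char *field)
{
    char name[16];

    /* strcpy(name, field);   <- the bug class: for long input this writes
     * past name[] into neighbouring stack data, yet stays on the stack. */
    if (copy_bounded(name, sizeof name, field) != 0)
        return -1;
    return (int)strlen(name);
}
```

With the strcpy variant, short inputs behave normally and only long inputs corrupt the frame, which would match a crash that shows up only on large datasets.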
You should also try mudflap, with -fmudflap (single-threaded) or -fmudflapth (multi-threaded).
Mudflap and the stack protector should both be much faster than valgrind.
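Also, it looks like you have no debug symbols, which makes the backtrace hard to read. Add -ggdb. You may also want to enable core file generation (try ulimit -c unlimited). That way you can debug the process after the crash with gdb program core.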
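As @wallyk says, your segfault may actually be something quite easy to find. For example, maybe you are dereferencing NULL, and gdb can point you to the exact line (or close to it, unless you compile with -O0). This would make sense if, for example, you simply run out of memory on your large dataset, so malloc returns NULL, and you forgot to check for that somewhere.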
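As a rough illustration of the check that is easy to forget, here is a small allocation wrapper. alloc_array is a hypothetical name, and the question's code may allocate quite differently. It also rejects a count * size product that would overflow size_t, since that is another way a huge dataset can end up with a too-small block.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for count elements of elem_size bytes.
 * Returns NULL (after printing a message) if the size computation would
 * overflow or if malloc fails; callers must still test the result. */
void *alloc_array(size_t count, size_t elem_size)
{
    if (count == 0 || elem_size == 0) {
        fprintf(stderr, "alloc_array: refusing zero-sized request\n");
        return NULL;
    }
    if (count > SIZE_MAX / elem_size) {
        fprintf(stderr, "alloc_array: %zu * %zu overflows size_t\n",
                count, elem_size);
        return NULL;
    }

    void *p = malloc(count * elem_size);
    if (p == NULL)
        fprintf(stderr, "alloc_array: out of memory for %zu elements\n", count);
    return p;
}
```

A caller would then write something like `int *v = alloc_array(n, sizeof *v); if (v == NULL) return -1;` instead of using the pointer straight away.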
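Finally, if nothing else makes sense, there is always the possibility of a hardware problem. But those tend to be fairly random, e.g. different values getting corrupted on different runs.
"Conditional jump or move depends on uninitialised value(s)" is a serious error that you need to fix. It indicates that the program's behavior is affected by the contents of an uninitialised variable (including an uninitialised region of memory returned by malloc()).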
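A tiny sketch of how that warning typically arises and one way to avoid it, assuming an array of flags; make_flags and count_set are hypothetical names. The malloc variant is left as a comment because reading its contents before writing them is exactly what memcheck complains about.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n flags that are guaranteed to start at 0. */
int *make_flags(size_t n)
{
    /* int *flags = malloc(n * sizeof *flags);
     * would return memory with indeterminate contents; branching on
     * flags[i] before writing it triggers the memcheck warning. */
    return calloc(n, sizeof(int));
}

/* Count the entries that are set. */
size_t count_set(const int *flags, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (flags[i])  /* the "conditional jump" that depends on flags[i] */
            count++;
    }
    return count;
}
```

Zeroing with calloc is only one option; explicitly initialising every element before the first read works just as well.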
To get readable backtraces from valgrind, you need to compile with -g.
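Once the frames are readable, the "Invalid read of size 4 ... inside a block of size 48 free'd" report from the question should name the function doing the read and the one that called free. One possible cause, and this is only a guess, is a second copy of a pointer that outlives the free: setting a pointer to NULL after free clears only that one variable. A sketch with a hypothetical struct record:

```c
#include <stdlib.h>

/* Hypothetical record type, for illustration only. */
struct record {
    int value;
};

struct record *record_new(int value)
{
    struct record *r = malloc(sizeof *r);

    if (r != NULL)
        r->value = value;
    return r;
}

/* Free the record and clear the caller's pointer. Safe to call twice.
 * Note: any OTHER copy of the pointer (stored in a list, a cache, another
 * struct) is not cleared and still dangles after this call. */
void record_free(struct record **rp)
{
    if (rp == NULL || *rp == NULL)
        return;
    free(*rp);
    *rp = NULL;
}
```

If the freed block in the valgrind report is also reachable through some other structure, that other reference is where to look.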