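I've been trying to debug a service issue that causes a segmentation fault. I don't have access to the production servers, so I handle the SIGSEGV signal in the service and print the stack trace to a log file. Here is the stack trace from when the service crashed: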
0# 0x00000000005054DA in ./afiniti_lookup
1# 0x00007F2BBB74A400 in /usr/lib64/libc.so.6
2# 0x00007F2BBB86F9BD in /usr/lib64/libc.so.6
3# 0x000000000041BB52 in ./afiniti_lookup
4# std::string::_M_move(char*, char const*, unsigned long) in ./afiniti_lookup
5# std::string::_M_mutate(unsigned long, unsigned long, unsigned long) in ./afiniti_lookup
6# std::string::_M_replace_safe(unsigned long, unsigned long, char const*, unsigned long) in ./afiniti_lookup
7# std::string::assign(char const*, unsigned long) in ./afiniti_lookup
8# std::string::assign(char const*) in ./afiniti_lookup
9# std::string::operator=(char const*) in ./afiniti_lookup
10# 0x000000000061E8E9 in ./afiniti_lookup
11# 0x0000000000620200 in ./afiniti_lookup
12# 0x000000000055B586 in ./afiniti_lookup
13# 0x00000000004F2BAC in ./afiniti_lookup
14# 0x00000000004F0715 in ./afiniti_lookup
15# 0x000000000051CDBF in ./afiniti_lookup
16# 0x0000000000529869 in ./afiniti_lookup
17# 0x0000000000464968 in ./afiniti_lookup
18# 0x0000000000461369 in ./afiniti_lookup
19# 0x0000000000460D6E in ./afiniti_lookup
20# 0x0000000000460086 in ./afiniti_lookup
21# 0x000000000045FD36 in ./afiniti_lookup
22# 0x000000000046CAB4 in ./afiniti_lookup
23# 0x000000000046B4F6 in ./afiniti_lookup
24# 0x000000000046FF13 in ./afiniti_lookup
25# 0x000000000046FE65 in ./afiniti_lookup
26# 0x000000000046FCDA in ./afiniti_lookup
27# 0x00007F2BBCE5038F in /opt/lib64/libcpprest.so.2.10
28# 0x00007F2BBEDCAEA5 in /usr/lib64/libpthread.so.0
29# clone in /usr/lib64/libc.so.6
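However, this trace is not very useful, because I can't pinpoint where in my code the problem occurs. Can someone help me better understand and examine this stack trace?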
It looks like the executable you run in production is partially stripped.
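You should have an unstripped copy of it (the one produced when the executable is linked). If you don't, you need to change your build process to save a copy before running strip.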
With the unstripped copy, you can make sense of your stack trace like this:
addr2line -fe afiniti_lookup.unstripped 0x61E8E9 0x620200 0x55B586 ...
Here is an example:
cat foo.c
int foo() { int *ip = 0; return *ip; }
int bar() { return foo(); }
int zoo() { return bar(); }
int main() { return zoo(); }
Compile with debug info (this produces a.out):
gcc -g foo.c
Strip the binary to get the "production" copy:
strip --strip-all a.out -o b.out
Run b.out under GDB to simulate the production stack trace:
(gdb) run
Starting program: /tmp/b.out
Program received signal SIGSEGV, Segmentation fault.
0x0000000000401112 in ?? ()
(gdb) bt
#0 0x0000000000401112 in ?? ()
#1 0x0000000000401124 in ?? ()
#2 0x0000000000401134 in ?? ()
#3 0x0000000000401144 in ?? ()
#4 0x00007ffff7dfbcca in __libc_start_main (main=0x401136, argc=1, argv=0x7fffffffdc98, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdc88) at ../csu/libc-start.c:308
#5 0x000000000040104a in ?? ()
Now use addr2line on the unstripped binary to make sense of the stack trace above:
addr2line -fe a.out 0x0000000000401112 0x0000000000401124 0x0000000000401134 0x0000000000401144
foo
/tmp/foo.c:1
bar
/tmp/foo.c:2
zoo
/tmp/foo.c:3
main
/tmp/foo.c:4
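P.S. For real production use, you should ideally build your binaries with gcc -O2 -g ... so that you have full debug info, and then strip the binary (but keep the full-debug copy). That way you can easily debug core dumps from production, with access to functions, files, line numbers, and variables.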