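Background
I have a quite complex application, built from two libraries. The QA team has now found some problems (some bug reports).
From the logs I can see that the application is leaking file descriptors (+1000 after 7 hours of automated testing). The QA team has provided the "Open Files and Ports" list from Activity Monitor, so I know exactly which server connection is not being closed.
From the full application logs I can see that the leak is quite systematic (there are no sudden bursts), but I can't reproduce the problem. I can't even see a small file descriptor leak.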
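Problem
Even though I'm sure which server connection is never closed, I can't find the code responsible, and I can't reproduce the issue.
In the logs I can see that all resources maintained by my library are released properly, but the server address still suggests that the connection is my responsibility, or NSURLSession's (which has been invalidated).
Since there are other libraries and application code involved as well, there is a small chance that the leak is caused by third-party code.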
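Question
How can I locate the code that causes the file descriptor leak? The best candidate seems to be dtruss, which looks very promising. From the documentation I can see that with -s it can print a stack backtrace whenever a system API is used.
The problem is that I don't know how to use it without being flooded with information. I only need to know who created an open file descriptor and whether it was closed or destroyed. Since I can't reproduce the problem, I need a script that the QA team can run, so they can give me the output.
If there is another way to find the source of a file descriptor leak, please tell me.
There is a bunch of predefined scripts that use dtruss, but I don't see anything that fits my needs.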
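Final notes
Strangely, the only code I know of that uses the problematic connection doesn't use file descriptors directly; it uses a custom NSURLSession (configured with one connection per host, minimum TLS 1.0, cookies disabled, and custom certificate validation). From the logs I can see that the NSURLSession was invalidated properly. I doubt that NSURLSession is the source of the leak, but at the moment it is the only candidate.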
Well, I found out how to do this, on Solaris 11 anyway. I got this output (and yes, on Solaris 11 I needed root):
bash-4.1# dtrace -s fdleaks.d -c ./fdLeaker
open( './fdLeaker' ) returned 3
open( './fdLeaker' ) returned 4
open( './fdLeaker' ) returned 5
falloc fp: ffffa1003ae56590, fd: 3, saved fd: 3
falloc fp: ffffa10139d28f58, fd: 4, saved fd: 4
falloc fp: ffffa10030a86df0, fd: 5, saved fd: 5
opened file: ./fdLeaker
leaked fd: 3
libc.so.1`__systemcall+0x6
libc.so.1`__open+0x29
libc.so.1`open+0x84
fdLeaker`main+0x2b
fdLeaker`_start+0x72
opened file: ./fdLeaker
leaked fd: 4
libc.so.1`__systemcall+0x6
libc.so.1`__open+0x29
libc.so.1`open+0x84
fdLeaker`main+0x64
fdLeaker`_start+0x72
The fdleaks.d dTrace script that finds the leaked file descriptors:
#!/usr/sbin/dtrace -s
/* this will probably need tuning
note there can be significant performance
impacts if you make these large */
#pragma D option nspec=4
#pragma D option specsize=128k
#pragma D option quiet
syscall::open*:entry
/ pid == $target /
{
    /* arg1 might not have a physical mapping yet so
       we can't call copyinstr() until open() returns
       and we don't have a file descriptor yet -
       we won't get that until open() returns anyway */
    self->path = arg1;
}

/* arg0 is the file descriptor being returned */
syscall::open*:return
/ pid == $target && arg0 >= 0 && self->path /
{
    /* get a speculation ID tied to this
       file descriptor and start speculative
       tracing */
    openspec[ arg0 ] = speculation();
    speculate( openspec[ arg0 ] );

    /* this output won't appear unless the associated
       speculation id is committed */
    printf( "\nopened file: %s\n", copyinstr( self->path ) );
    printf( "leaked fd: %d\n\n", arg0 );
    ustack();

    /* free the saved path */
    self->path = 0;
}

syscall::close:entry
/ pid == $target && arg0 >= 0 /
{
    /* closing the fd, so discard the speculation
       and free the id by setting it to zero */
    discard( openspec[ arg0 ] );
    openspec[ arg0 ] = 0;
}
/* Solaris uses falloc() to open a file and associate
the fd with an internal file_t structure
When the kernel closes file descriptors that the
process left open, it uses the closeall() function
which walks the internal structures then calls
closef() using the file_t *, so there's no way
to get the original process file descriptor in
closeall() or closef() dTrace probes.
falloc() is called on open() to associate the
file_t * with a file descriptor, so this
saves the pointers passed to falloc()
that are used to return the file_t * and
file descriptor once they're filled in
when falloc() returns */
fbt::falloc:entry
/ pid == $target /
{
    self->fpp = args[ 2 ];
    self->fdp = args[ 3 ];
}

/* Clause-local variables to make casting clearer */
this int fd;
this uint64_t fp;

/* array to associate a file descriptor with its file_t *
   structure in the kernel */
int fdArray[ uint64_t fp ];

fbt::falloc:return
/ pid == $target && self->fpp && self->fdp /
{
    /* get the fd and file_t * values being
       returned to the caller */
    this->fd = ( * ( int * ) self->fdp );
    this->fp = ( * ( uint64_t * ) self->fpp );

    /* associate the fd with its file_t * */
    fdArray[ this->fp ] = ( int ) this->fd;

    /* verification output */
    printf( "falloc fp: %x, fd: %d, saved fd: %d\n", this->fp, this->fd, fdArray[ this->fp ] );

    /* clear the saved pointers so a later falloc()
       return can't pick up stale values */
    self->fpp = 0;
    self->fdp = 0;
}

/* if this gets called and the dereferenced
   openspec array element is a still-valid
   speculation id, the fd associated with
   the file_t * passed to closef() was never
   closed by the process itself */
fbt::closef:entry
/ pid == $target /
{
    /* commit the speculative tracing since
       this file descriptor was leaked */
    commit( openspec[ fdArray[ arg0 ] ] );
}
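To make the bookkeeping in fdleaks.d easier to follow, here is a user-space model of it in C (a model only, not DTrace): each descriptor gets one pending report when open() returns, close() discards it, and a report still pending when the kernel closes the descriptor at exit gets printed, which is the role commit() plays in the script. The function names and the MAX_TRACKED_FD limit are hypothetical. One real limit the model leaves out: as I understand the DTrace documentation, speculation() returns 0 when all nspec buffers are in use, so with nspec=4 only a few outstanding opens can be tracked at once, which is presumably why the script's header says it will need tuning.

```c
#include <stdio.h>

#define MAX_TRACKED_FD 1024   /* hypothetical limit for the model */

/* One pending ("speculative") report per descriptor, like openspec[fd]. */
static char pending[MAX_TRACKED_FD][128];
static int  has_pending[MAX_TRACKED_FD];

/* open() returned fd: start a pending report (speculation() + speculate()). */
void model_open(int fd, const char *path)
{
    if (fd < 0 || fd >= MAX_TRACKED_FD)
        return;
    snprintf(pending[fd], sizeof pending[fd],
             "opened file: %s, leaked fd: %d", path, fd);
    has_pending[fd] = 1;
}

/* The process called close(fd) itself: drop the report (discard()). */
void model_close(int fd)
{
    if (fd < 0 || fd >= MAX_TRACKED_FD)
        return;
    has_pending[fd] = 0;
}

/* The kernel closes fd while the process exits: a report that is still
   pending means the process leaked it, so print it (commit()).
   Returns 1 if a report was printed, 0 otherwise. */
int model_exit_close(int fd, FILE *out)
{
    if (fd < 0 || fd >= MAX_TRACKED_FD || !has_pending[fd])
        return 0;
    fprintf(out, "%s\n", pending[fd]);
    has_pending[fd] = 0;
    return 1;
}
```

Feeding it the same sequence as the fdLeaker run (open 3, 4, 5, close 5, then exit) prints reports for fds 3 and 4 only.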
First, I wrote this small C program to leak fds:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main( int argc, char **argv )
{
    int ii;

    /* open each argument three times but close only the
       last descriptor, so the first two leak on every pass */
    for ( ii = 0; ii < argc; ii++ )
    {
        int fd = open( argv[ ii ], O_RDONLY );
        fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
        fd = open( argv[ ii ], O_RDONLY );
        fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
        fd = open( argv[ ii ], O_RDONLY );
        fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
        close( fd );
    }

    return( 0 );
}
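The leak in this program comes from overwriting fd without closing it: each pass opens the file three times and closes only the last descriptor, so two descriptors leak per argument. That matches the output above, where fds 3 and 4 are reported as leaked and fd 5 is not. The sketch below isolates that pattern in a helper and uses fcntl(F_GETFD) to check which descriptors are still open; open_three_close_last and is_fd_open are hypothetical names.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd is an open descriptor in this process.
   fcntl(F_GETFD) fails with EBADF for a descriptor that is not open. */
int is_fd_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Hypothetical helper reproducing one pass of fdLeaker's loop: three
   open() calls on the same path, but only the last descriptor is closed.
   All three descriptors are stored so the caller can see which leaked. */
void open_three_close_last(const char *path, int fds[3])
{
    for (int i = 0; i < 3; i++)
        fds[i] = open(path, O_RDONLY);
    close(fds[2]);   /* fds[0] and fds[1] are never closed */
}
```

After open_three_close_last("/dev/null", fds), is_fd_open(fds[0]) and is_fd_open(fds[1]) are true while is_fd_open(fds[2]) is false.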
Then I ran it under this dTrace script to see how the kernel closes orphaned file descriptors, with dtrace -s exit.d -c ./fdLeaker:
#!/usr/sbin/dtrace -s

#pragma D option quiet

syscall::rexit:entry
{
    self->exit = 1;
}

syscall::rexit:return
/ self->exit /
{
    self->exit = 0;
}

fbt:::entry
/ self->exit /
{
    printf( "---> %s\n", probefunc );
}

fbt:::return
/ self->exit /
{
    printf( "<--- %s\n", probefunc );
}
That produced a lot of output. I noticed the closeall() and closef() functions, checked their source code, and wrote the fdleaks.d dTrace script.
Also note that on Solaris 11 the process-exit dTrace probe is the rexit probe; that is likely to be different on OS X.
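The reason a leaked descriptor gets closed at all is that the kernel closes every descriptor a process still holds when it exits; on Solaris that is the closeall()/closef() path the dTrace output revealed. The effect is easy to observe from user space with a pipe, as in this POSIX sketch (eof_after_child_leaks_pipe is a hypothetical name): the child exits without closing its write end, and the parent's read() returns 0 (end of file) only once that leaked descriptor has been closed for it.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that "leaks" the write end of a pipe
   by exiting without closing it. read() on the read end returns 0 (EOF)
   only after every write end is closed, so a 0 result shows that the
   kernel closed the child's leaked descriptor when it exited.
   Returns the result of read(), or -1 if setup failed. */
ssize_t eof_after_child_leaks_pipe(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);   /* child keeps only the write end ... */
        _exit(0);      /* ... and exits without ever closing it */
    }

    close(p[1]);       /* parent drops its own copy of the write end */
    char c;
    ssize_t n = read(p[0], &c, 1);   /* blocks until the child's copy is gone */
    close(p[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

If the kernel did not clean up orphaned descriptors at exit, the read() above would block forever instead of returning 0.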
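The biggest problem on Solaris is getting the file descriptor inside the kernel code that closes orphaned file descriptors. Solaris doesn't close them by file descriptor; it closes them through the struct file_t pointers in the process's kernel open-file structures. So I had to go through the Solaris source to find where an fd gets associated with its file_t *, which is the falloc() function. The dTrace script records that file_t * to fd association in its associative array.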
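The fp-to-fd association can be pictured as a plain lookup table. The sketch below models fdArray from fdleaks.d in C (fd_map_record, fd_map_lookup and the fixed FD_MAP_SIZE are hypothetical): the return from falloc() is where an entry is recorded, and closef() entry is where the file_t * is turned back into an fd. One deliberate difference: a D associative array yields 0 for a key that was never stored, whereas this lookup returns -1 so that an unknown pointer cannot be mistaken for fd 0.

```c
#include <stdint.h>

#define FD_MAP_SIZE 256   /* hypothetical fixed capacity for the sketch */

/* Models fdArray[fp]: which descriptor number falloc() handed out
   for each kernel file_t pointer. */
struct fd_map_entry { uintptr_t fp; int fd; int used; };
static struct fd_map_entry fd_map[FD_MAP_SIZE];

/* "falloc() return": record fp -> fd, overwriting the entry if the
   same file_t pointer is reused. Returns -1 if the table is full. */
int fd_map_record(uintptr_t fp, int fd)
{
    int free_slot = -1;
    for (int i = 0; i < FD_MAP_SIZE; i++) {
        if (fd_map[i].used && fd_map[i].fp == fp) {
            fd_map[i].fd = fd;
            return 0;
        }
        if (!fd_map[i].used && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;
    fd_map[free_slot] = (struct fd_map_entry){ fp, fd, 1 };
    return 0;
}

/* "closef() entry": translate a file_t pointer back to its fd.
   Returns -1 if no falloc() was seen for this pointer. */
int fd_map_lookup(uintptr_t fp)
{
    for (int i = 0; i < FD_MAP_SIZE; i++)
        if (fd_map[i].used && fd_map[i].fp == fp)
            return fd_map[i].fd;
    return -1;
}
```

Recording the three falloc() results from the output above and then looking up ffffa10139d28f58 gives back fd 4.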
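None of this is likely to work on OS X.
If you're lucky, the OS X kernel closes orphaned file descriptors by the file descriptor itself, or at least provides something that tells you which fd is being closed, perhaps an audit function.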