Why does /usr/bin/timeout kill the entire pipeline?



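I have a program that produces output indefinitely. I want to sample one second of that output and pipe it into gzip. I use the timeout utility to limit execution, but the problem is that gzip gets killed as well.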

For example:

$ /usr/bin/timeout 1 bash -c "echo asdf; sleep 5" | gzip > /tmp/foo.gz; ls -lah /tmp/foo.gz 
Terminated
-rw-rw-r-- 1 haizaar haizaar 0 Jul 22 15:05 /tmp/foo.gz

As you can see, the gzip command was Terminated too, so its output ends up as an empty file (the data it had buffered is lost).

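I don't understand how timeout manages to kill a process that is merely reading its stdout, nor how to fix it. Even wrapping the whole thing in another bash gives the same result: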

$ bash -c '/usr/bin/timeout 1 bash -c "echo asdf; sleep 5"' | gzip > /tmp/foo.gz; ls -lah /tmp/foo.gz
Terminated
-rw-rw-r-- 1 haizaar haizaar 0 Jul 22 15:30 /tmp/foo.gz

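If I prefix timeout with setsid, it works, which makes me think this has something to do with process group confusion. But it's hard to accept that the current behavior is "by design", because it makes the timeout command very tricky to use in shell pipelines.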
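For reference, here is a minimal sketch of the setsid workaround applied to the command from the question. It assumes util-linux setsid is installed, which it is on Ubuntu. setsid starts timeout in a new session, and therefore in a new process group, so timeout's group-wide kill should no longer reach gzip:

```shell
#!/usr/bin/env bash
# Workaround sketch: start timeout(1) in a new session (and therefore a new
# process group) via setsid(1), so timeout's group-wide kill cannot reach gzip.
setsid /usr/bin/timeout 1 bash -c "echo asdf; sleep 5" | gzip > /tmp/foo.gz
# gzip sees a normal EOF once timeout's group is gone, so the archive is complete.
zcat /tmp/foo.gz
```

If the workaround works, zcat prints asdf. On an empty /tmp/foo.gz it would complain about an unexpected end of file instead.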

Environment:

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
$ bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ timeout --version
timeout (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Padraig Brady.

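Update: KamilCuk's strace analysis is spot on. It also explains why wrapping timeout in another bash doesn't help. bash seems to have an optimization: when it has only a single command to run, it does not fork but execs that command in place. If you add another command to the wrapping bash, however, it forks instead. timeout then runs as a child that is not the process group leader, so its own setpgid(0, 0) puts it in a new process group, and this limits the blast radius of the timeout command. I.e.: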

bash -c 'true; /usr/bin/timeout 1 bash -c "echo asdf; sleep 5"' | gzip > /tmp/foo.gz

(note the leading true)

I still think that using timeout in a pipeline is something of a black art, but that's another story.

$ strace -ff -e trace=setpgid,kill,exit_group,exit,execve,wait4 bash --norc --noprofile -ic "timeout -v 1 bash --norc --noprofile -c 'echo asdf ; sleep 5' | { sleep 2; echo 123; }" 
execve("/usr/bin/bash", ["bash", "--norc", "--noprofile", "-ic", "timeout -v 1 bash --norc --nopro"...], 0x7ffeb8ef7ef8 /* 76 vars */) = 0
setpgid(0, 28995)                       = 0
strace: Process 28996 attached
[pid 28995] setpgid(28996, 28996)       = 0
[pid 28996] setpgid(28996, 28996)       = 0
strace: Process 28997 attached
[pid 28995] setpgid(28997, 28996)       = 0
[pid 28995] wait4(-1,  <unfinished ...>
[pid 28997] setpgid(28997, 28996)       = 0
[pid 28996] execve("/usr/bin/timeout", ["timeout", "-v", "1", "bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x560da0ff57e0 /* 76 vars */strace: Process 28998 attached
) = 0
[pid 28997] wait4(-1,  <unfinished ...>
[pid 28998] execve("/usr/bin/sleep", ["sleep", "2"], 0x560da0ff57e0 /* 76 vars */) = 0
[pid 28996] setpgid(0, 0)               = 0
strace: Process 28999 attached
[pid 28996] wait4(28999, 0x7ffd7eb5e96c, WNOHANG, NULL) = 0
[pid 28999] execve("/usr/local/bin/bash", ["bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x7ffd7eb5ec10 /* 76 vars */) = -1 ENOENT (No such file or directory)
[pid 28999] execve("/usr/bin/bash", ["bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x7ffd7eb5ec10 /* 76 vars */) = 0
[pid 28999] execve("/usr/bin/sleep", ["sleep", "5"], 0x55a84be27270 /* 76 vars */) = 0
[pid 28996] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_int=0, si_ptr=NULL} ---
timeout: sending signal TERM to command ‘bash’
[pid 28996] kill(28999, SIGTERM)        = 0
[pid 28999] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] kill(0, SIGTERM <unfinished ...>
[pid 28997] <... wait4 resumed>0x7ffc114a9600, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 28996] <... kill resumed>)         = 0
[pid 28999] +++ killed by SIGTERM +++
[pid 28998] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28997] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=28999, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
[pid 28997] +++ killed by SIGTERM +++
[pid 28995] <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WSTOPPED|WCONTINUED, NULL) = 28997
[pid 28998] +++ killed by SIGTERM +++
[pid 28995] wait4(-1,  <unfinished ...>
[pid 28996] kill(28999, SIGCONT)        = 0
[pid 28996] kill(0, SIGCONT)            = 0
[pid 28996] --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] wait4(28999, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG, NULL) = 28999
[pid 28996] exit_group(124)             = ?
[pid 28996] +++ exited with 124 +++
<... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 124}], WSTOPPED|WCONTINUED, NULL) = 28996
Terminated
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=28997, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffc114a9710, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
setpgid(0, 28992)                       = 0
exit_group(143)                         = ?
+++ exited with 143 +++

So timeout tries to be clever and kills the whole process group. As far as I can tell:

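  • bash creates a process group for the pipeline: setpgid(28996, 28996)
  • timeout calls setpgid(0, 0) and starts its command in that same group (timeout, PID 28996, is already the group leader, so the call changes nothing)
  • when the timeout expires, timeout kills the entire process group: kill(0, SIGTERM)
  • since all processes of the pipeline are in that same process group, all of them are terminated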
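A minimal Linux-only sketch (it reads /proc) to check the second point. Unless --foreground is given, GNU timeout calls setpgid(0, 0), and the command it starts inherits that group. The child's process group ID should therefore equal timeout's PID. The variable name result is only for illustration:

```shell
#!/usr/bin/env bash
# Run a child under timeout and compare the child's process group ID with the
# PID of its parent, which is the timeout process itself.
result=$(/usr/bin/timeout 5 bash -c '
  # Field 5 of /proc/<pid>/stat is the process group ID (Linux-specific).
  read -r _ _ _ _ pgid _ < /proc/$$/stat
  # $PPID is the PID of the timeout process that started this bash.
  if [ "$pgid" = "$PPID" ]; then echo "same group"; else echo "different group"; fi
')
echo "$result"
```

Any other process in that group, including gzip when timeout is the pipeline's group leader, is hit by the kill(0, SIGTERM) shown in the strace.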

You can use command grouping { ... } to make bash start a new process group for the left-hand side.
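Here is a sketch of the grouping workaround applied to the command from the question. It relies on the claim above that bash then puts the left-hand side in a process group of its own. With job control off, for example in a script, the original command already works, so in that setting this only shows the syntax:

```shell
#!/usr/bin/env bash
# Wrap timeout in a { ...; } group on the left-hand side of the pipe. Per the
# answer, this keeps timeout's kill(0, SIGTERM) away from gzip.
{ /usr/bin/timeout 1 bash -c "echo asdf; sleep 5"; } | gzip > /tmp/foo.gz
zcat /tmp/foo.gz
```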

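You can use timeout --foreground, but then timeout terminates only the foreground process (the command it started), not that command's children. So when bash dies, gzip will keep waiting for the sleep 5 that is still running in the background, because sleep still holds gzip's stdin (the pipe) open.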
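Here is a sketch of the --foreground behaviour described above. The trailing "; true" in the inner command is an addition for this demo. It makes bash fork sleep as a child rather than possibly replacing itself with it, so a child is left over for timeout to miss. The timing uses bash's SECONDS variable and is only approximate:

```shell
#!/usr/bin/env bash
# With --foreground, timeout signals only the command it started (the inner bash),
# not that command's children.
# The trailing "; true" is a demo-only addition: it forces bash to fork sleep as a
# child instead of possibly exec'ing it in place.
start=$SECONDS
/usr/bin/timeout --foreground 1 bash -c "echo asdf; sleep 3; true" | gzip > /tmp/foo.gz
elapsed=$((SECONDS - start))
# bash is killed after about 1s, but the orphaned sleep keeps the pipe open,
# so gzip only sees EOF once sleep exits after about 3s.
echo "pipeline took ${elapsed}s"
zcat /tmp/foo.gz
```

gzip does survive here, but the pipeline lasts as long as the leftover child rather than the one second requested.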

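My guess (again based on the commit messages) is that it may be intended for timeout to be able to kill an entire pipeline, as if it were some magic shell builtin.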

Also, the behavior differs depending on whether job control is enabled, so interactive and non-interactive shells behave differently.
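You can see the non-interactive case with a short sketch. It runs the same pipeline as in the question inside a non-interactive bash -c, where job control is off. This should leave gzip alive and produce a valid archive:

```shell
#!/usr/bin/env bash
# In a non-interactive shell job control is off, so the pipeline members stay in
# the shell's process group. timeout is then not a group leader, its
# setpgid(0, 0) gives it a fresh group, and kill(0, SIGTERM) stays inside it.
bash -c '/usr/bin/timeout 1 bash -c "echo asdf; sleep 5" | gzip > /tmp/foo.gz'
zcat /tmp/foo.gz
```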
