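I have a program that produces output indefinitely. I want to sample that output for one second and pipe it into gzip. I use the timeout utility to limit execution, but the problem is that gzip gets killed as well.
For example: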
$ /usr/bin/timeout 1 bash -c "echo asdf; sleep 5" | gzip > /tmp/foo.gz; ls -lah /tmp/foo.gz
Terminated
-rw-rw-r-- 1 haizaar haizaar 0 Jul 22 15:05 /tmp/foo.gz
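As you can see, the gzip command is reported as Terminated, so its output ends up as an empty file (its buffers are lost).
I don't understand how timeout manages to kill the process that reads its stdout, nor how to fix it. Even wrapping the whole thing in another bash gives the same result: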
$ bash -c '/usr/bin/timeout 1 bash -c "echo asdf; sleep 5"' | gzip > /tmp/foo.gz; ls -lah /tmp/foo.gz
Terminated
-rw-rw-r-- 1 haizaar haizaar 0 Jul 22 15:30 /tmp/foo.gz
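I can prefix timeout with setsid, and then it works, which makes me think this has something to do with process groups getting mixed up. Still, I find it hard to accept that the current behaviour is "by design", because it makes the timeout command very tricky to use in shell pipelines.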
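A minimal sketch of the setsid workaround mentioned above, assuming setsid (util-linux) and GNU timeout are installed. The temp file stands in for /tmp/foo.gz and is only there for illustration. Because timeout runs in its own session, its group-wide signal should not reach gzip:

```shell
#!/usr/bin/env bash
# Throwaway output file (illustrative stand-in for /tmp/foo.gz).
out=$(mktemp)

# setsid puts timeout into a new session, and therefore a new process group,
# so the signal timeout sends to its own group does not hit gzip.
setsid timeout 1 bash -c "echo asdf; sleep 5" | gzip > "$out"

# gzip exited normally, so the archive should contain the sampled line.
gzip -dc "$out"
```

Note that when setsid has to fork (it does so if it is started as a process group leader), the exit status of timeout is not passed back to the pipeline.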
Environment:
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
$ bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
$ timeout --version
timeout (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Padraig Brady.
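Update: KamilCuk's strace analysis is spot on. It also explains why wrapping timeout in another bash did not help: bash appears to have an optimization where, if it has only one command to run, it does not fork but execs the command directly. However, if you add another command to the wrapping bash, it will fork, so timeout ends up in a new process group, which limits the blast radius of the timeout command. I.e.: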
bash -c 'true; /usr/bin/timeout 1 bash -c "echo asdf; sleep 5"' | gzip > /tmp/foo.gz
(note the leading true)
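The exec-instead-of-fork behaviour can be observed directly. This is a Linux-only sketch that relies on /proc/PID/comm naming the program that currently owns a PID. The variable names are made up for the illustration. What the second case prints depends on the bash version, since newer releases may also exec the last command of a list:

```shell
#!/usr/bin/env bash
# $$ is expanded by the inner bash before it runs cat, so cat reads the
# name of whatever program now owns the inner bash's PID.

# Single command: bash execs cat in place, so the PID belongs to cat.
single=$(bash -c 'cat /proc/$$/comm')

# Two commands: bash may have to stay around and fork cat as a child.
listed=$(bash -c 'true; cat /proc/$$/comm')

echo "single command:    inner bash PID is now owned by: $single"
echo "with leading true: inner bash PID is now owned by: $listed"
```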
I still think that using timeout in pipelines is a bit of black magic, but that is another story.
$ strace -ff -e trace=setpgid,kill,exit_group,exit,execve,wait4 bash --norc --noprofile -ic "timeout -v 1 bash --norc --noprofile -c 'echo asdf ; sleep 5' | { sleep 2; echo 123; }"
execve("/usr/bin/bash", ["bash", "--norc", "--noprofile", "-ic", "timeout -v 1 bash --norc --nopro"...], 0x7ffeb8ef7ef8 /* 76 vars */) = 0
setpgid(0, 28995) = 0
strace: Process 28996 attached
[pid 28995] setpgid(28996, 28996) = 0
[pid 28996] setpgid(28996, 28996) = 0
strace: Process 28997 attached
[pid 28995] setpgid(28997, 28996) = 0
[pid 28995] wait4(-1, <unfinished ...>
[pid 28997] setpgid(28997, 28996) = 0
[pid 28996] execve("/usr/bin/timeout", ["timeout", "-v", "1", "bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x560da0ff57e0 /* 76 vars */strace: Process 28998 attached
) = 0
[pid 28997] wait4(-1, <unfinished ...>
[pid 28998] execve("/usr/bin/sleep", ["sleep", "2"], 0x560da0ff57e0 /* 76 vars */) = 0
[pid 28996] setpgid(0, 0) = 0
strace: Process 28999 attached
[pid 28996] wait4(28999, 0x7ffd7eb5e96c, WNOHANG, NULL) = 0
[pid 28999] execve("/usr/local/bin/bash", ["bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x7ffd7eb5ec10 /* 76 vars */) = -1 ENOENT (No such file or directory)
[pid 28999] execve("/usr/bin/bash", ["bash", "--norc", "--noprofile", "-c", "echo asdf ; sleep 5"], 0x7ffd7eb5ec10 /* 76 vars */) = 0
[pid 28999] execve("/usr/bin/sleep", ["sleep", "5"], 0x55a84be27270 /* 76 vars */) = 0
[pid 28996] --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_int=0, si_ptr=NULL} ---
timeout: sending signal TERM to command ‘bash’
[pid 28996] kill(28999, SIGTERM) = 0
[pid 28999] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] kill(0, SIGTERM <unfinished ...>
[pid 28997] <... wait4 resumed>0x7ffc114a9600, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 28996] <... kill resumed>) = 0
[pid 28999] +++ killed by SIGTERM +++
[pid 28998] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28997] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=28999, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
[pid 28997] +++ killed by SIGTERM +++
[pid 28995] <... wait4 resumed>[{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WSTOPPED|WCONTINUED, NULL) = 28997
[pid 28998] +++ killed by SIGTERM +++
[pid 28995] wait4(-1, <unfinished ...>
[pid 28996] kill(28999, SIGCONT) = 0
[pid 28996] kill(0, SIGCONT) = 0
[pid 28996] --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=28996, si_uid=1000} ---
[pid 28996] wait4(28999, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], WNOHANG, NULL) = 28999
[pid 28996] exit_group(124) = ?
[pid 28996] +++ exited with 124 +++
<... wait4 resumed>[{WIFEXITED(s) && WEXITSTATUS(s) == 124}], WSTOPPED|WCONTINUED, NULL) = 28996
Terminated
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=28997, si_uid=1000, si_status=SIGTERM, si_utime=0, si_stime=0} ---
wait4(-1, 0x7ffc114a9710, WNOHANG|WSTOPPED|WCONTINUED, NULL) = -1 ECHILD (No child processes)
setpgid(0, 28992) = 0
exit_group(143) = ?
+++ exited with 143 +++
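So timeout tries to be smart and kills the whole process group. As far as I understand:
- bash creates a process group for the pipeline: setpgid(28996, 28996)
- timeout starts the command in that same group: setpgid(0, 0) changes nothing here, because pid 28996 is already the leader of the pipeline's group
- after the timeout expires, timeout kills the whole process group: kill(0, SIGTERM)
- because all pipeline processes are in the same process group, all of them are terminated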
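The last point can be checked with a small sketch that records the process group ID (PGID) on each side of a pipeline. It assumes a ps that supports -o pgid= and -p (procps does). The helper and file names are invented for the example:

```shell
#!/usr/bin/env bash
# Print the process group ID of the (sub)shell that calls this helper.
pgid_of_self() {
    local pid=$BASHPID
    ps -o pgid= -p "$pid" | tr -d ' '
}

left_file=$(mktemp)
right_file=$(mktemp)

# Each side of a pipeline runs in its own subshell; record both PGIDs.
{ pgid_of_self > "$left_file"; } | { pgid_of_self > "$right_file"; }

left_pgid=$(cat "$left_file")
right_pgid=$(cat "$right_file")
echo "left side PGID:  $left_pgid"
echo "right side PGID: $right_pgid"
```

Both sides report the same PGID, which is why a kill(0, SIGTERM) issued from the left side also reaches the right side.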
You can use command grouping { ... } to make bash start a new process group for the left-hand side.
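One possible reading of the grouping suggestion, as a sketch only. Whether the braces actually keep timeout away from the pipeline's process group in an interactive shell may depend on the bash version, so treat this as something to verify on your system. The temp file is an illustrative stand-in for /tmp/foo.gz:

```shell
#!/usr/bin/env bash
out=$(mktemp)

# The brace group runs in a subshell on the left side of the pipe; timeout
# is then started as a child of that subshell rather than replacing it.
{ timeout 1 bash -c "echo asdf; sleep 5"; } | gzip > "$out"

# If gzip survived, the archive holds the sampled line.
gzip -dc "$out"
```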
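You can use timeout --foreground, but then timeout will only terminate the foreground process. So when bash dies, the gzip process will keep waiting for the sleep 5 that is still running in the background, because sleep still holds the write end of the pipe feeding gzip's stdin.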
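A sketch of the --foreground variant, with a shortened sleep so the effect is quick to observe. The temp file is again only illustrative. The elapsed time it prints depends on whether the inner bash forks sleep or execs it, which varies between bash versions, so no particular number is claimed here:

```shell
#!/usr/bin/env bash
out=$(mktemp)

SECONDS=0
# With --foreground, timeout signals only its direct child, not a group,
# so gzip is left alone and exits once every writer has closed the pipe.
timeout --foreground 1 bash -c "echo asdf; sleep 3" | gzip > "$out"
echo "pipeline finished after about ${SECONDS}s"

gzip -dc "$out"
```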
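A guess (also based on the commit message): I think the intention may be that timeout can kill a whole pipeline, as if it were built-in shell magic.
Also, the behaviour differs with job control enabled and disabled, so it differs between interactive and non-interactive shells.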
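The job-control difference can be made visible from a script by toggling set -m, which enables the job control that interactive shells have by default. This sketch assumes a procps-style ps and is run as a non-interactive bash script. The helper and variable names are made up:

```shell
#!/usr/bin/env bash
# Print the process group ID of the given PID.
pgid_of() { ps -o pgid= -p "$1" | tr -d ' '; }

shell_pgid=$(pgid_of "$$")
tmp=$(mktemp)

set +m   # job control off: pipeline members stay in the shell's group
{ pgid_of "$BASHPID" > "$tmp"; } | cat
pgid_off=$(cat "$tmp")

set -m   # job control on: the pipeline gets a process group of its own
{ pgid_of "$BASHPID" > "$tmp"; } | cat
pgid_on=$(cat "$tmp")
set +m

echo "shell PGID:                   $shell_pgid"
echo "pipeline PGID, job control off: $pgid_off"
echo "pipeline PGID, job control on:  $pgid_on"
```

With job control off, a group-wide kill from inside the pipeline would land in the shell's own group, whereas with job control on it stays confined to the pipeline's group.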