为什么子进程中的管道有时会中断,有时不会中断?

  • 本文关键字:中断 管道 子进程 perl ipc
  • 更新时间 :
  • 英文 :


我正在用Perl开发一个系统,它有一个主脚本,需要执行几个不同的worker脚本,这些脚本同时在系统上执行不同的任务.
这些脚本也不需要用Perl编写。它们可以是可在命令行上执行的任何程序,并通过STDOUTSTDERR以及EXITCODE返回结果。

所以我做了一个小示例脚本hello.pl

#!/usr/bin/perl
print STDERR "HELLO ERROR 0 !n";
print "hello script!!n";
print "hello waiting 3 sec ...n";
sleep 3;
print "hello list:n";
print "cmd: 'ls -lah'n";
print `ls -lah`;
print "res: '$?'n";    
print "hello waiting 2 sec ...n";
sleep 2;    
print STDERR "HELLO ERROR 1 !n";

主脚本需要一方面捕获STDOUT的结果,另一方面捕获STDERR中的错误通知。

主脚本执行fork以非阻塞模式执行工作线程脚本,并使用waitpidselect来监视工作线程脚本进度。

所以master.pl脚本的代码是这样的:

#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use POSIX ":sys_wait_h";

sub Launch
{
my $rprocess = shift;
my $irs = 0;

local *logreader;
local *errorreader;
my $logwriter   = undef;
my $errorwriter = undef;
pipe( *logreader,   $logwriter );
pipe( *errorreader, $errorwriter );
#Spawn the Child Process
my $iprcpid = fork();
#Check the Success of Process Forking
if(defined $iprcpid)
{           
#------------------------
#Sub Process Launch succeeded
# Check whether parent/child process
if($iprcpid > 0)
{
#------------------------
#Parent Process
close($logwriter);
close($errorwriter);
$rprocess->{"pid"} = $iprcpid;
$rprocess->{"log_pipe"} = *logreader;
$rprocess->{"error_pipe"} = *errorreader;
$irs = 1;
}
elsif($iprcpid == 0)
{
#------------------------
#Child Process
my $ierr = 0;
close(*logreader);
close(*errorreader);
open( STDOUT, ">&=", $logwriter );
open( STDERR, ">&=", $errorwriter );

#------------------------
#Execute the configured Command
print "cmd: '" . $rprocess->{"file"} . "'n";
print "cmd rng ...n";
print `$rprocess->{"file"}`; 
$ierr = $?;
print "closing transmission ...n";   
close STDOUT;
close STDERR;
exit $ierr; 
}
else    #An Error has ocurred in the Sub Process Launch
{
# Unable to fork
print "ERROR: Sub Process '" . $rprocess->{"name"} . "' Launch failed: $!n";
}   #if($iprcpid > 0)
}
else    #An Error has ocurred in the Process Spawning   
{
# Unable to fork
print "ERROR: Sub Process '" . $rprocess->{"name"} . "' Launch failed: $!n";
}  #if(defined $iprcpid)
return $irs;
}
sub Read
{
my $rprocess = shift;
my $ppsel = $rprocess->{"select"};
my $pplog = $rprocess->{"log_pipe"};
my $pperr = $rprocess->{"error_pipe"};
my @arrppselrdy = undef;
my $pprdy = undef;
my $srdln = "";
my $irdcnt = 0;
unless(defined $ppsel)
{
$ppsel = IO::Select->new();
$ppsel->add($pplog);
$ppsel->add($pperr);
#Store the Selector Object
$rprocess->{"select"} = $ppsel;
}
else  #The Selector was already created
{
$ppsel = $rprocess->{"select"};
}  #unless(defined $ppsel)    
while(@arrppselrdy = $ppsel->can_read(1))
{
foreach $pprdy (@arrppselrdy)
{
$irdcnt = sysread($pprdy, $srdln, 8192);
if(defined $irdcnt)
{
if($irdcnt > 0)
{
if(fileno($pprdy) == fileno($pplog))
{
$rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): reading report ...n";
$rprocess->{"log"} .= $srdln;
}
elsif(fileno($pprdy) == fileno($pperr))
{
$rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): reading error ...n";
$rprocess->{"error"} .= $srdln;
}   #if(fileno($pprdy) == fileno($pplog))
}
else    #End of Transmission
{
$rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): transmission done.n";
#Remove the Pipe File Handle
$ppsel->remove($pprdy);  
} #if($irdcnt > 0)
}
else  #Reading from the Pipe failed
{
#Remove the Pipe File Handle
$ppsel->remove($pprdy);
if($!)
{
$rprocess->{"error"} .= "ERROR: Sub Process " . $rprocess->{"name"} 
. ": Read failed with [" . ($! * 1) . "]!n"
. "Message: '$!'n";
}  #if($!)
}  #if(defined $irdcnt)
}  #foreach $pprdy (@arrppselrdy)
} #while(@arrppselrdy = $ppsel->can_read(1))
$rprocess->{"log"} .= "try read done. '" . $ppsel->count . "' pipes left.n";
}
sub Check
{
my $rprocess = shift;
my $irng = 0;
if($rprocess->{"pid"} > 0
&& $rprocess->{"status"} < 0)
{
my $ifinishpid = waitpid($rprocess->{"pid"}, WNOHANG);
if($ifinishpid > -1)
{
if($ifinishpid == 0)
{
$irng = 1;
}
else
{
$rprocess->{"status"} = $?;
}
#Read the Message Pipes
Read($rprocess);
if($ifinishpid > 0)
{
close $rprocess->{"log_pipe"};
close $rprocess->{"error_pipe"};
}
}
else  #Sub Process does not exist
{
$rprocess->{"error"} .= "ERROR: Sub Process does not exist!n";
}  #if($ifinishpid > 0)
}  #if($rprocess->{"pid"} > 0 && $rprocess->{"status"} < 0)
return $irng;
}

my %hshprocesses = ("hello1" => {"name" => "hello1", "pid" => -1, "file" => "./hello.pl"
, "log_pipe" => undef, "error_pipe" => undef, "select" => undef
, "log" => "", "error" => "", "status" => -1}
, "hello2" => {"name" => "hello2", "pid" => -1, "file" => "sleep 3 ; ./hello2.pl"
, "log_pipe" => undef, "error_pipe" => undef, "select" => undef
, "log" => "", "error" => "", "status" => -1}
, "hello3" => {"name" => "hello3", "pid" => -1, "file" => "./hello3.pl ; sleep 2"
, "log_pipe" => undef, "error_pipe" => undef, "select" => undef
, "log" => "", "error" => "", "status" => -1}
, "hello4" => {"name" => "hello4", "pid" => -1, "file" => "./hello4.pl"
, "log_pipe" => undef, "error_pipe" => undef, "select" => undef
, "log" => "", "error" => "", "status" => -1});

print "prcs launching ...n";
foreach (keys %hshprocesses)
{
unless(Launch($hshprocesses{$_}))
{
print "prc '" . $hshprocesses{$_}{"name"} . "': Launch failed!n";
}
}  #foreach (keys %hshprocesses)
print "prcs launched.n";
my $irunningcount = 0;
do
{
$irunningcount = 0;
foreach (keys %hshprocesses)
{
$irunningcount++ if(Check($hshprocesses{$_}));
}  #foreach (keys %hshprocesses)
print "prc rng cnt: '$irunningcount'n";
}
while($irunningcount > 0);
foreach (keys %hshprocesses)
{
print "process (" . $hshprocesses{$_}{"pid"} . ") '" . $hshprocesses{$_}{"name"} . "':n";
print "status [" . $hshprocesses{$_}{"status"} . "]n";
print "log:n"
. "'" . $hshprocesses{$_}{"log"} . "'n";
print "error:n"
. "'" . $hshprocesses{$_}{"error"} . "'n";
}  #foreach (keys %hshprocesses)

因此,阅读logreadererrorreader,我认为我将能够捕获整个输出.

预期的报告将是这样的:

$ ./master.pl
prcs launching ...
prcs launched.
prc rng cnt: '4'
prc rng cnt: '3'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '0'
process (2742) 'hello4':
status [0]
log:
'pipe (3): reading report ...
cmd: './hello4.pl'
cmd rng ...
pipe (5): reading error ...
try read done. '2' pipes left.
pipe (5): reading error ...
pipe (3): reading report ...
hello4 script!!
hello waiting 3 sec ...
hello4 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
pipe (3): reading report ...
closing transmission ...
pipe (5): transmission done.
pipe (3): transmission done.
try read done. '0' pipes left.
try read done. '0' pipes left.
'
error:
'HELLO4 ERROR 0 !
HELLO4 ERROR 1 !
'
process (2743) 'hello1':
status [0]
log:
'pipe (4): reading report ...
cmd: 'sleep 3 ; ./hello.pl'
cmd rng ...
try read done. '2' pipes left.
pipe (7): reading error ...
try read done. '2' pipes left.
try read done. '2' pipes left.
pipe (7): reading error ...
pipe (4): reading report ...
hello script!!
hello waiting 3 sec ...
hello list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (7): transmission done.
pipe (4): transmission done.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
'
error:
'HELLO ERROR 0 !
HELLO ERROR 1 !
'
process (2745) 'hello3':
status [0]
log:
'pipe (6): reading report ...
cmd: './hello3.pl ; sleep 2'
cmd rng ...
pipe (9): reading error ...
try read done. '2' pipes left.
pipe (9): reading error ...
try read done. '2' pipes left.
pipe (6): reading report ...
hello3 script!!
hello waiting 3 sec ...
hello3 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (9): transmission done.
pipe (6): transmission done.
try read done. '0' pipes left.
'
error:
'HELLO3 ERROR 0 !
HELLO3 ERROR 1 !
'
process (2746) 'hello2':
status [0]
log:
'pipe (8): reading report ...
cmd: './hello2.pl'
cmd rng ...
pipe (11): reading error ...
try read done. '2' pipes left.
pipe (8): reading report ...
hello2 script!!
hello waiting 3 sec ...
hello2 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (11): reading error ...
pipe (8): transmission done.
pipe (11): transmission done.
try read done. '0' pipes left.
'
error:
'HELLO2 ERROR 0 !
HELLO2 ERROR 1 !
'

但这仅在只有 1 个工作线程脚本时才有效.
如果主脚本尝试读取多个工作线程脚本,则在执行工作线程脚本hello.pl之前,子级会关闭此传输管道。 最后读

到的是:
cmd: './hello.pl'
cmd rng ...

主脚本使用IO::Select模块读取logreader,并使用选择器 Object.
进行errorreader这样我在进度日志中找到:

cmd: './hello.pl'
cmd rng ...
pipe (8): transmission done.

但是hello.pl的所有东西都丢失了。

有时我仍然在错误报告中找到hello.plSTDERR的第一行

HELLO ERROR 0 !

为什么管道有时会破裂,有时不会?
我怎样才能防止这种情况发生?

现在有了这个测试脚本,我无法重现损坏管道的错误.
但即使在现实世界的应用程序中,它有时也会发生,有时不会.
我在带有perl v5.10的系统上开发了这个应用程序,并且从4个进程3运行和1中断。特别是如果它执行sleep调用。

管道在什么情况下会破裂?


编辑 2018-05-05:我构建了主脚本master.pl并使用 perl v5.24.1 运行它

由于没有人能在 1 个月内给我任何有用的答案甚至提示,我将发布我发现的有关断管现象的信息,以帮助可能面临类似问题的其他人。


最后,我可以追踪断现象.
我添加了足够长的睡眠时间,让我可以使用strace命令跟踪系统活动,它实际上记录了断管错误:

$ strace -p 11729
Process 11729 attached
restart_syscall(<... resuming interrupted call ...>) = 0
write(1, "hello script!!nhello waiting 20 "..., 67) = 67
pipe([3, 4])                            = 0
pipe([5, 6])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe670d5f9d0) = 13238
close(6)                                = 0
close(4)                                = 0
read(5, "", 4)                          = 0
close(5)                                = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff334b57b0) = -1 EINVAL (Invalid argument)
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "total 228Kndrwxr-xr-x.  4 usr15 "..., 4096) = 1868
read(3, "", 4096)                       = 0
close(3)                                = 0
rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
wait4(13238, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13238
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13238, si_status=0, si_utime=0, si_stime=0} ---
rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({30, 0}, 0x7fff334b5b90)      = 0
write(2, "HELLO ERROR 1 !n", 16)       = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=11729, si_uid=501} ---
+++ killed by SIGPIPE +++

它记录了假定的hello.pl正确完成并结束写入错误消息write(2, "HELLO ERROR 1 !n", 16) = -1
但它发现管道已损坏,导致错误EPIPE (Broken pipe)导致其崩溃

并显示+++ killed by SIGPIPE +++

那么问题的答案
什么情况下管道会破裂?

  • 如果分叉的子进程已终止

但是为什么我无法在管道损坏之前读取输出?

切割输出的现象

cmd: './hello.pl'
cmd rng ...
pipe (8): transmission done.
  • 分叉子进程中print()组合的结果 和父进程中的sysread().
    输出为 实际上在子端缓冲,最后写入父端 边。

为什么子进程应在接收输出之前终止?

  • 这来自实际中与对象性的结合 实现Object::DESTROY的真实世界应用程序 Method.
    此方法包括一个kill()清理逻辑 运行理论上不应触发的子进程.
    但是在实际应用程序中,进程不是立即触发的 阅读并收获导致他们被Object::DESTROYLogic.
    但是我在执行日志中找不到它,因为它的对象已经被Perl的垃圾收集器破坏了。

因此,一些进程被杀死,而其他进程则没有。有时他们会被杀,但有时不会。

我认为你必须使用wait而不是waitpid.wait等待任何子项,而waitpid只等待一个特定的子项。

最新更新