Long-running ssh commands sometimes stop printing to stdout



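I have been using Net::SSH::Perl to automate running some scripts on my remote machines. However, some of these scripts take a long time to complete (an hour or two), and sometimes I stop receiving data from them without actually losing the connection.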

Here is the code I am using:
sub run_regression_tests {
    for(my $i = 0; $i < @servers; $i++){
        my $inner = $users[$i];
        foreach(@$inner){
            my $user = $_;
            my $server = $servers[$i];
            my $outFile;
            open($outFile, ">" . $outputDir . $user . "@" . $server . ".log.txt");
            print $outFile "Opening connection to $user at $server on " . localtime() . "\n\n";
            close($outFile);
            my $pid = $pm->start and next;
                print "Connecting to $user@" . "$server...\n";
                my $hasWentToDownloadYet = 0;
                my $ssh = Net::SSH::Perl->new($server, %sshParams);
                $ssh->login($user, $password);              
                $ssh->register_handler("stdout", sub {
                    my($channel, $buffer) = @_;             
                    my $outFile;
                    open($outFile, ">>", $outputDir . $user . "@" . $server . ".log.txt");                  
                    print $outFile $buffer->bytes;              
                    close($outFile);                
                    my @lines = split("\n", $buffer->bytes);
                    foreach(@lines){
                        if($_ =~ m/REGRESSION TEST IS COMPLETE/){
                            $ssh->_disconnect();
                            if(!$hasWentToDownloadYet){
                                $hasWentToDownloadYet = 1;
                                print "Caught exit signal.\n";
                                print("Regression tests for ${user}\@${server} finished.\n");
                                download_regression_results($user, $server);
                                $pm->finish;
                            }
                        }
                    }
                });
                $ssh->register_handler("stderr", sub {
                    my($channel, $buffer) = @_;             
                    my $outFile;
                    open($outFile, ">>", $outputDir . $user . "@" . $server . ".log.txt");
                    print $outFile $buffer->bytes;              
                    close($outFile);                
                });
                if($debug){
                    $ssh->cmd('tail -fn 40 /GDS/gds/gdstest/t-gds-master/bin/comp.reg');
                }else{
                    my ($stdout, $stderr, $exit) = $ssh->cmd('. ./.profile && cleanall && my.comp.reg');
                    if(!$exit){
                        print "SSH connection failed for ${user}\@${server}.\n";
                    }
                }
                #$ssh->cmd('. ./.profile');
                if(!$hasWentToDownloadYet){
                    $hasWentToDownloadYet = 1;
                    print("Regression tests for ${user}\@${server} finished.\n");
                    download_regression_results($user, $server);
                }
            $pm->finish;        
        }
    }
    sleep(1);
    print "\n\n\nAll tests started. Tests typically take 1 hour to complete.\n";
    print "If they take significantly less time, there could be an error.\n";
    print "\n\nNo output will be printed until all commands have executed and finished.\n";
    print "If you wish to watch the progress, tail -f one of the logs this script produces.\n Example:\n\t" . 'tail -f ./gds1@tdgds10.log.txt' . "\n";
    $pm->wait_all_children;
    print "\n\nAll Tests are Finished. \n";
}

Here is %sshParams:

my %sshParams = (
    protocol => '2',
    port => '22',
    options => [
        "TCPKeepAlive yes",
        "ConnectTimeout 10",
        "BatchMode yes"
    ]
);
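Occasionally, one of the long-running commands just stops printing, that is, the stdout and stderr handlers stop firing, and it never exits. As far as I can tell the ssh connection does not die, because $ssh->cmd is still blocking.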

Any idea how to correct this behavior?

In your %sshParams hash, you may need to add "TCPKeepAlive yes" to your options:

$sshParams{'options'} = ["BatchMode yes", "TCPKeepAlive yes"];

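These options may or may not suit you, but I recommend setting TCPKeepAlive for any long-running SSH connection. If there is a stateful firewall anywhere in your path, it may drop the connection state when no traffic has passed over the connection for a long time.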

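The failure may be caused by the way you look for the REGRESSION TEST IS COMPLETE marker in the output. The marker may be split across two different SSH packets, in which case your callback will never find it.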
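If you do keep scanning on the client side, one way to cope with a split marker is to carry the tail of the previous chunk over into the next search. The following is only a minimal sketch in plain Perl, not part of Net::SSH::Perl: make_marker_detector and $saw_marker are hypothetical names, and the chunks are simulated. Inside a real stdout handler you would presumably pass $buffer->bytes to the closure.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that is fed output chunks and
# reports whether the marker has been seen so far, even when the marker
# straddles the boundary between two chunks.
sub make_marker_detector {
    my ($marker) = @_;
    my $tail = '';    # trailing bytes of earlier chunks that could start a match
    my $seen = 0;
    return sub {
        my ($chunk) = @_;
        return 1 if $seen;
        # Search the carried-over tail together with the new chunk.
        my $window = $tail . $chunk;
        if (index($window, $marker) >= 0) {
            $seen = 1;
        }
        else {
            # Keep at most length(marker) - 1 bytes for the next call.
            my $keep = length($marker) - 1;
            $tail = length($window) > $keep
                  ? substr($window, length($window) - $keep)
                  : $window;
        }
        return $seen;
    };
}

# Simulated chunks: the marker is cut in the middle.
my $saw_marker = make_marker_detector('REGRESSION TEST IS COMPLETE');
for my $chunk ("tests running\nREGRESSION TE", "ST IS COMPLETE\n") {
    print $saw_marker->($chunk) ? "marker found\n" : "not yet\n";
}
```

Keeping only the last length(marker) - 1 bytes bounds memory use during a run that lasts an hour or more, while still covering every possible split point.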

A better approach is to use a remote command that ends when it is done, such as this one-liner:

perl -pe 'BEGIN {$p = open STDIN, "my.comp.reg |" or die $!}; kill TERM => -$p if /REGRESSION TEST IS COMPLETE/'

Otherwise, you close the remote connection but do not stop the remote process, which stays alive.

Apart from that, you should try Net::OpenSSH or Net::OpenSSH::Parallel instead of Net::SSH::Perl:

use Net::OpenSSH::Parallel;
my $pssh = Net::OpenSSH::Parallel->new;
for my $i (0..$#servers) {
    my $server = $servers[$i];
    for my $user (@{$users[$i]}) {
        $pssh->add_host("$user\@$server", password => $password);
    }
}
if ($debug) {
    $pssh->all(cmd => { stdout_file => "$outputDir%USER%@%HOST%.log.txt",
                        stderr_to_stdout => 1 },
               'tail -fn 40 /GDS/gds/gdstest/t-gds-master/bin/comp.reg');
}
else {
    $pssh->all(cmd => { stdout_file => "$outputDir%USER%@%HOST%.log.txt",
                        stderr_to_stdout => 1 },
               '. ./.profile && cleanall && my.comp.reg');
}
$pssh->all(scp_get => $remote_regression_results_path, "regression_results/%USER%@%HOST%/");
$pssh->run;
