While loop with a regex match works, but results do not appear until the program is terminated manually



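I'm new to Perl and CS and am writing some bioinformatics-related code to learn. I'm trying to use a simple match operator and a while loop to go through a text file and find every occurrence of a particular sequence ($motif). When I define $motif in the code itself, the program works fine. When I take it from user input, the code inside my while loop does run, but not correctly, and the program never terminates. When I terminate it manually, it sometimes shows a few of the expected results and sometimes all of them.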

use strict;
use warnings;
use feature ':5.28';

print 'Enter the file name containing the sequence:';
my $filename = <>;
chomp $filename;
open(SEQ, '<', $filename)
or die "Could not open file '$filename' $!";
$/ = ''; #to read the whole file at once as it'll stop reading only when an undefined character comes up
my $row = <SEQ>; #storing the sequence from file to a variable
chomp $row;
print "\nEnter the Motif sequence to be searched:";
my $motif = <>;
my $counter = 0;
chomp $motif;
while ($row =~ m|($motif)|g) {
    $counter++;
    print "\n";
    print "The motif's occurrence $counter ends at position ", pos $row, "\n";
}

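The expected output is a list of every occurrence of $motif, but the program does not terminate. When I stop it manually with Ctrl+C it shows only the first 2-3 occurrences, whereas when I assign $motif in the code itself it immediately gives hundreds of matches.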

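The while loop also runs fine if I assign the sequence from the file (the one I'm searching) directly to $row in the code. The loop only misbehaves when I take both the input file name ($filename, whose contents are read into $row) and the sequence to search for ($motif) from the user. If I assign either one in the code, the program works fine.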

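You have changed the input record separator ($/) from its previous value of "\n" to '' (the empty string).
On this line: my $motif = <>; input is expected, but it no longer ends with "Enter" ("\n") as usual. This is where your program gets "stuck": it waits until it receives EOF (end of file). You can send EOF with Ctrl+D (or Ctrl+Z on Windows) so that the program continues.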

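chomp uses it too (the input record separator), so it also doesn't behave as expected (the first chomp works normally because it is called before the change).
You should restore its original value (preferably by changing it only locally). Also, you set the input record separator to the empty string, which selects paragraph mode rather than reading the whole file. If you want to read the file in "slurp mode", set it to undef.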

You can read more here: slurp mode - reading a file in one step
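
To make the difference between the separator settings concrete, here is a minimal sketch that reads the same text three ways through an in-memory filehandle. The helper name read_records and the sample text are assumptions made up for this illustration; they are not part of the original program.

```perl
use strict;
use warnings;

# Read every record from an in-memory filehandle using the given separator.
# read_records() is a hypothetical helper used only for this illustration.
sub read_records {
    my ($text, $separator) = @_;
    open(my $fh, '<', \$text) or die "Could not open in-memory handle: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Assumed sample input: two lines, a blank line, then one more line.
my $text = "ACGT\nTTGA\n\nCCCC\n";

my @by_line      = read_records($text, "\n");    # default: one record per line
my @by_paragraph = read_records($text, '');      # paragraph mode: records end at blank lines
my @slurped      = read_records($text, undef);   # slurp mode: the whole input as one record

printf "line mode:      %d records\n", scalar @by_line;
printf "paragraph mode: %d records\n", scalar @by_paragraph;
printf "slurp mode:     %d records\n", scalar @slurped;
```

Because the separator is changed with local inside the sub, any later read from STDIN still ends at a single newline, which is the behaviour the original program relied on.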

A simple update to your code (make sure to remove the $/ = ''; line):

my $row = '';
{
    open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
    local $/ = undef;
    $row = <$fh>;
    close $fh;
}

I wouldn't recommend this, though. It is probably better to read the file as a list of lines, and to use a more modern approach such as Path::Tiny.

I made a few small changes to your code and tested it successfully with MT_mouse.txt.
Code:

#!/usr/bin/perl
use strict;
use warnings;
print 'Enter the file name containing the sequence: ';
my $filename = <>;
chomp $filename;
open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
my @file_lines = <$fh>;
close $fh;
print 'Enter the Motif sequence to be searched: ';
my $motif = <>;
chomp $motif;
print 'Read ' . scalar(@file_lines) . " lines at file: '$filename'\nmotif: '$motif'\n";
my ($line, $occurrences) = (0, 0);
foreach my $row (@file_lines) {
    $line++;
    next unless $row =~ /\Q$motif\E/;
    my @motif_index = ();
    my $position = 0;
    while((my $index = index $row, $motif, $position) >= 0) {
        push(@motif_index, $index);
        $position = $index + length $motif;
    }
    print "Motif found on row#$line\tat position(s): " . join(', ', @motif_index) . ".\n";
    $occurrences += scalar @motif_index;
}
print "\nMotif '$motif' was " . ($occurrences ? "found $occurrences times" : 'not found') . " at file: '$filename'.\n";
__END__

Output:

Enter the file name containing the sequence: MT_mouse.txt
Enter the Motif sequence to be searched: ACCCC
Read 272 lines at file: 'MT_mouse.txt'
motif: 'ACCCC'
Motif found on row#4    at position(s): 41.
Motif found on row#9    at position(s): 19.
Motif found on row#11   at position(s): 40.
Motif found on row#23   at position(s): 8.
Motif found on row#33   at position(s): 3.
Motif found on row#59   at position(s): 1.
Motif found on row#61   at position(s): 31.
Motif found on row#65   at position(s): 37.
Motif found on row#83   at position(s): 3.
Motif found on row#98   at position(s): 22.
Motif found on row#115  at position(s): 48.
Motif found on row#122  at position(s): 26.
Motif found on row#132  at position(s): 49.
Motif found on row#133  at position(s): 36.
Motif found on row#173  at position(s): 21.
Motif found on row#183  at position(s): 21.
Motif found on row#188  at position(s): 52.
Motif found on row#199  at position(s): 7.
Motif found on row#209  at position(s): 51.
Motif found on row#228  at position(s): 28.
Motif found on row#230  at position(s): 43.
Motif found on row#247  at position(s): 45.
Motif found on row#249  at position(s): 53.
Motif found on row#269  at position(s): 11, 18, 39.
Motif 'ACCCC' was found 26 times at file: 'MT_mouse.txt'.
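
One thing a line-by-line search cannot see is a motif that straddles a line break. As a rough sketch of an alternative, assuming the file holds a single continuous sequence wrapped over several lines, the lines can be chomped and joined first and then searched with the global match from the question. The helper name find_motif and the three sample lines are made up for illustration; \Q...\E is used so that the motif is matched literally.

```perl
use strict;
use warnings;

# Return the 0-based start offsets of every non-overlapping occurrence
# of $motif in $sequence. find_motif() is a hypothetical helper name.
sub find_motif {
    my ($sequence, $motif) = @_;
    return () if $motif eq '';          # an empty motif would match everywhere
    my @starts;
    while ($sequence =~ /\Q$motif\E/g) {
        push @starts, $-[0];            # $-[0] is the start offset of the last match
    }
    return @starts;
}

# Assumed sample data: one sequence wrapped over three lines.
my @file_lines = ("GGACC\n", "CCTTA\n", "CCCCA\n");
chomp @file_lines;                      # strip the line endings
my $sequence = join '', @file_lines;    # one continuous string

my @starts = find_motif($sequence, 'ACCCC');
print "Found at offset(s): @starts\n";
```

In this sample both occurrences cross a line boundary, so a per-line search would report nothing while the joined search finds them. The reported offsets are relative to the joined sequence, not to any individual line.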
