Perl: split and combine a dictionary phrase string containing / (for easier access to the phrase entries)



I have two strings: be/feel like a new A/B and feel like a new/old A/B. The first string needs to be split into four strings, like this:

be like a new A
be like a new B
feel like a new A
feel like a new B

The second also needs to be split into four:

feel like a new A
feel like a new B
feel like a old A
feel like a old B

Can this kind of special split be done in Perl? Thanks in advance.

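Update: I tried the following, but because the order in which $with_slash and $without_slash pieces can appear in $a is not fixed, I have not found a way to make it work so far.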

my $a = "be/feel like a new A/B/C";
my $with_slash = qr!\S+/\S+!;
my $without_slash = qr![^\n/]!;
my @list = $a =~ m!((($with_slash)|($without_slash))+)!g;
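
my $pat = "be/feel like a new A/B";
my $glob = "";
# The capturing group makes split return the slash groups as well:
# literal text lands at even indexes, slash groups at odd indexes.
my @parts = split(qr{( \w++ (?: / \w++ )++ )}x, $pat, -1);
for my $i (0..$#parts) {
    if ($i % 2 == 0) {
        # Literal text: escape it so glob takes it verbatim (spaces included).
        $glob .= quotemeta($parts[$i]);
    } else {
        # Slash group: "A/B" becomes the brace alternation "{A,B}".
        $glob .= "{".( $parts[$i] =~ s{/}{,}rg )."}";
    }
}
my @list = glob($glob);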

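my $pat = "be/feel like a new A/B";
my $glob = "";
for ($pat) {
    # Non-word characters (spaces, punctuation): escape them for glob.
    m{\G \W++ }xgc
        and do { $glob .= quotemeta($&); redo; };
    # Slash group: "A/B" becomes "{A,B}".
    m{\G \w++ (?: / \w++ )++ }xgc
        and do { $glob .= "{".( $& =~ s{/}{,}rg )."}"; redo; };
    # Plain word: copy it through unchanged.
    m{\G \w++ }xgc
        and do { $glob .= $&; redo; };
    # Anything other than end of string here means unparseable input.
    m{\G \z }xgc
        or die("Bad data\n");
}
my @list = glob($glob);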

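Both of these turn be/feel like a new A/B into the pattern {be,feel}\ like\ a\ new\ {A,B} (the spaces are backslash-escaped by quotemeta so that glob does not split on them). glob can then expand that pattern into the desired list.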
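As a rough illustration of the split-then-glob idea described above, the steps can be wrapped in a small subroutine. This is only a sketch: the name expand_with_glob is hypothetical and does not come from the answer, and it assumes Perl's built-in glob (File::Glob) performs brace expansion on patterns that match no files, which is what the approach relies on.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): build a brace
# pattern from a phrase and let glob expand it.
sub expand_with_glob {
    my ($phrase) = @_;

    # The capturing group makes split return the slash groups too, so
    # literal text sits at even indexes and slash groups at odd indexes.
    my @parts = split qr{ ( \w++ (?: / \w++ )++ ) }x, $phrase, -1;

    my $pattern = '';
    for my $i (0 .. $#parts) {
        $pattern .= $i % 2
            ? '{' . join(',', split m{/}, $parts[$i]) . '}'   # "A/B" -> "{A,B}"
            : quotemeta $parts[$i];                           # escape spaces etc.
    }

    # In list context glob returns every brace expansion of the pattern.
    return glob $pattern;
}

print "$_\n" for expand_with_glob('feel like a new/old A/B');
```

Because glob is a file-name matcher, this only behaves as a pure string expander while every glob metacharacter in the literal text is escaped, which is why quotemeta is applied to the non-slash parts.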

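There are several ways to do this. One is to split the string on whitespace into tokens. The tokens are then joined back into strings that are stored in an output array, where each element is a string that grows as tokens are appended. Every token is split on forward slashes into its variants (a token without a slash yields a single variant), and two nested map operations, which act like two nested loops, append each variant to each string in the output array.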

printf 'be/feel like a new A/B\nfeel like a new/old A/B\n' |
perl -e '
    use warnings;
    use strict;
    while ( <> ) {
        chomp;
        my @tokens = split / /, $_;
        # Array with 1 element = empty string, to enable joining words:
        my @outputs = ( q{} );
        foreach my $token ( @tokens ) {
            my @variants = split m{/}, $token;
            # Append every variant to every string built so far:
            @outputs = map { my $old_str = $_; map { "$old_str $_" } @variants } @outputs;
        }
        # Remove leading blanks that resulted from the first empty string:
        @outputs = map { my $str = $_; $str =~ s/^\s+//; $str } @outputs;
        print "outputs for: $_:\n";
        print "$_\n" for @outputs;
    }
'

This prints:

outputs for: be/feel like a new A/B:
be like a new A
be like a new B
feel like a new A
feel like a new B
outputs for: feel like a new/old A/B:
feel like a new A
feel like a new B
feel like a old A
feel like a old B

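This handles strings of any length with any number of tokens, any number of which may contain any number of slashes (provided the arrays fit in RAM). One input string is processed per line.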
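The memory caveat follows from how the output grows: the number of phrases is the product of the variant counts of all tokens. A minimal sketch of the same cross-product idea as a reusable subroutine makes this easy to check. The name expand_phrase is hypothetical, and keeping each partial phrase as an array of words is a choice made here to avoid the leading-blank cleanup step, not something taken from the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: same cross-product idea as the one-liner, as a sub.
sub expand_phrase {
    my ($phrase) = @_;

    # Start from a single empty partial phrase (an empty list of words).
    my @outputs = ( [] );

    for my $token ( split ' ', $phrase ) {
        my @variants = split m{/}, $token;
        # Extend every partial phrase with every variant of this token.
        @outputs = map {
            my $words = $_;
            map { [ @$words, $_ ] } @variants
        } @outputs;
    }

    # Join the collected words of each phrase with single spaces.
    return map { join ' ', @$_ } @outputs;
}

my @phrases = expand_phrase('be/feel like a new A/B/C');
print scalar(@phrases), " phrases\n";
print "$_\n" for @phrases;
```

For be/feel like a new A/B/C the variant counts are 2, 1, 1, 1 and 3, so six phrases are produced; each extra token with k variants multiplies the size of the output array by k.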
