Concise way to make a 0+ length list in a Marpa grammar?



I'm new to Marpa. I've tried several ways to describe a list of 0 or more terms in my grammar, and I want to avoid getting multiple parse trees.

My language will have 1 component followed by 0+ subcomponents:

package => component-rule [subcomponent-rule ...]

My first attempt was:

{ lhs => 'Package', rhs => [qw/component-rule subcomponents/] },
{ lhs => 'subcomponents', rhs => [qw/subcomponent-list/] },
{ lhs => 'subcomponent-list', rhs => [qw/subcomponent-rule/], action => 'do_subcomponent_list' },
{ lhs => 'subcomponent-list', rhs => [qw/subcomponent-list subcomponent-rule/], action => 'do_subcomponent_list' },
{ lhs => 'subcomponent-list', rhs => [qw//], action => 'do_subcomponent_empty_list' },
{ lhs => 'subcomponent-rule', rhs => [qw/subcomponent subcomponent-name/], action => 'do_subcomponent' },

(Full code at the end of the post.)

Here is my input:

$recce->read( 'component', );
$recce->read( 'String', 'MO Factory');
$recce->read( 'subcomponent', );
$recce->read( 'String', 'Memory Wipe Station');
$recce->read( 'subcomponent', );
$recce->read( 'String', 'DMO Tour Robot');

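I get two parse trees: the first has an unwanted undef, the second is the one I want. Both represent the list as a nested tree rather than a flat list.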

$VAR1 = [
          {
            'Component' => 'MO Factory'
          },
          [
            [
              {
                'Subcomponent' => undef
              },
              {
                'Subcomponent' => 'Memory Wipe Station'
              }
            ],
            {
              'Subcomponent' => 'DMO Tour Robot'
            }
          ]
        ];
$VAR2 = [
          {
            'Component' => 'MO Factory'
          },
          [
            {
              'Subcomponent' => 'Memory Wipe Station'
            },
            {
              'Subcomponent' => 'DMO Tour Robot'
            }
          ]
        ];

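The nullable rule for subcomponent-list allows the case of 0 subcomponents, but it also lets an empty element sit in front of a list of 1+ subcomponents, which is an alternative parse. (Thankfully, Marpa only goes around the loop once.)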
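To see where the second parse comes from, here is a plain-Perl sketch (it does not use Marpa) that counts the derivations of n subcomponent-rule tokens under L ::= subcomponent-rule | L subcomponent-rule, with and without the extra empty rule L ::= (empty). count_derivations is a hypothetical helper written only for this illustration.

```perl
use strict;
use warnings;

# Count the derivations of a string of $n 'subcomponent-rule' tokens under
#   L ::= subcomponent-rule
#   L ::= L subcomponent-rule
# plus, when $nullable is true, the extra rule  L ::= (empty).
sub count_derivations {
    my ( $n, $nullable ) = @_;
    my @d = ( $nullable ? 1 : 0 );      # $d[0]: ways for L to match nothing
    for my $i ( 1 .. $n ) {
        $d[$i] = ( $i == 1 ? 1 : 0 )    # L ::= subcomponent-rule
               + $d[ $i - 1 ];          # L ::= L subcomponent-rule
    }
    return $d[$n];
}

for my $n ( 0 .. 3 ) {
    printf "%d subcomponents: %d derivation(s) with the empty rule, %d without\n",
        $n, count_derivations( $n, 1 ), count_derivations( $n, 0 );
}
```

With the empty rule, every non-empty list has exactly 2 derivations: the first item comes either from L ::= subcomponent-rule directly, or from L ::= L subcomponent-rule with an empty L in front (the undef in $VAR1). Without the empty rule, each list has exactly 1.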

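My other idea was to make subcomponent-list non-nullable and introduce an intermediate rule, subcomponents, that matches 0 or 1 subcomponent-list: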

{ lhs => 'subcomponents', rhs => [qw//] },
{ lhs => 'subcomponents', rhs => [qw/subcomponent-list/] },

This at least eliminates the multiple parses, but I still have the recursion, and a messy nested tree to flatten.
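One way to avoid flattening afterwards while keeping the recursive rules is to have the list actions build a flat array ref as they go. This sketch assumes, as the comments in the question's code do, that Marpa passes the per-parse variable first and then one value per RHS symbol. The action names do_list_first, do_list_append and do_list_empty are hypothetical, and the calls at the end only simulate the bottom-up calls Marpa would make for three subcomponents.

```perl
use strict;
use warnings;

# Hypothetical actions for the left-recursive rules
#   subcomponent-list ::= subcomponent-rule                    (first)
#   subcomponent-list ::= subcomponent-list subcomponent-rule  (append)
# The first argument is assumed to be the per-parse variable.
sub My_Actions::do_list_first {
    my ( undef, $item ) = @_;
    return [$item];                  # start a new flat list
}

sub My_Actions::do_list_append {
    my ( undef, $list, $item ) = @_;
    return [ @{$list}, $item ];      # copy, then append: the result stays flat
}

# Hypothetical action for  subcomponents ::= (empty) : an empty list, not undef.
sub My_Actions::do_list_empty { return [] }

# Simulate the calls for three subcomponents.
my $list = My_Actions::do_list_first( {}, { Subcomponent => 'Memory Wipe Station' } );
$list = My_Actions::do_list_append( {}, $list, { Subcomponent => 'DMO Tour Robot' } );
$list = My_Actions::do_list_append( {}, $list, { Subcomponent => 'SMO Break Room' } );
print join( ', ', map { $_->{Subcomponent} } @{$list} ), "\n";
# prints: Memory Wipe Station, DMO Tour Robot, SMO Break Room
```

Copying the list on each append leaves earlier values untouched but is quadratic in list length; pushing onto the existing array is cheaper for long lists.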

Is there a more direct way to make a 0+ length list, or otherwise make a symbol optional?

Full sample code:

#!/usr/bin/perl
use strict;
use warnings;
use Marpa::R2;
use Data::Dumper;
my $grammar = Marpa::R2::Grammar->new(
    {   start   => 'Package',
        actions => 'My_Actions',
        default_action => 'do_what_I_mean',
        rules => [
        { lhs => 'Package', rhs => [qw/component-rule subcomponents/] },
        { lhs => 'component-name', rhs => [qw/String/] },
        { lhs => 'component-rule', rhs => [qw/component component-name/], action => 'do_component' },
        { lhs => 'subcomponent-name', rhs => [qw/String/] },
        { lhs => 'subcomponent-rule', rhs => [qw/subcomponent subcomponent-name/], action => 'do_subcomponent' },
        { lhs => 'subcomponents', rhs => [qw//] },
        { lhs => 'subcomponents', rhs => [qw/subcomponent-list/] },
        { lhs => 'subcomponent-list', rhs => [qw/subcomponent-rule/], action => 'do_subcomponent_list' },
        { lhs => 'subcomponent-list', rhs => [qw/subcomponent-list subcomponent-rule/], action => 'do_subcomponent_list' },
#       { lhs => 'subcomponent-list', rhs => [qw//], action => 'do_subcomponent_empty_list' },
#       { lhs => 'subcomponent-list', rhs => [qw//],  },
        ],
    }
);
$grammar->precompute();
my $recce = Marpa::R2::Recognizer->new( { grammar => $grammar } );
$recce->read( 'component', );
$recce->read( 'String', 'MO Factory');
if (1) {
$recce->read( 'subcomponent', );
$recce->read( 'String', 'Memory Wipe Station');
$recce->read( 'subcomponent', );
$recce->read( 'String', 'DMO Tour Robot');
$recce->read( 'subcomponent', );
$recce->read( 'String', 'SMO Break Room');
}

my @values = ();
while ( defined( my $value_ref = $recce->value() ) ) {
    push @values, ${$value_ref};
}
print "result is ", Dumper(@values), "\n";
sub My_Actions::do_what_I_mean {
    print STDERR "do_what_I_mean\n";
    # The first argument is the per-parse variable.
    # At this stage, just throw it away
    shift;
    # Throw away any undef's
    my @children = grep { defined } @_;
    # Return what's left
    return scalar @children > 1 ? @children : shift @children;
}
sub My_Actions::do_component {
    my ( undef, $t1 ) = @_;
    print STDERR "do_component $t1\n";
    my $href = { 'Component' => $t1 };
    return $href;
}
sub My_Actions::do_subcomponent{
    my ( undef, $t1 ) = @_;
    print STDERR "do_subcomponent $t1\n";
    my $href = { 'Subcomponent' => $t1 };
    return $href;
}
sub My_Actions::do_subcomponent_empty_list
{
    print STDERR "do_subcomponent_empty_list\n";
    my $href = { 'Subcomponent' => undef };
    return $href;
}
sub My_Actions::do_subcomponent_list{
    # The first argument is the per-parse variable.
    # At this stage, just throw it away
    shift;
    # Throw away any undef's
    my @children = grep { defined } @_;
    print STDERR "do_subcomponent_list size ", scalar(@children), "\n";
# Do this to collapse recursive trees to a list:
#    @children = map { ref $_ eq "ARRAY" ? @{$_} : $_; } @children;
    return scalar @children > 1 ? @children : shift @children;
}

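Use a sequence rule, specified with the min parameter. The value can be either 0 (like the * quantifier in regexes) or 1 (like the + quantifier). To do this, remove the subcomponents and subcomponent-list rules, and add this instead: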

{
  lhs => 'subcomponents',
  rhs => ['subcomponent-rule'],
  min => 0,
  action => 'do_subcomponent_list',
}

Your grammar then works without further modification.

Using a sequence rule is preferable: no flattening is needed, and the grammar should be more efficient.
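For comparison, here is roughly what a dedicated action for the sequence rule could look like. The assumption is that Marpa::R2 calls a sequence rule's action with the per-parse variable followed by one argument per matched item (none at all when min => 0 and the list is empty), so the action only has to collect its arguments. do_subcomponent_seq is a hypothetical name, and the calls at the end simulate Marpa rather than use it.

```perl
use strict;
use warnings;

# Hypothetical action for the sequence rule
#   { lhs => 'subcomponents', rhs => ['subcomponent-rule'], min => 0,
#     action => 'do_subcomponent_seq' }
# Assumed calling convention: per-parse variable, then one value per item.
sub My_Actions::do_subcomponent_seq {
    my ( undef, @items ) = @_;    # drop the per-parse variable
    return \@items;               # flat array ref; empty when no items matched
}

# Simulated calls: no subcomponents, then the three from the question.
my $none  = My_Actions::do_subcomponent_seq( {} );
my $three = My_Actions::do_subcomponent_seq(
    {},
    { Subcomponent => 'Memory Wipe Station' },
    { Subcomponent => 'DMO Tour Robot' },
    { Subcomponent => 'SMO Break Room' },
);
printf "%d and %d subcomponents\n", scalar @{$none}, scalar @{$three};
```

Because the result is always an array ref, the empty case needs no special undef handling.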

----

Note that using the Scanless interface (SLIF) is encouraged. Its DSL abstracts over this problem nicely:

subcomponents ::= <subcomponent rule>* action => do_subcomponent_list
