I have a fairly large Marpa grammar (for parsing XPath), and I'm running into a tokenization problem. I've created a minimal failing example below:
```perl
use strict;
use warnings;
use Marpa::R2;

my $grammar = Marpa::R2::Scanless::G->new(
    {
        source => \(<<'END_OF_SOURCE'),
:default ::= action => ::array
:start ::= Start
Start ::= Child DoubleColon Token
DoubleColon ~ '::'
Child ~ 'child'
Token ~
    word
    | word ':' word
word ~ [\w]+
END_OF_SOURCE
    }
);

my $reader = Marpa::R2::Scanless::R->new(
    {
        grammar         => $grammar,
        trace_terminals => 1,
    }
);
my $input = 'child::book';
$reader->read( \$input );
```
The script prints the following:
```
Registering character U+0063 as symbol 10: [[\w]]
Registering character U+0063 as symbol 3: [[c]]
Registering character U+0068 as symbol 10: [[\w]]
Registering character U+0068 as symbol 4: [[h]]
Registering character U+0069 as symbol 10: [[\w]]
Registering character U+0069 as symbol 5: [[i]]
Registering character U+006c as symbol 10: [[\w]]
Registering character U+006c as symbol 6: [[l]]
Registering character U+0064 as symbol 10: [[\w]]
Registering character U+0064 as symbol 7: [[d]]
Registering character U+003a as symbol 1: [[:]]
Rejected lexeme @0-5: Token; value="child"
Accepted lexeme @0-5: Child; value="child"
Registering character U+0062 as symbol 10: [[\w]]
Error in SLIF G1 read: No lexeme found at position 6
* String before error: child::
* The error was at line 1, column 8, and at character 0x0062 'b', ...
* here: book
```
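I want the input to be tokenized as `[Child] [DoubleColon] [word]`. As the terminal trace shows, only one colon character is read and processed. It seems to be trying to tokenize the beginning of the string as `[word] [':'] [word]`, but fails partway through. If I remove the `| word ':' word` alternative from the grammar, the error is no longer thrown.

I tried giving DoubleColon a priority (`:lexeme ~ <DoubleColon> priority => 1`), but that didn't work. Can anyone tell me how to make this grammar parse the input string correctly? It still needs to be able to parse `child::ns:book` and similar inputs.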
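This appears to be a bug in the current version of Marpa::R2, 2.058. My apologies, and thank you for the careful description of the problem.
I have a fix that passes the test suite, and I will put out a new release shortly.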
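Once a release with the fix is installed, a quick way to confirm it is to run the unchanged grammar from the question against both required inputs. This is a minimal sketch, assuming a Marpa::R2 release newer than 2.058 that contains the fix (the exact version number is not stated here); the `%parsed` hash is a hypothetical name used only for illustration. Because `Token` is a single L0 lexeme, each parse value should be a flat array of three strings.

```perl
use strict;
use warnings;
use Marpa::R2;

# Grammar unchanged from the question. Assumes an installed Marpa::R2
# newer than 2.058 that includes the lexer fix.
my $grammar = Marpa::R2::Scanless::G->new(
    {
        source => \(<<'END_OF_SOURCE'),
:default ::= action => ::array
:start ::= Start
Start ::= Child DoubleColon Token
DoubleColon ~ '::'
Child ~ 'child'
Token ~
    word
    | word ':' word
word ~ [\w]+
END_OF_SOURCE
    }
);

my %parsed;    # hypothetical: input string => parse value (array ref)
for my $input ( 'child::book', 'child::ns:book' ) {
    # A recognizer is single-use, so create a fresh one per input.
    my $reader = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
    $reader->read( \$input );    # dies on a lexing error
    my $value_ref = $reader->value
        or die "No parse for '$input'\n";
    $parsed{$input} = ${$value_ref};
}
```

If `read` still dies with "No lexeme found", the installed Marpa::R2 most likely predates the fix.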