Why can't my ANTLR4 grammar parse this text?



I'd like to be able to parse the following text with ANTLR4:

six-buffers() {
    evil-window-split();
    evil-window-vsplit();
    evil-window-vsplit();
    evil-window-down(1);
    evil-window-vsplit();
    evil-window-vsplit();
};
six-buffers();

That is, first define a function, then call it.

To do this, I defined the following grammar:
grammar Deplorable;
script: statement*;
statement: (methodCall | functionDeclaration) ';' (WHITESPACE|NEW_LINE);
// General stuff
deplorableString: '"' DEPLORABLE_STRING* '"';
deplorableInteger: DEPLORABLE_NUMBER;
// Method call definition
methodCall: methodName LPAREN (methodArgument COMMA?)* RPAREN;
methodName: DEPLORABLE_IDENTIFIER;
methodArgument: (deplorableString | deplorableInteger);
// Function Declaration
functionStatement: methodCall ';' (WHITESPACE|NEW_LINE);
functionDeclaration: methodName LPAREN RPAREN functionBody;
functionBody: CURLY_BRACE_LEFT functionStatement* CURLY_BRACE_RIGHT;
// Lexer stuff
LPAREN: '(';
RPAREN: ')';
DEPLORABLE_IDENTIFIER: (LOWERCASE_LATIN_LETTER | UPPERCASE_LATIN_LETTER | UNDERSCORE | DASH)+;
DEPLORABLE_STRING: (LOWERCASE_LATIN_LETTER | UPPERCASE_LATIN_LETTER | UNDERSCORE | WHITESPACE | EXCLAMATION_POINT)+;
CURLY_BRACE_LEFT: '{';
CURLY_BRACE_RIGHT: '}';
NEW_LINE: ('\r\n'|'\n'|'\r');
DEPLORABLE_NUMBER: DIGIT+;
fragment COMMA: ',';
fragment DASH: '-';
fragment LOWERCASE_LATIN_LETTER: 'a'..'z';
fragment UPPERCASE_LATIN_LETTER: 'A'..'Z';
fragment UNDERSCORE: '_';
fragment WHITESPACE: ' ';
fragment EXCLAMATION_POINT: '!';
fragment DIGIT: '0'..'9';

I compile this grammar with mvn clean antlr4:antlr4 install (with tests disabled). Here is my pom.xml.

However, when I try to parse the text above in a test, I get the error

line 1:13 no viable alternative at input 'six-buffers() '

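I tried adding void in front of the function declaration so the parser could tell a function declaration apart from a function call, but that didn't help.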

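How can I fix this error, i.e., make sure the parser correctly recognizes the function declaration instead of mistaking it for a function call?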

Update 1: This version of the grammar (inspired by Mike Cargal) now seems to work:

grammar Deplorable;
script: statement*;
statement: (methodCall | functionDeclaration) ';';
// General stuff
// Method call definition
methodCall: methodName LPAREN (methodArgument COMMA?)* RPAREN;
methodName: DEPLORABLE_IDENTIFIER;
methodArgument: (DEPLORABLE_STRING | DEPLORABLE_NUMBER);
// Function Declaration
functionStatement: methodCall ';';
functionDeclaration: methodName LPAREN RPAREN functionBody;
functionBody: CURLY_BRACE_LEFT functionStatement* CURLY_BRACE_RIGHT;
// Lexer stuff
LPAREN: '(';
RPAREN: ')';
DEPLORABLE_IDENTIFIER: (
LOWERCASE_LATIN_LETTER
| UPPERCASE_LATIN_LETTER
| UNDERSCORE
| DASH
)+;
DEPLORABLE_STRING: '"' (
LOWERCASE_LATIN_LETTER
| UPPERCASE_LATIN_LETTER
| UNDERSCORE
| WHITESPACE
| EXCLAMATION_POINT
)+ '"';
CURLY_BRACE_LEFT: '{';
CURLY_BRACE_RIGHT: '}';
NEW_LINE: (
'\r' '\n'?
| '\n'
) -> skip;
DEPLORABLE_NUMBER: DIGIT+;
fragment COMMA: ',';
fragment DASH: '-';
fragment LOWERCASE_LATIN_LETTER: 'a'..'z';
fragment UPPERCASE_LATIN_LETTER: 'A'..'Z';
fragment UNDERSCORE: '_';
WHITESPACE: [ \t]+ -> skip;
fragment EXCLAMATION_POINT: '!';
fragment DIGIT: '0'..'9';

@sepp2k pointed you in the right direction.

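Your lexer rules (DEPLORABLE_STRING in particular) are the source of your pain. More specifically, this looks like a misconception many people have early on with ANTLR: that the parser rules can influence how the input is tokenized.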

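In the ANTLR pipeline, your input is first broken into a stream of tokens using only the lexer rules. That's why it is often very useful to dump your token stream (for example, with the TestRig's -tokens option).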

In your case, the stream looks like this:
[@0,0:10='six-buffers',<DEPLORABLE_IDENTIFIER>,1:0]
[@1,11:11='(',<'('>,1:11]
[@2,12:12=')',<')'>,1:12]
[@3,13:13=' ',<DEPLORABLE_STRING>,1:13]
[@4,14:14='{',<'{'>,1:14]
[@5,15:15='\n',<NEW_LINE>,1:15]
[@6,16:23='    evil',<DEPLORABLE_STRING>,2:0]
[@7,24:36='-window-split',<DEPLORABLE_IDENTIFIER>,2:8]
[@8,37:37='(',<'('>,2:21]
[@9,38:38=')',<')'>,2:22]
[@10,39:39=';',<';'>,2:23]
[@11,40:40='\n',<NEW_LINE>,2:24]
[@12,41:48='    evil',<DEPLORABLE_STRING>,3:0]
[@13,49:62='-window-vsplit',<DEPLORABLE_IDENTIFIER>,3:8]
[@14,63:63='(',<'('>,3:22]
[@15,64:64=')',<')'>,3:23]
[@16,65:65=';',<';'>,3:24]
[@17,66:66='\n',<NEW_LINE>,3:25]
[@18,67:74='    evil',<DEPLORABLE_STRING>,4:0]
[@19,75:88='-window-vsplit',<DEPLORABLE_IDENTIFIER>,4:8]
[@20,89:89='(',<'('>,4:22]
[@21,90:90=')',<')'>,4:23]
[@22,91:91=';',<';'>,4:24]
[@23,92:92='\n',<NEW_LINE>,4:25]
[@24,93:100='    evil',<DEPLORABLE_STRING>,5:0]
[@25,101:112='-window-down',<DEPLORABLE_IDENTIFIER>,5:8]
[@26,113:113='(',<'('>,5:20]
[@27,114:114='1',<DEPLORABLE_NUMBER>,5:21]
[@28,115:115=')',<')'>,5:22]
[@29,116:116=';',<';'>,5:23]
[@30,117:117='\n',<NEW_LINE>,5:24]
[@31,118:125='    evil',<DEPLORABLE_STRING>,6:0]
[@32,126:139='-window-vsplit',<DEPLORABLE_IDENTIFIER>,6:8]
[@33,140:140='(',<'('>,6:22]
[@34,141:141=')',<')'>,6:23]
[@35,142:142=';',<';'>,6:24]
[@36,143:143='\n',<NEW_LINE>,6:25]
[@37,144:151='    evil',<DEPLORABLE_STRING>,7:0]
[@38,152:165='-window-vsplit',<DEPLORABLE_IDENTIFIER>,7:8]
[@39,166:166='(',<'('>,7:22]
[@40,167:167=')',<')'>,7:23]
[@41,168:168=';',<';'>,7:24]
[@42,169:169='\n',<NEW_LINE>,7:25]
[@43,170:170='}',<'}'>,8:0]
[@44,171:171=';',<';'>,8:1]
[@45,172:172='\n',<NEW_LINE>,8:2]
[@46,173:183='six-buffers',<DEPLORABLE_IDENTIFIER>,9:0]
[@47,184:184='(',<'('>,9:11]
[@48,185:185=')',<')'>,9:12]
[@49,186:186=';',<';'>,9:13]
[@50,187:186='<EOF>',<EOF>,9:14]

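You'll notice that at @3 (character 13) the single ' ' is tokenized as a DEPLORABLE_STRING. That is exactly where the parser reports the error (line 1:13): after six-buffers() it needs either ';' or '{', and it gets a DEPLORABLE_STRING instead. For the same reason, each line's indentation is glued to "evil" ('    evil'), and the rest of the name becomes a separate DEPLORABLE_IDENTIFIER.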
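To see why the lexer makes that choice, here is a small Java simulation of ANTLR's lexer rule selection: at each position the rule with the longest match wins, and on a tie the rule defined first wins. It is not ANTLR code; the class OriginalLexerSim and its tokenize method are hypothetical, and the regexes are a hand translation of the original grammar's non-fragment lexer rules (plus the implicit ';' and '"' literals), so treat it as an approximation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simulation of how ANTLR picks a lexer rule (not generated code).
public class OriginalLexerSim {
    // Rules in definition order. Implicit literals taken from parser rules
    // (';' and '"') are defined before the explicit lexer rules.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("';'", Pattern.compile(";"));
        RULES.put("'\"'", Pattern.compile("\""));
        RULES.put("LPAREN", Pattern.compile("\\("));
        RULES.put("RPAREN", Pattern.compile("\\)"));
        RULES.put("DEPLORABLE_IDENTIFIER", Pattern.compile("[a-zA-Z_-]+"));
        // The original rule: letters, '_', ' ' and '!' with NO surrounding quotes
        RULES.put("DEPLORABLE_STRING", Pattern.compile("[a-zA-Z_ !]+"));
        RULES.put("CURLY_BRACE_LEFT", Pattern.compile("\\{"));
        RULES.put("CURLY_BRACE_RIGHT", Pattern.compile("\\}"));
        RULES.put("NEW_LINE", Pattern.compile("\r\n|\n|\r"));
        RULES.put("DEPLORABLE_NUMBER", Pattern.compile("[0-9]+"));
        // WHITESPACE is a fragment, so it never produces a token of its own
    }

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Longest match wins; '>' (not '>=') keeps the earlier rule on a tie
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestType = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestType == null) {
                // ANTLR would report a token recognition error and skip the character
                throw new IllegalStateException("token recognition error at " + pos);
            }
            tokens.add(bestType + "='" + input.substring(pos, pos + bestLen) + "'");
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : tokenize("six-buffers() {")) {
            System.out.println(t);
        }
    }
}
```

For "six-buffers() {" this prints DEPLORABLE_IDENTIFIER, LPAREN and RPAREN, then DEPLORABLE_STRING=' ' and CURLY_BRACE_LEFT: a lone space matches no rule except DEPLORABLE_STRING, so that is the token the parser receives.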

You need to fold the quotes into your DEPLORABLE_STRING rule.
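A quick way to see the effect of that change is to compare Java-regex translations of the old and new DEPLORABLE_STRING rules. This is only an approximation for illustration (QuotedStringCheck is a hypothetical name); the ANTLR lexer rules themselves are what count.

```java
import java.util.regex.Pattern;

// Hypothetical regex versions of the two DEPLORABLE_STRING definitions.
public class QuotedStringCheck {
    // Original rule: any run of letters, '_', ' ' or '!'
    public static final Pattern ORIGINAL = Pattern.compile("[a-zA-Z_ !]+");
    // Suggested rule: the same characters, but only between double quotes
    public static final Pattern QUOTED = Pattern.compile("\"[a-zA-Z_ !]+\"");

    public static void main(String[] args) {
        System.out.println(ORIGINAL.matcher(" ").matches());              // true: a lone space is a "string"
        System.out.println(QUOTED.matcher(" ").matches());                // false
        System.out.println(QUOTED.matcher("\"Hello world!\"").matches()); // true
    }
}
```

Once a string has to start with '"', stray spaces and indentation can no longer be swallowed by DEPLORABLE_STRING.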

(I'd also suggest skipping WHITESPACE, and probably NEW_LINE as well; most grammars treat NEW_LINEs as whitespace.)

Something like this should get you "unstuck":

grammar Deplorable;
script: statement*;
statement: (methodCall | functionDeclaration) ';' (
WHITESPACE
| NEW_LINE
);
// General stuff deplorableString: '"' DEPLORABLE_STRING* '"'; deplorableInteger: DEPLORABLE_NUMBER;
// Method call definition
methodCall: methodName LPAREN (methodArgument COMMA?)* RPAREN;
methodName: DEPLORABLE_IDENTIFIER;
methodArgument: (DEPLORABLE_STRING | DEPLORABLE_NUMBER);
// Function Declaration
functionStatement: methodCall ';' (WHITESPACE | NEW_LINE);
functionDeclaration: methodName LPAREN RPAREN functionBody;
functionBody:
CURLY_BRACE_LEFT functionStatement* CURLY_BRACE_RIGHT;
// Lexer stuff
LPAREN: '(';
RPAREN: ')';
DEPLORABLE_IDENTIFIER: (
LOWERCASE_LATIN_LETTER
| UPPERCASE_LATIN_LETTER
| UNDERSCORE
| DASH
)+;
DEPLORABLE_STRING:
'"' (
LOWERCASE_LATIN_LETTER
| UPPERCASE_LATIN_LETTER
| UNDERSCORE
| WHITESPACE
| EXCLAMATION_POINT
)+ '"';
CURLY_BRACE_LEFT: '{';
CURLY_BRACE_RIGHT: '}';
NEW_LINE: ('\r\n' | '\n' | '\r');
DEPLORABLE_NUMBER: DIGIT+;
fragment COMMA: ',';
fragment DASH: '-';
fragment LOWERCASE_LATIN_LETTER: 'a' ..'z';
fragment UPPERCASE_LATIN_LETTER: 'A' ..'Z';
fragment UNDERSCORE: '_';
fragment WHITESPACE: ' ' -> skip;
fragment EXCLAMATION_POINT: '!';
fragment DIGIT: '0' ..'9';

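It still trips over an extraneous \n (hence my comment about WS and NL handling). I'm not sure what your intent is, but take a look at how other grammars handle this. Skipping them is usually much easier than accounting for every place they might appear in your parser rules.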

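Most importantly: get your mental model right about how ANTLR turns a stream of characters into a stream of tokens (using the lexer rules) and then processes that token stream with the parser rules. Until you understand that, you're in for a lot of pain.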
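To make that two-stage model concrete, here is a hand-written Java sketch. It is not generated by ANTLR, and TwoStageSketch, lex, Parser and parse are hypothetical names. Stage 1 turns characters into tokens and throws away whitespace and newlines, the equivalent of -> skip. Stage 2 is a tiny recursive-descent parser that roughly mirrors the Update 1 rules and only ever looks at token types. Commas between arguments are omitted for brevity, and trying the regex alternatives in order is a simplification of ANTLR's longest-match rule that happens to be safe for these tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoStageSketch {
    // A token is just a type plus the matched text (Java 16+ record)
    record Token(String type, String text) {}

    // Stage 1: one named alternative per token kind, tried left to right
    static final Pattern LEXER = Pattern.compile(
        "(?<WS>[ \\t\\r\\n]+)"            // spaces, tabs and newlines: skipped below
        + "|(?<STRING>\"[a-zA-Z_ !]+\")"  // the quotes are part of the token
        + "|(?<NUMBER>[0-9]+)"
        + "|(?<ID>[a-zA-Z_-]+)"
        + "|(?<PUNCT>[(){};])");          // single-character literals

    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = LEXER.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) throw new IllegalStateException("token recognition error at " + pos);
            pos = m.end();
            if (m.group("WS") != null) continue;  // like '-> skip': the parser never sees it
            String type = m.group("STRING") != null ? "STRING"
                        : m.group("NUMBER") != null ? "NUMBER"
                        : m.group("ID") != null ? "ID"
                        : m.group();              // punctuation uses its own text as type
            tokens.add(new Token(type, m.group()));
        }
        return tokens;
    }

    // Stage 2: script: statement*; statement: (methodCall | functionDeclaration) ';';
    static final class Parser {
        private final List<Token> tokens;
        private int i = 0;

        Parser(List<Token> tokens) { this.tokens = tokens; }

        private String peek(int k) {
            int j = i + k;
            return j < tokens.size() ? tokens.get(j).type() : "EOF";
        }

        private void expect(String type) {
            if (!peek(0).equals(type))
                throw new IllegalStateException("expected " + type + " but got " + peek(0) + " at token " + i);
            i++;
        }

        // Returns one label per top-level statement: "declaration" or "call"
        List<String> script() {
            List<String> kinds = new ArrayList<>();
            while (!peek(0).equals("EOF")) {
                // ID '(' ')' '{' starts a declaration; anything else is a call
                if (peek(3).equals("{")) { functionDeclaration(); kinds.add("declaration"); }
                else { methodCall(); kinds.add("call"); }
                expect(";");
            }
            return kinds;
        }

        void methodCall() {
            expect("ID");
            expect("(");
            while (peek(0).equals("STRING") || peek(0).equals("NUMBER")) i++;  // methodArgument*
            expect(")");
        }

        void functionDeclaration() {
            expect("ID");
            expect("(");
            expect(")");
            expect("{");
            while (peek(0).equals("ID")) { methodCall(); expect(";"); }       // functionStatement*
            expect("}");
        }
    }

    public static List<String> parse(String input) {
        return new Parser(lex(input)).script();
    }

    public static void main(String[] args) {
        String input = "six-buffers() {\n    evil-window-split();\n    evil-window-down(1);\n};\nsix-buffers();\n";
        System.out.println(parse(input));  // [declaration, call]
    }
}
```

Because the parser never sees spaces or newlines, none of its rules need a trailing (WHITESPACE|NEW_LINE). Choosing between a call and a declaration by peeking at the token after ')' is a manual stand-in for the lookahead that ANTLR's generated parser performs for you.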
