ANTLR4 lexer rule with @init block

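I have this lexer rule defined in my ANTLR v3 grammar file; it matches text in double quotes. I need to convert it to ANTLR v4. The ANTLR compiler throws the error "syntax error: mismatched input '@' expecting COLON while matching a lexer rule" (at the @init line). Can a lexer rule contain an @init block? How should this be rewritten?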

DOUBLE_QUOTED_CHARACTERS
@init 
{
   int doubleQuoteMark = input.mark(); 
   int semiColonPos = -1;
}
: ('"' WS* '"') => '"' WS* '"' { $channel = HIDDEN; }
{
    RecognitionException re = new RecognitionException("Illegal empty quotes\"\"!", input);
    reportError(re);
}
| '"' (options {greedy=false;}: ~('"'))+ 
  ('"'|';' { semiColonPos = input.index(); } ('\u0020'|'\t')* ('\n'|'\r'))
{ 
    if (semiColonPos >= 0)
    {
        input.rewind(doubleQuoteMark);
        RecognitionException re = new RecognitionException("Missing closing double quote!", input);
        reportError(re);
        input.consume();            
    }
    else
    {
        setText(getText().substring(1, getText().length()-1));
    }
}
;

Sample data:

"" -> throws error "Illegal empty quotes!"
"asd -> throws error "Missing closing double quote!"
"text" -> returns text (valid input, the content of "…")

I think this is the right way to do it.

DOUBLE_QUOTED_CHARACTERS
:
{
   int doubleQuoteMark = input.mark();
   int semiColonPos = -1;
}
(
    ('"' WS* '"') => '"' WS* '"' { $channel = HIDDEN; }
    {
        RecognitionException re = new RecognitionException("Illegal empty quotes\"\"!", input);
        reportError(re);
    }
    | '"' (options {greedy=false;}: ~('"'))+
      ('"'|';' { semiColonPos = input.index(); } ('\u0020'|'\t')* ('\n'|'\r'))
    {
        if (semiColonPos >= 0)
        {
            input.rewind(doubleQuoteMark);
            RecognitionException re = new RecognitionException("Missing closing double quote!", input);
            reportError(re);
            input.consume();
        }
        else
        {
            setText(getText().substring(1, getText().length()-1));
        }
    }
)
;

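There are some other errors in the rule above as well, such as WS…=>…, but I am not correcting them as part of this answer, just to keep things simple. I took the hint from here.

In case that link moves or goes dead after a while, here is the text quoted as-is: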

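As of 4.2, lexer actions can appear anywhere, not just at the end of the outermost alternative. The lexer executes the actions at the appropriate input position, according to the placement of the action within the rule. To execute a single action for a rule that has multiple alternatives, you can enclose the alts in parentheses and put the action afterwards: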

END : ('endif'|'end') {System.out.println("found an end");} ;
The action conforms to the syntax of the target language. ANTLR copies the action’s contents into the generated code verbatim; there is no translation of expressions like $x.y as there is in parser actions.
Only actions within the outermost token rule are executed. In other words, if STRING calls ESC_CHAR and ESC_CHAR has an action, that action is not executed when the lexer starts matching in STRING.
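
For anyone who wants a rule that ANTLR4 actually accepts, here is a rough sketch of how the original rule might be restructured. It is not a drop-in equivalent. ANTLR4 dropped syntactic predicates (`=>`) and `options {greedy=false;}`, and `$channel = HIDDEN` became the lexer command `-> channel(HIDDEN)`, so the sketch splits the rule into three tokens and relies on longest-match plus rule order instead of mark/rewind. The grammar name `QuotedText`, the helper `report`, and the token names `EMPTY_QUOTES` and `UNTERMINATED_QUOTE` are made up for illustration. It also assumes the Java target, that a quoted string may not span lines, and that `[ \t]` is an acceptable stand-in for the original WS rule.

```antlr
lexer grammar QuotedText;  // hypothetical grammar name

@members {
    // Hypothetical helper: forwards a message to the registered error listeners,
    // positioned at the start of the token currently being matched.
    private void report(String msg) {
        getErrorListenerDispatch().syntaxError(
            this, null, _tokenStartLine, _tokenStartCharPositionInLine, msg, null);
    }
}

// "" or quotes holding only blanks: report the error and hide the token.
// Listed first so it wins the tie against DOUBLE_QUOTED_CHARACTERS for input like " ".
EMPTY_QUOTES
    : '"' [ \t]* '"' { report("Illegal empty quotes\"\"!"); } -> channel(HIDDEN)
    ;

// Properly closed quotes on a single line: strip the surrounding quotes.
DOUBLE_QUOTED_CHARACTERS
    : '"' ~["\r\n]+ '"' { setText(getText().substring(1, getText().length() - 1)); }
    ;

// A lone opening quote: only chosen when neither longer rule above matches.
// Consumes just the quote, similar to the rewind-then-consume in the v3 rule.
UNTERMINATED_QUOTE
    : '"' { report("Missing closing double quote!"); } -> channel(HIDDEN)
    ;
```

The three-rule split sidesteps the need for per-rule local variables such as `semiColonPos`; if shared state really is needed, a field declared in `@members` is the usual place for it. The text after an unterminated quote is left for the rest of the grammar's token rules to handle.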

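I ran into this problem when my .g4 grammar imported a lexer file. Importing grammar files seems to trigger a lot of undocumented shortcomings in ANTLR4, so in the end I had to stop using import. In my case, once I merged the lexer grammar into the parser grammar (a single .g4 file), my @init and @after parsing errors went away. I should submit a test case and a bug report, at least to get it documented, and I will update here once I do. I vaguely recall 2-3 issues with importing a lexer grammar into my parser that triggered undocumented behavior; much of it is covered on Stack Overflow.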
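
To illustrate the single-file layout described above, here is a minimal sketch of a combined grammar in which `@init` and `@after` sit on a parser rule, which is where ANTLR4 accepts them (lexer rules do not take them). The grammar name `Combined`, the rule `start`, and the tokens are hypothetical and only there to make the fragment complete; the actions assume the Java target.

```antlr
grammar Combined;  // hypothetical combined grammar: parser and lexer rules in one .g4 file

// Parser rule: @init runs before the rule body is matched, @after once it has matched.
start
@init  { System.out.println("entering start"); }
@after { System.out.println("leaving start"); }
    : ID+ EOF
    ;

// Lexer rules live in the same file, so no import is involved.
ID : [a-zA-Z]+ ;
WS : [ \t\r\n]+ -> skip ;
```

If the lexer has to stay in its own file, a separate `lexer grammar` referenced from the parser grammar through `options { tokenVocab = ...; }` is another layout ANTLR4 supports; whether it avoids the import problems mentioned above is not something this sketch verifies.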
