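I have the thankless task of fixing a bug in an old ANTLR 2 parser that parses EDIFACT files. Unfortunately, I am not very familiar with ANTLR 2 or with parsers in general, and I cannot get it to work.
The EDIFACT files look like this: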
ABC+Name+Surname+zip+city+street+country+1961219++0037141008'
XYZ+Company+++XYZ+zip+street'
LMN+20081010+1100'
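There are several different segments, each starting with a keyword, for example XYZ or ABC. The keyword is followed by a number of attribute values, all separated by "+". Attribute values can be empty. Each segment ends with an apostrophe (').
The problem is that whenever a data attribute contains one of the keywords, the parser throws an error: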
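To make the segment layout concrete, here is a minimal Java sketch that splits one segment into its keyword and attribute values. It is only an illustration of the format described above, not part of the parser: the class and method names are made up, and it deliberately ignores the "?" escape character that the real lexer handles.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of the ANTLR grammar.
class SegmentSplitDemo {

    /**
     * Splits one segment into keyword (index 0) and attribute values.
     * Assumption: the segment contains no '?' escape sequences.
     */
    static List<String> splitSegment(String segment) {
        String body = segment.trim();
        // Drop the segment terminator if present.
        if (body.endsWith("'")) {
            body = body.substring(0, body.length() - 1);
        }
        // '+' is a regex metacharacter, so it is escaped.
        // The limit -1 keeps empty values, including trailing ones.
        return Arrays.asList(body.split("\\+", -1));
    }

    public static void main(String[] args) {
        List<String> parts = splitSegment("XYZ+Company+++XYZ+zip+street'");
        System.out.println("keyword = " + parts.get(0));
        System.out.println("values  = " + parts.subList(1, parts.size()));
    }
}
```

Note that the fourth attribute value of this sample segment is the text XYZ, the same text as the segment keyword.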
unexpected token: XYZ
XYZ+Company+++XYZ+zip+street'
Here is an excerpt from the grammar file:
// $ANTLR 2.7.6
xyz: "XYZ" ELT_SEP!
(xyz1_1a:ANUM|xyz1_1b:NUM) {lq(90,xyz1_1a,xyz1_1b,"XYZ1-1"+LQ90)}? ELT_SEP!
(xyz1_2a:ANUM|xyz1_2b:NUM)? {lq_(90,xyz1_2a,xyz1_2b,"XYZ1-2"+LQ90)}? ELT_SEP!
(xyz1_3a:ANUM|xyz1_3b:NUM)? {lq_(90,xyz1_3a,xyz1_3b,"XYZ1-3"+LQ90)}? ELT_SEP!
(xyz2a:ANUM|xyz2b:NUM)? {lq_(3,xyz2a,xyz2b,"XYZ2"+LQ3)}? ELT_SEP!
(xyz3a:ANUM|xyz3b:NUM)? {lq_(6,xyz3a,xyz3b,"XYZ3"+LQ6)}? ELT_SEP!
(xyz4a:ANUM|xyz4b:NUM) {lq(30,xyz4a,xyz4b,"XYZ4"+LQ30)}?
(ELT_SEP! (xyz5a:ANUM|xyz5b:NUM)?)? {lq_(46,xyz5a,xyz5b,"XYZ5"+LQ46)}? SEG_TERM!
{
if (skipNachricht()) return;
Xyz xyz = new Xyz();
xyz.xyz1_1 = getText(nn(xyz1_1a, xyz1_1b));
xyz.xyz1_2 = getText(nn(xyz1_2a, xyz1_2b));
xyz.xyz1_3 = getText(nn(xyz1_3a, xyz1_3b));
xyz.xyz2 = getText(nn(xyz2a, xyz2b));
xyz.xyz3 = getText(nn(xyz3a, xyz3b));
xyz.xyz4 = getText(nn(xyz4a, xyz4b));
xyz.xyz5 = getText(nn(xyz5a, xyz5b));
handleXyz(xyz);
}
;
/*
* Lexer
*/
class EdifactLexer extends Lexer;
options {
k=2;
filter=true;
charVocabulary = '\3'..'\377'; // Latin
}
DEZ_SEP: ','
{
//System.out.println("Found dez_sep: " + getText());
}
;
ELT_SEP: '+'
{
//System.out.println("Found elt_sep: " + getText());
}
;
SEG_TERM: '\''
{
// System.out.println("Found seg_term: " + getText());
}
;
NUM: (('0'..'9')+ (',' ('0'..'9')+)? ('+' | '\''))
=> ('0'..'9')+ (',' ('0'..'9')+)?
{
//System.out.println("num_: " + getText());
}
|
((ESCAPED | ~('?' | '+' | '\'' | ',' | '\r' | '\n'))+ )
=> ( ESCAPED | ~('?' | '+' | '\'' | ',' | '\r' | '\n'))+
{
$setType(ANUM);
//System.out.println("anum: " + getText());
}
|
(WRONGLY_ESCAPED) => WRONGLY_ESCAPED
{$setType(WRONGLY_ESCAPED); }
;
protected
WRONGLY_ESCAPED: '?' ~('?' | ':' | '+' | '\'' | ',')
{
//System.out.println("Found wrong_escaped: " + getText());
}
;
protected
ESCAPED: '?'
( ',' {$setText(","); }
| '?' {$setText("?"); }
| '\'' {$setText("'"); }
| ':' {$setText(":"); }
| '+' {$setText("+"); }
)
{
//System.out.println("Found escaped: " + getText());
}
;
NEWLINE : ( "\r\n" // DOS
| '\r' // MAC
| '\n' // Unix
)
{ newline();
$setType(Token.SKIP);
}
;
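Any help is much appreciated :).
This may not be the best solution, but I finally found a way to solve the problem. In case anyone else is stuck on something similar, here is what I did:
I wrote a method that checks whether the type of the current lookahead token matches one of my keywords and, if so, changes its token type to ANUM: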
void ckt() throws TokenStreamException, SemanticException {
if (mKeywordList.contains(LT(1).getType())) {
LT(1).setType(ANUM);
}
}
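For readers without the generated parser at hand, the retyping idea can be tried in isolation. The following is a self-contained sketch, not the actual parser code: the Tok class stands in for the ANTLR token, the numeric token types are invented, and KEYWORD_TYPES is my guess at what a list like mKeywordList would contain (the token types of the string literals used as segment keywords).

```java
import java.util.HashSet;
import java.util.Set;

// Stand-alone model of the ckt() idea; all names and numbers are hypothetical.
class RetypeDemo {
    // Invented token type numbers. In a real project ANTLR generates them.
    static final int ANUM = 4;
    static final int LITERAL_XYZ = 5;
    static final int LITERAL_ABC = 6;

    /** Minimal stand-in for a token whose type can be changed. */
    static final class Tok {
        private int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }

        int getType() { return type; }

        void setType(int type) { this.type = type; }
    }

    // Assumed content of the keyword list: one entry per segment keyword.
    static final Set<Integer> KEYWORD_TYPES = new HashSet<>();
    static {
        KEYWORD_TYPES.add(LITERAL_XYZ);
        KEYWORD_TYPES.add(LITERAL_ABC);
    }

    /** Turns a keyword token into a plain ANUM token; other tokens are untouched. */
    static void retypeKeyword(Tok lookahead) {
        if (KEYWORD_TYPES.contains(lookahead.getType())) {
            lookahead.setType(ANUM);
        }
    }

    public static void main(String[] args) {
        Tok t = new Tok(LITERAL_XYZ, "XYZ");
        retypeKeyword(t);
        System.out.println(t.text + " is ANUM: " + (t.getType() == ANUM));
    }
}
```

A set is used here only because lookups are cheap; a list with contains() behaves the same way for a handful of keywords.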
I then call this method in my parser rules right before each place where an ANUM token is expected:
xyz: "XYZ" ELT_SEP!
{ckt();}(xyz1_1a:ANUM|xyz1_1b:NUM) {lq(90,xyz1_1a,xyz1_1b,"XYZ1-1"+LQ90)}? ELT_SEP!
{ckt();}(xyz1_2a:ANUM|xyz1_2b:NUM)? {lq_(90,xyz1_2a,xyz1_2b,"XYZ1-2"+LQ90)}? ELT_SEP!
{ckt();}(xyz1_3a:ANUM|xyz1_3b:NUM)? {lq_(90,xyz1_3a,xyz1_3b,"XYZ1-3"+LQ90)}? ELT_SEP!
{ckt();}(xyz2a:ANUM|xyz2b:NUM)? {lq_(3,xyz2a,xyz2b,"XYZ2"+LQ3)}? ELT_SEP!
{ckt();}(xyz3a:ANUM|xyz3b:NUM)? {lq_(6,xyz3a,xyz3b,"XYZ3"+LQ6)}? ELT_SEP!
{ckt();}(xyz4a:ANUM|xyz4b:NUM) {lq(30,xyz4a,xyz4b,"XYZ4"+LQ30)}?
(ELT_SEP! {ckt();}(xyz5a:ANUM|xyz5b:NUM)?)? {lq_(46,xyz5a,xyz5b,"XYZ5"+LQ46)}? SEG_TERM!
{
if (skipNachricht()) return;
Xyz xyz = new Xyz();
xyz.xyz1_1 = getText(nn(xyz1_1a, xyz1_1b));
xyz.xyz1_2 = getText(nn(xyz1_2a, xyz1_2b));
xyz.xyz1_3 = getText(nn(xyz1_3a, xyz1_3b));
xyz.xyz2 = getText(nn(xyz2a, xyz2b));
xyz.xyz3 = getText(nn(xyz3a, xyz3b));
xyz.xyz4 = getText(nn(xyz4a, xyz4b));
xyz.xyz5 = getText(nn(xyz5a, xyz5b));
handleXyz(xyz);
}
;
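The important detail is that ckt() is called only in front of attribute values, never in front of the segment keyword itself, otherwise the rule could no longer match "XYZ" at the start. Here is a small sketch of that effect on the token types of the failing segment. It is a simplified model with invented token type numbers, and it assumes the lexer delivers the second XYZ as a keyword token, which is what the error message suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of retyping only in value positions; names are hypothetical.
class ValuePositionDemo {
    // Invented token types for illustration only.
    static final int ELT_SEP = 1;
    static final int SEG_TERM = 2;
    static final int ANUM = 3;
    static final int KEYWORD = 4;

    /**
     * Returns a copy of the token types in which every keyword that directly
     * follows an element separator has been retyped to ANUM.
     * The first token (the segment keyword) is never changed.
     */
    static List<Integer> retypeValues(List<Integer> types) {
        List<Integer> out = new ArrayList<>(types);
        for (int i = 1; i < out.size(); i++) {
            boolean valuePosition = out.get(i - 1) == ELT_SEP;
            if (valuePosition && out.get(i) == KEYWORD) {
                out.set(i, ANUM);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed token types for: XYZ+Company+++XYZ+zip+street'
        List<Integer> segment = Arrays.asList(
                KEYWORD, ELT_SEP, ANUM, ELT_SEP, ELT_SEP, ELT_SEP,
                KEYWORD, ELT_SEP, ANUM, ELT_SEP, ANUM, SEG_TERM);
        System.out.println("before: " + segment);
        System.out.println("after:  " + retypeValues(segment));
    }
}
```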