Flex/Bison - My regular expression does not match two or more occurrences of X, such as XXY-1



I am using Flex and Bison to create a parser for a made-up programming language. Some variable names are valid and some are not:

XXXX XY-1 // valid
XXXXX Z // valid
XXX Y // valid
XXX 5Aet // invalid
XXXX XXAB-Y // invalid

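The X characters at the start only specify the size of the variable. The variable 5Aet is invalid because it starts with a digit. I managed to match this case with the following regular expression: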
[_\-0-9][a-zA-Z][a-zA-Z0-9\-_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

The variable XXAB-Y is invalid because a variable name cannot start with two or more X characters.

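I tried to write a regular expression for this case, but without success. I tried various combinations of the expressions below, and none of them worked: the variable keeps being matched as valid.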

[X]{2,}[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;
[X]{2,0}[_\-0-9][a-zA-Z][a-zA-Z0-9\-_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

lexer.l code

[\t ]+ /* ignore whitespace */
\n /* ignore newlines */
["][^"]*["] yylval.string = strdup(yytext); return TERM_STR;
";" return TERM_SEPARATOR;
"." return TERM_FULLSTOP;
[0-9]+ yylval.integer = atoi(yytext); return TERM_INT;
XX[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;
[_\-0-9]+[a-zA-Z][a-zA-Z0-9\-_]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;
[A-Z][A-Z0-9-]* yylval.string = strdup(yytext); return TERM_VARIABLE_NAME;
[X]+ yylval.integer = yyleng; return TERM_SIZE;
. return TERM_INVALID_TOKEN;

parser.y code

program:
    /* empty */ | 
    begin middle_declarations body grammar_s end {
        printf("\nParsing complete\n");
        exit(0);
    };
begin:
    TERM_BEGINING TERM_FULLSTOP;
body:
    TERM_BODY TERM_FULLSTOP;
end:
    TERM_END TERM_FULLSTOP;
middle_declarations:
    /* empty */ |
    // Left-recursive to allow any number of declarations
    middle_declarations declaration TERM_FULLSTOP;
declaration:
    TERM_SIZE TERM_VARIABLE_NAME {
        createVar($1, $2);
    }
    |
    TERM_SIZE TERM_INVALID_VARIABLE_NAME {
        printInvalidVarName($2);
    };
grammar_s:
    /* empty */ |
    grammar_s grammar TERM_FULLSTOP;
grammar:
    add | move | print | input;
add:
    TERM_ADD TERM_INT TERM_TO TERM_VARIABLE_NAME {
        addIntToVar($2, $4);
    }
    |
    TERM_ADD TERM_VARIABLE_NAME TERM_TO TERM_VARIABLE_NAME {
        addVarToVar($2, $4);
    }
    ;
move:
    TERM_MOVE TERM_VARIABLE_NAME TERM_TO TERM_VARIABLE_NAME {
        moveVarToVar($2, $4);
    }
    |
    TERM_MOVE TERM_INT TERM_TO TERM_VARIABLE_NAME {
        moveIntToVar($2, $4);
    }
    ;
print:
    /* empty */ |
    TERM_PRINT rest_of_print {
        printf("\n");
    };
rest_of_print:
    /* empty */ |
    rest_of_print other_print;
other_print:
    TERM_VARIABLE_NAME {
        printVarValue($1);
    }
    |
    TERM_SEPARATOR {
        // do nothing
    }
    |
    TERM_STR {
        printf("%s", $1);
    }
    ;
input:
    // Fullstop declares grammar
    TERM_INPUT other_input;
other_input:
    /* empty */ |
    // Input var1
    TERM_VARIABLE_NAME {
        inputValues($1);
    }
    |
    // Can be input var1; var2;...varN
    other_input TERM_SEPARATOR TERM_VARIABLE_NAME {
        inputValues($3); /* $3 is the variable name; $2 is the separator */
    }
    ;

Debug output:

Starting parse
Entering state 0
Reading a token: Next token is token TERM_BEGINING (1.1: )
Shifting token TERM_BEGINING (1.1: )
Entering state 1
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 4
Reducing stack by rule 3 (line 123):
   $1 = token TERM_BEGINING (1.1: )
   $2 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm begin (1.1: )
Stack now 0
Entering state 3
Reducing stack by rule 6 (line 131):
-> $$ = nterm middle_declarations (1.1: )
Stack now 0 3
Entering state 6
Reading a token: Next token is token TERM_SIZE (1.1: )
Shifting token TERM_SIZE (1.1: )
Entering state 8
Reading a token: Next token is token TERM_VARIABLE_NAME (1.1: )
Shifting token TERM_VARIABLE_NAME (1.1: )
Entering state 13
Reducing stack by rule 8 (line 137):
   $1 = token TERM_SIZE (1.1: )
   $2 = token TERM_VARIABLE_NAME (1.1: )
-> $$ = nterm declaration (1.1: )
Stack now 0 3 6
Entering state 10
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 15
Reducing stack by rule 7 (line 134):
   $1 = nterm middle_declarations (1.1: )
   $2 = nterm declaration (1.1: )
   $3 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm middle_declarations (1.1: )
Stack now 0 3
Entering state 6
Reading a token: Next token is token TERM_SIZE (1.1: )
Shifting token TERM_SIZE (1.1: )
Entering state 8
Reading a token: Next token is token TERM_VARIABLE_NAME (1.1: )
Shifting token TERM_VARIABLE_NAME (1.1: )
Entering state 13
Reducing stack by rule 8 (line 137):
   $1 = token TERM_SIZE (1.1: )
   $2 = token TERM_VARIABLE_NAME (1.1: )
-> $$ = nterm declaration (1.1: )
Stack now 0 3 6
Entering state 10
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 15
Reducing stack by rule 7 (line 134):
   $1 = nterm middle_declarations (1.1: )
   $2 = nterm declaration (1.1: )
   $3 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm middle_declarations (1.1: )
Stack now 0 3
Entering state 6
Reading a token: Next token is token TERM_SIZE (1.1: )
Shifting token TERM_SIZE (1.1: )
Entering state 8
Reading a token: Next token is token TERM_VARIABLE_NAME (1.1: )
Shifting token TERM_VARIABLE_NAME (1.1: )
Entering state 13
Reducing stack by rule 8 (line 137):
   $1 = token TERM_SIZE (1.1: )
   $2 = token TERM_VARIABLE_NAME (1.1: )
-> $$ = nterm declaration (1.1: )
Stack now 0 3 6
Entering state 10
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 15
Reducing stack by rule 7 (line 134):
   $1 = nterm middle_declarations (1.1: )
   $2 = nterm declaration (1.1: )
   $3 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm middle_declarations (1.1: )
Stack now 0 3
Entering state 6
Reading a token: Next token is token TERM_BODY (1.1: )
Shifting token TERM_BODY (1.1: )
Entering state 7
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 11
Reducing stack by rule 4 (line 126):
   $1 = token TERM_BODY (1.1: )
   $2 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm body (1.1: )
Stack now 0 3 6
Entering state 9
Reducing stack by rule 10 (line 145):
-> $$ = nterm grammar_s (1.1: )
Stack now 0 3 6 9
Entering state 14
Reading a token: Next token is token TERM_PRINT (1.1: )
Shifting token TERM_PRINT (1.1: )
Entering state 20
Reducing stack by rule 22 (line 180):
-> $$ = nterm rest_of_print (1.1: )
Stack now 0 3 6 9 14 20
Entering state 34
Reading a token: Next token is token TERM_STR (1.1: )
Shifting token TERM_STR (1.1: )
Entering state 41
Reducing stack by rule 26 (line 194):
   $1 = token TERM_STR (1.1: )
-> $$ = nterm other_print (1.1: )
Stack now 0 3 6 9 14 20 34
Entering state 44
Reducing stack by rule 23 (line 182):
   $1 = nterm rest_of_print (1.1: )
   $2 = nterm other_print (1.1: )
-> $$ = nterm rest_of_print (1.1: )
Stack now 0 3 6 9 14 20
Entering state 34
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Reducing stack by rule 21 (line 176):
   $1 = token TERM_PRINT (1.1: )
   $2 = nterm rest_of_print (1.1: )
"hEllo"
-> $$ = nterm print (1.1: )
Stack now 0 3 6 9 14
Entering state 25
Reducing stack by rule 14 (line 150):
   $1 = nterm print (1.1: )
-> $$ = nterm grammar (1.1: )
Stack now 0 3 6 9 14
Entering state 22
Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 35
Reducing stack by rule 11 (line 147):
   $1 = nterm grammar_s (1.1: )
   $2 = nterm grammar (1.1: )
   $3 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm grammar_s (1.1: )
Stack now 0 3 6 9
Entering state 14
Reading a token: Next token is token TERM_END (1.1: )
Shifting token TERM_END (1.1: )
Entering state 16
Reading a token: Next token is token TERM_FULLSTOP (1.1: )
Shifting token TERM_FULLSTOP (1.1: )
Entering state 27
Reducing stack by rule 5 (line 129):
   $1 = token TERM_END (1.1: )
   $2 = token TERM_FULLSTOP (1.1: )
-> $$ = nterm end (1.1: )
Stack now 0 3 6 9 14
Entering state 21
Reducing stack by rule 2 (line 113):
   $1 = nterm begin (1.1: )
   $2 = nterm middle_declarations (1.1: )
   $3 = nterm body (1.1: )
   $4 = nterm grammar_s (1.1: )
   $5 = nterm end (1.1: )

Sample input:

BeGiNInG.
X XXAB-.
XX XXX7.
XX XXXY.
BoDY.
print "hEllo".
EnD.

The pattern

[X]{2,}[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

should work fine, and it does work fine for me. However, it can be simplified to:

XX[A-Z0-9-]* yylval.string = strdup(yytext);return TERM_INVALID_VARIABLE_NAME;

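since any further X characters will be matched by the [A-Z0-9-] character class. (Note that there is no need to write \- in a character class when the dash is the first or last item in the class.)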
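As a rough check of that equivalence outside Flex, the following C sketch compares the two spellings with POSIX extended regular expressions. Using regcomp/regexec as a stand-in for the Flex matcher is an assumption made for illustration (the two syntaxes are similar but not identical), the ^ and $ anchors are added only to emulate matching a whole token, and whole_match and same_verdict are hypothetical helper names.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `text` matches the POSIX extended regex `pattern`.
   The caller anchors the pattern with ^ and $ to require a whole-string match. */
bool whole_match(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                       /* pattern did not compile */
    bool ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}

/* Returns true if the long and the simplified spelling agree on `text`. */
bool same_verdict(const char *text)
{
    return whole_match("^[X]{2,}[A-Z0-9-]*$", text) ==
           whole_match("^XX[A-Z0-9-]*$", text);
}
```

For example, both spellings accept XXAB-Y and both reject XY-1, so same_verdict returns true for either input.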

Like your pattern, this one also matches a bare XX, but in that case the [X]+ pattern wins as long as it appears earlier in the Flex input file.
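To make the tie-breaking concrete, here is a minimal C sketch that emulates how a Flex-style scanner chooses between rules: the longest match wins, and among equally long matches the rule listed first wins. It uses POSIX regcomp/regexec in place of the Flex matcher, which is an assumption for illustration only, and prefix_match_length and pick_rule are hypothetical helper names.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Length of the prefix of `input` matched by `pattern` (a POSIX extended
   regex that must start with ^), or -1 if there is no match. */
long prefix_match_length(const char *pattern, const char *input)
{
    assert(pattern != NULL && input != NULL);
    regex_t re;
    regmatch_t m;
    long len = -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;                          /* pattern did not compile */
    if (regexec(&re, input, 1, &m, 0) == 0)
        len = (long)m.rm_eo;                /* match starts at 0, so end == length */
    regfree(&re);
    return len;
}

/* Longest match wins; on a tie the earlier rule wins.
   Returns the index of the winning rule, or -1 if no rule matches. */
int pick_rule(const char *const rules[], size_t count, const char *input)
{
    int best = -1;
    long best_len = 0;
    for (size_t i = 0; i < count; i++) {
        long len = prefix_match_length(rules[i], input);
        if (len > best_len) {               /* strictly longer, so earlier rules keep ties */
            best = (int)i;
            best_len = len;
        }
    }
    return best;
}
```

With the rules listed as "^[X]+" followed by "^XX[A-Z0-9-]*", pick_rule returns 0 for the input XX (a tie, so the earlier rule wins) and 1 for XXAB-Y (the second rule matches more characters).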

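{2,0} is not a valid interval expression, because 0 is less than 2. To specify "two or more X", write X{2,} (or [X]{2,} if you prefer; "X"{2,} also works). The {2,0} form should produce an error message from flex, and as a result no lexical scanner is generated. (However, an old scanner may still be lying around, which can cause confusion.)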
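The same constraint on interval bounds exists in POSIX regular expressions, where regcomp is specified to reject an interval whose first number is larger than the second. The C sketch below uses that as a stand-in to show the difference between {2,} and {2,0}; it does not invoke Flex itself, and compiles is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `pattern` compiles as a POSIX extended regex. */
bool compiles(const char *pattern)
{
    assert(pattern != NULL);
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                       /* e.g. an invalid interval such as {2,0} */
    regfree(&re);
    return true;
}
```

Here compiles("X{2,}") and compiles("[X]{2,}") succeed, while compiles("X{2,0}") fails because the lower bound exceeds the upper bound.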
