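As the title says: I know Lua has an official extended BNF in "The Complete Syntax of Lua". I want to write a PEG that I can pass to re.compile (from LPeg's re module) to parse Lua itself. The PEG should presumably look a lot like the BNF. I have read the BNF and tried to translate it into a PEG, but I found Numeral and LiteralString hard to write. Has anyone done something like this?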
local lpeg = require "lpeg"
local re = require "re" -- re ships with LPeg as a separate module; lpeg.re is not set by require "lpeg"
local p = re.compile([[
chunk <- block
block <- stat * retstat ?
stat <- ';' /
varlist '=' explist /
functioncall /
label /
'break' /
'goto' Name /
'do' block 'end' /
'while' exp 'do' block 'end' /
'repeat' block 'until' exp /
'if' exp 'then' block ('elseif' exp 'then' block) * ('else' block) ? 'end' /
'for' Name '=' exp ',' exp (',' exp) ? 'do' block 'end' /
'for' namelist 'in' explist 'do' block 'end' /
'function' funcname funcbody /
'local function' Name funcbody /
'local' attnamelist ('=' explist) ?
attnamelist <- Name attrib (',' Name attrib) *
attrib <- ('<' Name '>') ?
retstat <- 'return' explist ? ';' ?
label <- '::' Name '::'
funcname <- Name ('.' Name) * (':' Name) ?
varlist <- var (',' var) *
var <- Name / prefixexp '[' exp ']' / prefixexp '.' Name
namelist <- Name (',' Name) *
explist <- exp (',' exp) *
exp <- 'nil' / 'false' / 'true' / Numeral / LiteralString / "..." / functiondef /
prefixexp / tableconstructor / exp binop exp / unop exp
prefixexp <- var / functioncall / '(' exp ')'
functioncall <- prefixexp args / prefixexp ":" Name args
args <- '(' explist ? ')' / tableconstructor / LiteralString
functiondef <- 'function' funcbody
funcbody <- '(' parlist ? ')' block 'end'
parlist <- namelist (',' '...') ? / '...'
tableconstructor <- '{' fieldlist ? '}'
fieldlist <- field (fieldsep field) * fieldsep ?
field <- '[' exp ']' '=' exp / Name '=' exp / exp
fieldsep <- ',' / ';'
binop <- '+' / '-' / '*' / '/' / '//' / '^' / '%' /
'&' / '~' / '|' / '>>' / '<<' / '..' /
'<' / '<=' / '>' / '>=' / '==' / '~=' /
'and' / 'or'
unop <- '-' / 'not' / '#' / '~'
saveword <- "and" / "break" / "do" / "else" / "elseif" / "end" /
"false" / "for" / "function" / "goto" / "if" / "in" /
"local" / "nil" / "not" / "or" / "repeat" / "return" /
"then" / "true" / "until" / "while"
Name <- ! saveword / name
Numeral <-
LiteralString <-
]])
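First of all: you'll want to parse Lua in two steps, tokenization (lexical analysis, which regular expressions can handle) followed by syntactic analysis. Consider the syntactically invalid Lua code `if1then print()end`. If you parse it in one go, you might not get a syntax error, since it could in theory be read as `if`, the number `1`, `then`, and so on. Tokenization, however, greedily turns `if1then` into a single identifier (Name) token, which triggers a syntax error in the later syntactic analysis.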
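In some cases a PEG may let you express this through ordered choice, but in general you should use the two-step process to avoid ending up with an overly lenient (and possibly ambiguous) grammar.
The "rules" that remain to be written are all the token rules (as their capitalized names suggest): `Name`, `LiteralString` and `Numeral`. These are basically just simple RegExes. As for `Name`s: if you use ordered choice cleverly (`+` in LPeg), you don't need "subtraction" (negative lookahead) to keep keywords from being parsed as `Name`s; just do something along the lines of `Token = Keyword + Name + ...` in your tokenization grammar.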
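A minimal sketch of such a tokenization grammar in LPeg, to make the `if1then` example concrete. The token kinds, the helper names (`token`, `kinds`) and the three-keyword list are illustrative assumptions, not taken from an existing lexer. One detail goes beyond the text above: the keyword alternative carries a lookahead so that a keyword is only recognized when no further name character follows; with ordered choice alone, `iffy` would be split into `if` and `fy`.

```lua
local lpeg = require "lpeg"
local P, R, S, C, Cc, Ct = lpeg.P, lpeg.R, lpeg.S, lpeg.C, lpeg.Cc, lpeg.Ct

local letter_ = R("AZ", "az") + P"_"
local digit = R"09"
local namechar = letter_ + digit

-- Only three keywords, to keep the sketch short.
-- The lookahead -namechar rejects a keyword that is merely a prefix of a name.
local keyword = (P"if" + P"then" + P"end") * -namechar
local name = letter_ * namechar^0
local number = digit^1
local symbol = S"()"
local space = S" \t\r\n"^0

-- Wrap a pattern so that it yields a {kind, text} table and skips trailing space.
local function token(kind, patt)
  return Ct(Cc(kind) * C(patt)) * space
end

-- Ordered choice: the keyword alternative is tried before the name alternative.
local Token = token("keyword", keyword)
            + token("name", name)
            + token("number", number)
            + token("symbol", symbol)

-- The whole input must consist of tokens; otherwise match returns nil.
local lexer = Ct(space * Token^0) * -1

-- Render the token list as "kind:text kind:text ..." for easy inspection.
local function kinds(src)
  local tokens = lexer:match(src)
  if not tokens then return nil end
  local out = {}
  for i, t in ipairs(tokens) do out[i] = t[1] .. ":" .. t[2] end
  return table.concat(out, " ")
end

print(kinds("if 1 then print() end"))
--> keyword:if number:1 keyword:then name:print symbol:( symbol:) keyword:end
print(kinds("if1then print()end"))
--> name:if1then name:print symbol:( symbol:) keyword:end
```

In the second token stream no `if` keyword is left, so a parser working on tokens has no way to read the input as an `if` statement and will report a syntax error.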
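Literal strings are indeed tricky, since long strings cannot be written as RegExes; quoted strings are fairly easy (you do have to handle escapes, though). The LPeg documentation has an example for long strings: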
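equals = lpeg.P"="^0
-- a newline directly after the opening bracket is skipped
open = "[" * lpeg.Cg(equals, "init") * "[" * lpeg.P"\n"^-1
close = "]" * lpeg.C(equals) * "]"
-- a closing bracket only counts if it has as many "=" as the opening one
closeeq = lpeg.Cmt(close * lpeg.Cb("init"), function (s, i, a, b) return a == b end)
string = open * lpeg.C((lpeg.P(1) - closeeq)^0) * close / 1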
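To see what the long-string pattern accepts, here is a small usage sketch. It repeats the pattern with `local` variables and the name `longstring` (both my choice, so that the global `string` library is not shadowed); the test inputs are made up for illustration.

```lua
local lpeg = require "lpeg"

local equals = lpeg.P"="^0
local open = "[" * lpeg.Cg(equals, "init") * "[" * lpeg.P"\n"^-1
local close = "]" * lpeg.C(equals) * "]"
-- Succeeds only when the closing level equals the level captured as "init".
local closeeq = lpeg.Cmt(close * lpeg.Cb("init"),
  function (s, i, a, b) return a == b end)
-- "/ 1" keeps only the first capture, which is the string content.
local longstring = open * lpeg.C((lpeg.P(1) - closeeq)^0) * close / 1

print(longstring:match("[[hello]]"))          --> hello
-- Closing brackets of the wrong level are part of the content.
print(longstring:match("[==[a]]b]=]c]==]"))   --> a]]b]=]c
-- No closing bracket of level 1, so the match fails.
print(longstring:match("[=[unterminated]]"))  --> nil
```

The matching-time capture (`lpeg.Cmt`) is what takes this beyond a regular expression: the number of `=` signs in the closing bracket has to be compared with the number seen in the opening bracket.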
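Numbers are a bit clunky, since you have to handle many different cases: different bases, an omitted dot, omitted digits before or after the dot, exponents, signs, and so on.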
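The cases listed above can be checked against a compact recognizer. This is a sketch under my own assumptions, not a complete lexer rule: it only decides whether a whole string is a numeral, produces no captures, and uses the hypothetical helper name `numeral`. The sample inputs follow the numeral forms shown in the Lua reference manual.

```lua
local lpeg = require "lpeg"
local P, R, S = lpeg.P, lpeg.R, lpeg.S

local function numeral(digit, exponent_letter)
  local uint = digit^1
  -- Longest alternatives first, because LPeg's + is ordered choice:
  -- "1.5" and "1.", then ".5", then a plain integer.
  local mantissa = uint * P"." * digit^0 + P"." * uint + uint
  -- Exponent digits are decimal, also for hexadecimal numerals.
  local exponent = exponent_letter * S"+-"^-1 * R"09"^1
  return mantissa * exponent^-1
end

local hex = P"0" * S"xX" * numeral(R("09", "AF", "af"), S"pP")
local decimal = numeral(R"09", S"eE")
-- "-1" anchors the match at the end of the subject.
local Numeral = (hex + decimal) * -1

local accepted = { "3", "345", "0xff", "0xBEBADA", "3.0", "3.1416", "314.16e-2",
                   "0.31416E1", "34e1", "0x0.1E", "0xA23p-4",
                   "0X1.921FB54442D18P+1", ".5", "3." }
local rejected = { "", ".", "0x", "1e", "e1" }

for _, s in ipairs(accepted) do assert(Numeral:match(s), s) end
for _, s in ipairs(rejected) do assert(not Numeral:match(s), s) end
print("all numeral checks passed")
```

A leading minus sign is deliberately not part of the pattern, since in Lua's grammar `-1` is the unary operator `-` applied to the numeral `1`.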
I happen to have the relevant LPeg rules lying around:
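-- Shorthands; the snippet uses them without showing their definitions, so these are assumed
local lpeg = require "lpeg"
local P, R, S, C = lpeg.P, lpeg.R, lpeg.S, lpeg.C
local function O(patt) return patt ^ -1 end -- assumed meaning: optional (zero or one)
-- Character classes
_letter = R("AZ", "az")
_letter_ = _letter + P"_"
_digit = R"09"
_hexdigit = _digit + R("AF", "af")
white = C(S" \f\t\v\n\r" ^ 1)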
_keyword = P"not"
+ P"and"
+ P"or"
+ P"function"
+ P"nil"
+ P"false"
+ P"true"
+ P"return"
+ P"goto"
+ P"do"
+ P"end"
+ P"while"
+ P"repeat"
+ P"until"
+ P"if"
+ P"then"
+ P"elseif"
+ P"else"
+ P"for"
+ P"local"
+ P"break" -- also reserved words in Lua
+ P"in"
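-- Names
-- A keyword only excludes a name when no further name character follows it,
-- so that e.g. "note" or "end_pos" are still accepted.
Name = C(_letter_ * (_letter_ + _digit) ^ 0) - (_keyword * -(_letter_ + _digit))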
-- Numbers
local function _numeral(digit_, exponent_letter)
  local uint = digit_ ^ 1
  -- Ordered choice: try the longer forms first, otherwise "1.5" stops after "1"
  local float = uint * P"." * uint + uint * P"." + P"." * uint + uint
  local exponent = exponent_letter * O(S"+-") * uint
  return C(float) * C(O(exponent))
end
_hex_numeral = C(P"0x") * _numeral(_hexdigit, S"pP")
_decimal_numeral = _numeral(_digit, S"eE")
Numeral = _hex_numeral + _decimal_numeral
-- Strings
decimal_escape = C(_digit * O(_digit * O(_digit)))
hex_escape = P"x" * C(_hexdigit * _hexdigit)
unicode_escape = P"u{" * C(_hexdigit^1) * P"}"
char_escape = C(S[[abfnrtv\'"]]) -- long string, so the backslash is a literal member of the set
_escape = P"\\" * (decimal_escape + hex_escape + char_escape + unicode_escape)
local function string_quoted(quotes)
local range = P(1) - S(quotes .. "