How to write a lexer in JavaScript

I have written a lexer function that takes a string and splits the items in it into an array, like this:

const lexer = (str) =>
str
.split(" ")
.map((s) => s.trim())
.filter((s) => s.length);
console.log(lexer("John Doe")) // outputs ["John" , "Doe"]

Now I want to write a lexer in JavaScript that identifies token types in code such as:

if (foo) {
bar();
}

and returns output like this:

[
{
lexeme: 'if',
type: 'keyword',
position: {
row: 0,
col: 0
}
},
{
lexeme: '(',
type: 'open_paren',
position: {
row: 0,
col: 3
}
},
{
lexeme: 'foo',
type: 'identifier',
position: {
row: 0,
col: 4
}
},
...
]

How can I write a lexer in JavaScript that recognizes token types?

Thanks in advance.

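The most common lexing pattern I've seen in JavaScript (e.g. KaTeX and CoffeeScript) is to define a single regular expression that matches every token you might see, and then iterate over that regular expression's matches in some way.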

Here is a simple lexer that covers your JavaScript example (but also silently skips invalid content):

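function lex(input) {
  // Single-character tokens, newlines (to track rows), or a word in group 1
  const tokenRegExp = /[(){}\n]|(\w+)/g;
  const tokenMap = {
    '(': 'open_paren',
    ')': 'close_paren',
    '{': 'open_brace',
    '}': 'close_brace',
  };
  // Illustrative keyword list (assumption): extend it for the language you lex
  const keywords = new Set(['if', 'else', 'while', 'for', 'return']);
  let row = 0;
  let lineStart = 0; // index in input where the current row begins
  const tokens = [];
  let match;
  while ((match = tokenRegExp.exec(input)) !== null) {
    let type;
    if (match[1]) { // use groups to identify which part of the RegExp is matching
      type = keywords.has(match[1]) ? 'keyword' : 'identifier';
    } else if (tokenMap[match[0]]) { // use lookup table for simple tokens
      type = tokenMap[match[0]];
    }
    if (type) {
      tokens.push({
        lexeme: match[0],
        type,
        // Column is measured from the start of the row, so skipped spaces count
        position: { row, col: match.index - lineStart },
      });
    }
    // Update the row number and remember where the new row starts
    if (match[0] === '\n') {
      row++;
      lineStart = match.index + 1;
    }
  }
  return tokens;
}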
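A variant of the same "one big RegExp" pattern can use ES2018 named capture groups, so the name of whichever group matched doubles as the token type. This is only a sketch: the group names, the `tokenize` function and the keyword list are illustrative assumptions, and, as above, any character that no alternative matches (such as `+`) is silently skipped by `matchAll`.

```javascript
// Sketch: one RegExp with named groups; the matching group's name is the token kind.
// Token names and the keyword list are illustrative assumptions.
const KEYWORDS = new Set(['if', 'else', 'while', 'for', 'return']);

const TOKEN_RE =
  /(?<newline>\n)|(?<space>[ \t\r]+)|(?<word>[A-Za-z_$][\w$]*)|(?<open_paren>\()|(?<close_paren>\))|(?<open_brace>\{)|(?<close_brace>\})|(?<semicolon>;)/g;

function tokenize(input) {
  const tokens = [];
  let row = 0;
  let lineStart = 0; // index of the first character of the current row
  for (const match of input.matchAll(TOKEN_RE)) {
    // Exactly one named group is defined per match; find which one.
    const [kind, lexeme] = Object.entries(match.groups).find(([, v]) => v !== undefined);
    if (kind === 'newline') {
      row++;
      lineStart = match.index + 1;
      continue;
    }
    if (kind === 'space') continue; // whitespace only affects columns
    const type = kind === 'word'
      ? (KEYWORDS.has(lexeme) ? 'keyword' : 'identifier')
      : kind;
    tokens.push({ lexeme, type, position: { row, col: match.index - lineStart } });
  }
  return tokens;
}

console.log(tokenize('if (foo) {\n  bar();\n}'));
```

Named groups keep the token type next to its pattern, which tends to be easier to maintain than counting numbered groups as the grammar grows.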

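Other parsers instead use a regular expression that matches only a prefix of the string, drop that part of the string, and continue matching from where they stopped. (This avoids skipping invalid content.)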
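A minimal sketch of that prefix-matching approach, assuming the sticky (`y`) RegExp flag rather than literally slicing the string: a sticky RegExp only matches exactly at `lastIndex`, so the lexer advances through the input and throws on any character no rule recognizes. The rule names, `lexStrict` and the keyword set are hypothetical choices for illustration.

```javascript
// Sketch: "match a prefix, consume it, continue" using sticky (y) RegExps.
// Rule names and the keyword set are illustrative assumptions.
const RULES = [
  ['newline', /\n/y],
  ['space', /[ \t\r]+/y],
  ['word', /[A-Za-z_$][\w$]*/y],
  ['open_paren', /\(/y],
  ['close_paren', /\)/y],
  ['open_brace', /\{/y],
  ['close_brace', /\}/y],
  ['semicolon', /;/y],
];
const KEYWORD_SET = new Set(['if', 'else', 'while', 'for', 'return']);

function lexStrict(input) {
  const tokens = [];
  let pos = 0;
  let row = 0;
  let col = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [kind, re] of RULES) {
      re.lastIndex = pos; // sticky: the match must start exactly here
      const m = re.exec(input);
      if (!m) continue;
      const lexeme = m[0];
      if (kind === 'word') {
        const type = KEYWORD_SET.has(lexeme) ? 'keyword' : 'identifier';
        tokens.push({ lexeme, type, position: { row, col } });
      } else if (kind !== 'space' && kind !== 'newline') {
        tokens.push({ lexeme, type: kind, position: { row, col } });
      }
      // Advance the position; every rule consumes at least one character.
      if (kind === 'newline') {
        row++;
        col = 0;
      } else {
        col += lexeme.length;
      }
      pos += lexeme.length;
      matched = true;
      break;
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character ${JSON.stringify(input[pos])} at row ${row}, col ${col}`);
    }
  }
  return tokens;
}
```

For example, `lexStrict('a + b')` throws a `SyntaxError` at the `+` instead of silently dropping it. Rule order matters: the first rule that matches at the current position wins.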

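That said, I wouldn't recommend writing your own JavaScript lexer except for educational purposes; existing ones have probably already handled far more edge cases than you would find without considerable effort.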
