Efficient RegExp matching starting at a given index in a string

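I have already parsed a string up to index idx. My next parsing step uses a RegExp. It needs to match the next part of the string, i.e. starting at position idx. How can I do this efficiently?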

For example:

let myString = "<p>ONE</p><p>TWO</p>"
let idx
// some code not shown here parses the first paragraph
// and updates idx
idx = 10
// next parse step must continue from idx 
let myRegex = /<p>[^<]*<\/p>/  // the "/" in "</p>" must be escaped inside a regex literal
let subbed = myString.substring(idx)
let result = myRegex.exec(subbed)
console.log(result) // "<p>TWO</p>", not "<p>ONE</p>"

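But myString.substring(idx) seems to be a fairly expensive operation.

Is there no regex operation like this: result = myRegex.execFromIndex(idx, myString);?

In general, I want to start regex matching from different indexes so that I can exclude parts of the string and avoid matching what has already been parsed. So one time it might start from myString[0], another time from myString[51], and so on.

Is there a way to do this efficiently? I am parsing hundreds of thousands of lines and want to do it as cheaply as possible.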

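A JavaScript RegExp has a lastIndex property, which RegExp.exec() uses as a placeholder: it holds the index just past the last match, so exec knows where to start next. Setting it yourself, for example myRegex.lastIndex = 3, should therefore solve your problem. Note that exec only honors lastIndex when the regex has the g or y flag.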

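It is more efficient than the substring approach because it does not need to create an extra string, and setting the lastIndex property is probably faster than performing a substring. Everything else stays exactly as you had it.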
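The flag requirement is easy to miss, so here is a minimal sketch of it. It reuses the string from the question; the variable names (text, plainRe, globalRe) are made up for the illustration. Without the g or y flag, exec ignores lastIndex and searches from the start of the string.

```javascript
const text = "<p>ONE</p><p>TWO</p>";

// No g or y flag: exec() ignores lastIndex and searches from index 0.
const plainRe = /<p>[^<]*<\/p>/;
plainRe.lastIndex = 10;
const plainMatch = plainRe.exec(text);
console.log(plainMatch[0], plainMatch.index); // <p>ONE</p> 0

// g flag: exec() starts searching at lastIndex.
const globalRe = /<p>[^<]*<\/p>/g;
globalRe.lastIndex = 10;
const globalMatch = globalRe.exec(text);
console.log(globalMatch[0], globalMatch.index); // <p>TWO</p> 10

// After a successful match, lastIndex points just past the match.
console.log(globalRe.lastIndex); // 20
```

Note that the reported index is relative to the full string, so no offset arithmetic is needed, unlike with the substring approach.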

Below is a test showing that setting lastIndex produces the same results as taking the substring first.

var result1Elem = document.getElementById('result1');
var result2Elem = document.getElementById('result2');
var runBtn = document.getElementById('RunBtn');
runBtn.addEventListener("click", runTest);

function runTest() {
  var substrStart = +document.getElementById('substrStartText').value;
  var pattern = document.getElementById('regexText').value;

  // Approach 1: search the full string, starting at lastIndex.
  var myRegex1 = new RegExp(pattern, 'g');
  myRegex1.lastIndex = substrStart;
  var myString1 = document.getElementById('testText').value;

  // Approach 2: search a substring that starts at the same offset.
  var myRegex2 = new RegExp(pattern, 'g');
  var myString2 = myString1.substring(substrStart);

  // Clear the results of any previous run.
  result1Elem.innerHTML = '';
  result2Elem.innerHTML = '';

  var result;

  var safety = 0; // caps the number of iterations in case a match is empty
  while ((result = myRegex1.exec(myString1)) !== null) {
    result1Elem.innerHTML += '<li>' + result[0] + ' at ' + result.index + '</li>';
    if (safety++ > 50) break;
  }

  safety = 0;
  while ((result = myRegex2.exec(myString2)) !== null) {
    // Indexes are relative to the substring, so add the offset back.
    result2Elem.innerHTML += '<li>' + result[0] + ' at ' + (result.index + substrStart) + '</li>';
    if (safety++ > 50) break;
  }
}

<table>
<tr><td>Test </td><td> <input type="text" value="Hello World" id="testText" /></td></tr>
<tr><td>Regex </td><td> <input type="text" value="l." id="regexText" /></td></tr>
<tr><td>Substring Start </td><td> <input type="text" value="3" id="substrStartText" /></td></tr>
<tr><td colspan="2"><button id="RunBtn">Run</button></td></tr>
</table>
<table style="width:100%">
<tr style="font-weight:bold; background:#ccc">
<td>Results of regex with lastIndex set</td>
<td>Results of regex on the substring</td>
</tr>
<tr>
<td><ul id="result1"></ul></td>
<td><ul id="result2"></ul></td>
</tr>
</table>

Using `RegExp.exec` and `lastIndex`

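1. Create the RegExp with the y or g flag
   - With the y flag, the match must start exactly at the specified start index
   - With the g flag, the match can occur anywhere at or after the specified index
2. Set its lastIndex property to the start index
3. Call exec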

I have applied the steps above to your sample code:

let myString = "<p>ONE</p><p>TWO</p>"
let idx
// some code not shown here parses the first paragraph
// and updates idx
idx = 10
// next parse step must continue from idx 
let myRegex = /<p>[^<]*<\/p>/y  // 🚩note the 'y' flag!🚩
myRegex.lastIndex = idx
let result = myRegex.exec(myString)
console.log(result) // "<p>TWO</p>", not "<p>ONE</p>"
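
To make the difference between the two flags concrete, here is a small sketch using a made-up input string (input, stickyRe and scanningRe are names chosen for the illustration, not part of the question). The sticky regex refuses to skip over the leading "abc ", while the global one scans forward to the first match.

```javascript
const input = "abc <p>TWO</p>";

// y flag: the match must begin exactly at lastIndex, so index 0 fails.
const stickyRe = /<p>[^<]*<\/p>/y;
stickyRe.lastIndex = 0;
const stickyAtZero = stickyRe.exec(input);
console.log(stickyAtZero); // null

// Pointing the sticky regex at the right index succeeds.
stickyRe.lastIndex = 4;
const stickyAtFour = stickyRe.exec(input);
console.log(stickyAtFour.index); // 4

// g flag: the search scans forward from lastIndex until it finds a match.
const scanningRe = /<p>[^<]*<\/p>/g;
scanningRe.lastIndex = 0;
const scanned = scanningRe.exec(input);
console.log(scanned.index); // 4
```

For a parser that must not silently skip unparsed input, the y flag is usually the safer choice; the g flag suits searching.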

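Another useful thing to know is that exec updates lastIndex so that it points to the position in the string just after the returned match. This lets you do a lot of things, including: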

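- Re-run the same RegExp, which will automatically find the next match after the previous one
- Transfer the lastIndex value to a different RegExp if the next thing to parse has a different pattern
- Copy the lastIndex value into a variable used by non-regex parsing
- Return lastIndex to the function's caller, so that the caller can process the rest of the string however it likes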
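
A rough sketch of how these ideas can combine, under stated assumptions: the three patterns and the parseElement helper below are hypothetical and only handle the simple "<tag>text</tag>" shape from the question. Each sticky regex hands its lastIndex to the next one, and the function returns the final position to its caller.

```javascript
// Hypothetical token patterns; the y flag forces each to match at the cursor.
const openTag = /<(\w+)>/y;
const textRun = /[^<]+/y;
const closeTag = /<\/(\w+)>/y;

// Parses one "<tag>text</tag>" element beginning exactly at `start`.
// Returns the pieces plus `next`, the index where the caller should continue,
// or null if no well-formed element starts there.
function parseElement(source, start) {
  openTag.lastIndex = start;
  const open = openTag.exec(source);
  if (open === null) return null;

  // Hand the current position over to a regex with a different pattern.
  textRun.lastIndex = openTag.lastIndex;
  const body = textRun.exec(source);
  // A failed sticky match resets lastIndex to 0, so fall back to the old position.
  const afterBody = body === null ? openTag.lastIndex : textRun.lastIndex;

  closeTag.lastIndex = afterBody;
  const close = closeTag.exec(source);
  if (close === null || close[1] !== open[1]) return null;

  return { tag: open[1], text: body === null ? "" : body[0], next: closeTag.lastIndex };
}

const source = "<p>ONE</p><p>TWO</p>";
const elements = [];
let pos = 0;
while (pos < source.length) {
  const element = parseElement(source, pos);
  if (element === null) break; // unparsable input: stop instead of skipping
  elements.push(element);
  pos = element.next; // the caller continues from the returned index
}
console.log(elements.map((e) => e.text)); // [ 'ONE', 'TWO' ]
```

No substring is created at any point; the only state passed around is an integer index.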

Why `string.slice` and `substring` are also good solutions

"But myString.substring(idx) seems to be a fairly expensive operation."

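Not necessarily! Although they may not be as fast as Rust, all the leading JavaScript engines (SpiderMonkey, V8, JavaScriptCore) do exactly what you describe for Rust: behind the scenes, they optimize string.slice and substring to use a pointer into the source string instead of copying.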

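"Adventures in the land of substrings and RegExps" has a lot of great detail, pictures, and analysis, but it is already five years old, and things may have improved since then. See also the answers to this StackOverflow question: Is Javascript substring virtual?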
