Regex to get text from the beginning of HTML code

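I am trying to find certain words in an HTML string. The criteria are:

The word is at the start of the string (^)
The word is in the middle, preceded by a whitespace character
The word is at the start, right after a tag

I was able to cover the first two criteria, but not the third.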

Example strings:

Leading a team of 5.
You will be leading a team of 5
<span style="color:#f0f;">Leading a team of 5</span>
The code is ok
He is a good coder

The result should be: [Leading, leading, Leading, He]

My current regex:

/(?:^|\s)(lead[a-z]{0,}|he[\s])/gi

I am using replace to bold the words, for example:

text.replace(regex, `<b>$1</b>`);

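I don't know how to get only the word.

I know I can remove the (?:^|\s) part, but that would affect short words like he, because it would then also match inside the, The, etc.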
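
To make that concern concrete, here is a small sketch of what happens once the prefix is dropped. The name `unanchored` is only an illustration, and the alternation is simplified to `he` and `lead[a-z]*`; it is not the regex from the question.

```javascript
// Illustration only: the same word alternatives with no (?:^|\s) prefix.
// `unanchored` is a hypothetical name used just for this sketch.
const unanchored = /(lead[a-z]*|he)/gi;

// "The" contains "he", so the short alternative matches inside a longer word.
const falseHits = "The code is ok".match(unanchored);
console.log(falseHits);
```

The unwanted hit comes from the "he" inside "The", which is why some form of anchoring before the word is still needed.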

You can use:

(?:^(?:<[^>]*>)?|\s)(he|lead[a-z]*)\b

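The pattern matches:

(?: Non-capturing group
- ^(?:<[^>]*>)? Start of the string, optionally followed by a tag-like pattern (assuming there is no > character before the closing >)
- | Or
- \s Match a whitespace character
) Close the non-capturing group
(he|lead[a-z]*) Capture group 1: match he, or lead followed by optional characters a-z
\b A word boundary to prevent partial matches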

Regex demo:

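// Each sample is tested on its own, so ^ means the start of that sample string.
const regex = /(?:^(?:<[^>]*>)?|\s)(he|lead[a-z]*)\b/gi;
[
  "Leading a team of 5.",
  "You will be leading a team of 5",
  // Single quotes here so the double quotes in the style attribute do not end the string.
  '<span style="color:#f0f;">Leading a team of 5</span>lead',
  "The code is ok",
  "He is a good coder",
  "test the lead test !@#$leadi and leading"
].forEach(s =>
  // Print only capture group 1 (the word itself) for every match.
  console.log(`${s} ==> ${Array.from(s.matchAll(regex), m => m[1])}`)
);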
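
One thing to watch when plugging this into replace: the prefix (the leading tag or the whitespace) is part of the overall match even though it is not captured, so replacing with `<b>$1</b>` alone would drop it. A possible variation, sketched here under the assumption that the prefix should be kept, is to capture it as well and write it back. The names `highlightRegex` and `boldKeywords` are hypothetical and not from the original post.

```javascript
// Sketch only: `highlightRegex` and `boldKeywords` are hypothetical names.
// Group 1 = prefix (start of string plus optional tag, or one whitespace character).
// Group 2 = the word to wrap in <b>.
const highlightRegex = /(^(?:<[^>]*>)?|\s)(he|lead[a-z]*)\b/gi;

function boldKeywords(text) {
  // Write the prefix back unchanged, then wrap only the word.
  return text.replace(highlightRegex, "$1<b>$2</b>");
}

console.log(boldKeywords("You will be leading a team of 5"));
console.log(boldKeywords('<span style="color:#f0f;">Leading a team of 5</span>'));
console.log(boldKeywords("The code is ok"));
```

With this variant the space before leading and the opening span tag stay in the output, and a string with no qualifying word is returned unchanged.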
