Split on word boundaries, keeping apostrophes inside words



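I want to split a string so that every word (as well as every space and punctuation mark) ends up in its own group, but words that contain an apostrophe should stay together.

For example: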

Phrase: This is right.
Groups: [This] [ ] [is] [ ] [right] [.]
Phrase: This isn't right.
Groups: [This] [ ] [isn't] [ ] [right] [.]
Phrase: "I said ok."
Groups: ["] [I] [ ] [said] [ ] [ok] [.] ["]

I'm using this regex: str.split(/(?=[.,"\s]|\b)/)

However, this doesn't work with apostrophes. The phrase This isn't right. gets split like this:

[This] [ ] [isn] ['] [t] [ ] [right] [.]

Is there a way to keep isn't together in one group?

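You can try finding all regex matches of the pattern [A-Za-z']+|[^A-Za-z'], which matches either a word (a run of letters or apostrophes) or a single non-word character.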

var regex = /[A-Za-z']+|[^A-Za-z']/g;
var input = "\"This isn't right.\"";
var m;
var matches = [];
var i = 0;
// With the g flag, each exec() call resumes after the previous match.
do {
    m = regex.exec(input);
    if (m) {
        matches[i] = m[0];
        ++i;
    }
} while (m);
console.log(matches);

Note that a straightforward find-all with a simple regex is sometimes preferable to splitting on a more complicated regex.
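The same find-all idea can be written more compactly with String.prototype.match, which returns every match when the regex has the g flag. This is only a sketch: the helper name tokenize is made up for illustration, and the pattern is the one from this answer.

```javascript
// Hypothetical helper; the pattern is the one proposed in the answer above.
function tokenize(input) {
  // With the g flag, match() returns an array of all matches,
  // or null when nothing matches, hence the fallback to [].
  return input.match(/[A-Za-z']+|[^A-Za-z']/g) || [];
}

const tokens = tokenize("This isn't right.");
console.log(tokens); // [ 'This', ' ', "isn't", ' ', 'right', '.' ]
```

Because the second alternative matches any single character that is not a letter or apostrophe, no character of the input is dropped, so joining the tokens gives back the original string.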

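I would use .match: match a word character followed by any number of word characters or apostrophes (\w[\w']*), or a run of spaces ( +), or one of the other punctuation characters ([.,"]):

\w[\w']*| +|[.,"]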

https://regex101.com/r/B755JA/1

const inputs = `This is right.
This isn't right.
"I said ok."`.split('\n');
for (const input of inputs) {
    console.log(input.match(/\w[\w']*| +|[.,"]/g));
}
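One thing to keep in mind with this pattern is that match() silently skips any character that none of the alternatives cover, so punctuation other than . , " disappears from the result. The sketch below compares it with a broader variant; that variant (\s+ for whitespace, [^\w\s] for any other single character) is an assumption of mine and not part of the original answer.

```javascript
// Pattern from the answer above: only . , " are recognized as punctuation.
const narrow = /\w[\w']*| +|[.,"]/g;
// Assumed variant: any character that is neither a word character
// nor whitespace becomes its own token, so nothing is skipped.
const broad = /\w[\w']*|\s+|[^\w\s]/g;

const sample = "Don't stop!";
console.log(sample.match(narrow)); // [ "Don't", ' ', 'stop' ]
console.log(sample.match(broad));  // [ "Don't", ' ', 'stop', '!' ]
```

Whether the broader pattern is what you want depends on the input: \w also accepts digits and underscores, and \s+ groups tabs and newlines together with spaces.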
