How to make this sentence-splitter regex Safari-compatible



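I am using the regex from the accepted answer to this question to split text into sentences, but the regex does not work in Safari because Safari does not (yet) support negative lookbehinds.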

(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s

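The regex splits the following string:

Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't.

into: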

[
  "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it.",
  "Did he mind?",
  "Adam Jones Jr. thinks he didn't.",
  "In any case, this isn't true...",
  "Well, with a probability of .9 it isn't."
]

It basically extracts the sentences from a string.

Any ideas on how to make it compatible with Safari?
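For context, whether a particular engine accepts lookbehind can be checked at runtime. The snippet below is only a sketch: `supportsLookbehind` is a hypothetical helper name, not something from the original post. It builds the pattern through the `RegExp` constructor because a regex literal containing unsupported syntax is rejected when the script is parsed, which takes the whole script down before any fallback can run.

```javascript
// Hypothetical helper (not from the original post): reports whether the
// current JavaScript engine can compile a negative lookbehind.
function supportsLookbehind() {
  try {
    // The RegExp constructor throws a SyntaxError at runtime for unsupported
    // syntax, which can be caught; a regex literal would fail at parse time.
    new RegExp("(?<!a)b");
    return true;
  } catch (err) {
    return false;
  }
}

console.log(supportsLookbehind());
```

A check like this lets a page choose between the lookbehind-based split and a lookbehind-free pattern without crashing in engines that lack support.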

The regex can be rewritten for use with match / matchAll / exec:

/((?:[A-Z][a-z]\.|\w\.\w[\w\W]|[\w\W])*?[.!?])(?:\s+|$)/g

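Details:

  • ((?:[A-Z][a-z]\.|\w\.\w[\w\W]|[\w\W])*?[.!?]) - Group 1:
    • (?:[A-Z][a-z]\.|\w\.\w[\w\W]|[\w\W])*? - zero or more occurrences, as few as possible, of:
      • [A-Z][a-z]\.| - an uppercase letter, a lowercase letter and then a ., or
      • \w\.\w[\w\W]| - a word character, a ., a word character and then any one character, or
      • [\w\W] - any single character
    • [.!?] - a ., ! or ?
  • (?:\s+|$) - one or more whitespace characters or the end of the string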

See the JavaScript demo:

var s = "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't.";
var rx = /((?:[A-Z][a-z]\.|\w\.\w[\w\W]|[\w\W])*?[.!?])(?:\s+|$)/g;
var results = [], m;
// exec() with the g flag resumes from rx.lastIndex on each call
while (m = rx.exec(s)) {
  results.push(m[1]); // Group 1: the sentence without trailing whitespace
}
console.log(results);
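The answer mentions matchAll as well as exec. As a sketch of that variant (the variable names here are illustrative), the same pattern can be passed to `String.prototype.matchAll`, which requires the `g` flag and yields one match object per sentence:

```javascript
const text = "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't.";
const sentenceRx = /((?:[A-Z][a-z]\.|\w\.\w[\w\W]|[\w\W])*?[.!?])(?:\s+|$)/g;

// Each match object's index 1 is capture group 1, i.e. the sentence
// without the whitespace that separates it from the next one.
const viaMatchAll = Array.from(text.matchAll(sentenceRx), (m) => m[1]);

console.log(viaMatchAll);
```

This avoids the manual loop and the shared `lastIndex` state that `exec` relies on, at the cost of needing an engine that implements `matchAll`.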

OK, so based on the regex given by Wiktor Stribiżew (thanks a lot), I was able to do something like this.

const string = "My perfect string"
const regex = /((?:[A-Z][a-z]\.|\w\.\w.|.)*?(?:[.!?]|$))(?:\s+|$)/g
const sentences = string.match(regex)

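This gives the sentences, but note that one element has to be removed from the array: because the pattern can also match the empty string at the end of the input, the last element is an empty string.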

[
'Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. ',
'Did he mind? ',
"Adam Jones Jr. thinks he didn't. ",
"In any case, this isn't true... ",
"Well, with a probability of .9 it isn't.",
''
]

Then, by removing the last element from the array, we get the expected result.

sentences.pop()

If you need to remove the whitespace at the end of the sentences, you can trim them like this:

let sentences = string.match(regex).map(sentence => {
  return sentence.trim()
})
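The trimming and the removal of the empty trailing element can also be done in one pass. The following is a minimal sketch using the same pattern; the names `input`, `pattern` and `cleaned` are illustrative, and filtering is offered as an alternative to `pop()`, not as what the original answer does:

```javascript
const input = "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't.";
const pattern = /((?:[A-Z][a-z]\.|\w\.\w.|.)*?(?:[.!?]|$))(?:\s+|$)/g;

// match() returns null when nothing matches, so fall back to an empty
// array as a defensive measure before chaining.
const cleaned = (input.match(pattern) || [])
  .map((sentence) => sentence.trim())          // drop the trailing whitespace
  .filter((sentence) => sentence.length > 0);  // drop the empty end-of-string match

console.log(cleaned);
```

Filtering on length does not depend on the empty string being the last element, so it keeps working if the input happens to produce no empty match or more than one.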
