PowerShell - What is the best way to extract sentences from a string

I have many lines of text structured like this:

Sentence a. Sentence b part 1 `r`n
sentence b part 2. Sentence c.`r`n
Sentence d. Sentence e. Sentence f. `r`n
....

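I want to extract these sentences and sentence parts into an array of strings, one string per sentence or part. So far I have found the following.

The first approach: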

$mySentences = $lineFromTheText -split "(?<=\.)"

The second approach:

$mySentences = [regex]::matches($lineFromTheText, "([^.?!]+[.?!])?([^.?!]*$)?") | % {$_.Groups[1,2].Value} | % { If (-not ($_ -eq "")) {$_}}

And a third:

$mySentences = ($lineFromTheText | Select-String -Pattern "([^.?!]+[.?!])?([^.?!]*$)?" -AllMatches).Matches  | % {$_.Groups[1,2].Value} | % { If (-not ($_ -eq "")) {$_}}

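All of these seem to do what I expect, but I would like to know which of them I should use. In other words, which is the best code? Please let me know. Thanks.

If what you want is the shortest execution time, you can measure it. Let's run each solution 10,000 times and see how long it takes: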

$lineFromTheText = "Sentence d. Sentence e. Sentence f."
(Measure-Command {1..10000 | % {$mySentences = $lineFromTheText -split "(?<=\.)"}}).Ticks
(Measure-Command {1..10000 | % {$mySentences = [regex]::matches($lineFromTheText, "([^.?!]+[.?!])?([^.?!]*$)?") | % {$_.Groups[1,2].Value} | % { If (-not ($_ -eq "")) {$_}}}}).Ticks
(Measure-Command {1..10000 | % {$mySentences = ($lineFromTheText | Select-String -Pattern "([^.?!]+[.?!])?([^.?!]*$)?" -AllMatches).Matches  | % {$_.Groups[1,2].Value} | % { If (-not ($_ -eq "")) {$_}}}}).Ticks

Output (example):

1059468
14512767
20444350

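It looks like your first solution is the fastest and your third is the slowest. In this sample run the [regex]::Matches version took about 14 times as long as -split, and the Select-String version about 19 times as long. Ticks are 100-nanosecond units, so the three runs took roughly 0.11 s, 1.45 s, and 2.04 s.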
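One caveat about the fastest variant: a lookbehind split keeps the leading space on every sentence after the first, may leave an empty fragment at the end, and a pattern that only looks for a period ignores ? and !. The following is a minimal sketch that tidies this up, assuming a sentence ends with ., ? or !. The function name Split-Sentence is hypothetical, not a built-in cmdlet.

```powershell
# Hypothetical helper (not a built-in cmdlet): splits one line into sentences.
# Assumption: a sentence ends with '.', '?' or '!'.
function Split-Sentence {
    param([string]$Line)

    # Zero-width lookbehind: split after each terminator while keeping it.
    $Line -split '(?<=[.?!])' |
        ForEach-Object { $_.Trim() } |   # drop surrounding whitespace
        Where-Object { $_ -ne '' }       # drop empty fragments
}

# @() keeps the result an array even when only one sentence is found.
$mySentences = @(Split-Sentence -Line 'Sentence d. Sentence e? Sentence f!')
$mySentences.Count   # 3
```

A line that ends without a terminator, such as the first line of the sample text, yields its unfinished tail as the last element. Abbreviations such as "e.g." would be split too, so this only suits text where a terminator always ends a sentence.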
