Sentences with a specific size and boundary detection



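Here is my problem: I have a large string (close to 8000 characters), and I want two things:

  1. Detect sentence boundaries, such as ".", and
  2. Have sentences of no more than 600 characters

I know that in some cases the two cannot both be satisfied. In that case, find a space and split the sentence there.

ridgerrunner's solution for requirement 1 works very well (see the original link: http://goo.gl/PqI6d), but it often outputs sentences longer than 600 characters. Any ideas? Thanks in advance!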

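You are better off matching the string than splitting it. The regex to match could look like this:

(.{0,600}?\.)|(.{0,600}(?= ))

In short, you first look for the shortest possible string that ends in a period. If there is none, you look for the longest possible string that is followed by a space. The next match then starts where the previous one stopped.

Note that this is a generic regex. Your PHP implementation may differ.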
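As a rough PHP sketch of the matching approach described above (not the answerer's exact code): the function name `split_sentences` and the `$max` parameter are hypothetical, and the pattern is adapted in a few ways that are assumptions on my part. The first alternative is capped at `$max - 1` characters so that the period still fits within the limit, `\G\s*+` skips the whitespace between chunks, and a third alternative makes a hard cut when a run contains no whitespace at all. Lengths are counted in bytes.

```php
<?php
// Hypothetical helper: split $text into chunks of at most $max bytes,
// preferring to end at the first period, then at the last whitespace that fits.
function split_sentences(string $text, int $max = 600): array
{
    if ($max < 2) {
        throw new InvalidArgumentException('$max must be at least 2');
    }
    $short = $max - 1; // leave room for the closing period

    $pattern = '/\G\s*+('                      // continue after the previous chunk, skip whitespace
             . '.{1,' . $short . '}?\.'        // 1) shortest run that ends in a period
             . '|.{1,' . $max . '}(?=\s|$)'    // 2) longest run followed by whitespace or the end
             . '|.{1,' . $max . '}'            // 3) assumption: hard cut if there is no whitespace
             . ')/s';

    preg_match_all($pattern, $text, $matches);
    return $matches[1];
}

// Small demonstration with a 12-byte limit instead of 600.
print_r(split_sentences('Aaa bbb. Ccc ddd eee fff ggg. Hh.', 12));
// Chunks: "Aaa bbb.", "Ccc ddd eee", "fff ggg.", "Hh."
```

Like the generic regex, this treats every period as a sentence boundary, so abbreviations and decimals such as "3.5" would be split as well.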

Tks nhahtdh. Please see if I am missing something. Below is an excerpt of my string and the output I get using your suggestion.

<?php 
    $ptn = '/(?:[^.]{1,600}(?: |\.)|\w{600,}(?: |\.)?)/';
    $str = "Amblyopia occurs when the nerve pathway from one eye to the brain does not develop during childhood. This occurs because the abnormal eye sends a blurred image or the wrong image to the brain. This confuses the brain, and the brain may learn to ignore the image from the weaker eye. Strabismus is the most common cause of amblyopia. There is often a family history of this condition. The term \"lazy eye\" refers to amblyopia, which often occurs along with strabismus. However, amblyopia can occur without strabismus and people can have strabismus without amblyopia.First, any eye condition that is causing poor vision in the amblyopic eye (such as cataracts) needs to be corrected. Children with a refractive error (nearsightedness, farsightedness, or astigmatism) will need glasses. Next, a patch is placed on the normal eye. This forces the brain to recognize the image from the eye with amblyopia. Sometimes, drops are used to blur the vision of the normal eye instead of putting a patch on it. Children whose vision will not fully recover, and those with only good eye due to any disorder should wear glasses with protective polycarbonate lenses. Polycarbonate glasses are shatter- and scratch-resistant. Children who get treated before age 5 will usually recover almost completely normal vision, although they may continue to have problems with depth perception. Delaying treatment can result in permanent vision problems. After age 10, only a partial recovery of vision can be expected. Early recognition and treatment of the problem in children can help to prevent permanent visual loss. All children should have a complete eye examination at least once between ages 3 and 5. Special techniques are needed to measure visual acuity in a child who is too young to speak. Most eye care professionals can perform these techniques.";
    $result = preg_split($ptn, $str, -1, PREG_SPLIT_NO_EMPTY);
    print_r($result);
    ?>

Result (what I need is sentences of fewer than 600 characters from the string):

 Array
(
[0] => childhood.
[1] => brain.
[2] => eye.
[3] => amblyopia.
[4] => condition.
[5] => strabismus.
[6] => amblyopia.
[7] => corrected.
[8] => glasses.
[9] => eye.
[10] => amblyopia.
[11] => it.
[12] => lenses.
[13] => scratch-resistant.
[14] => perception.
[15] => problems.
[16] => expected.
[17] => loss.
[18] => 5.
[19] => speak.
[20] => techniques
)
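
Separately from the pattern itself, it may be worth noting that `preg_split()` treats each match as a delimiter and returns only the text between matches, whereas `preg_match_all()` returns the matched text. A pattern written to describe whole sentences therefore usually belongs with `preg_match_all()`. The short string and pattern below are made up for illustration and are not taken from the thread:

```php
<?php
// Illustrative input and pattern (assumptions, not from the thread).
$text = 'One. Two. Three.';
$pattern = '/[^.]+\./'; // a run of non-period characters plus the period

// preg_split() discards what the pattern matches; here every character is
// part of some match, so nothing is left between the delimiters.
$pieces = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($pieces); // empty array

// preg_match_all() keeps what the pattern matches.
preg_match_all($pattern, $text, $m);
print_r($m[0]); // "One.", " Two.", " Three."
```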
