In PHP, how can str_word_count() be made to recognize all UTF-8 special characters as words?

Consider the following program:

<?php
    $str = 'You & I = We';
    $arr = [];
    $arr = str_word_count($str, 2, "&=");
    foreach ($arr as $key => $value) {
        echo $key.'&nbsp;&nbsp;===>&nbsp;&nbsp;'.$value.'<br>';
    }
?>

Output:

0  ===>  You
4  ===>  &
6  ===>  I
8  ===>  =
10  ===>  We

Now consider the following program:

<?php
    $str = 'You & I = We';
    $arr = [];
    $arr = str_word_count($str, 2);
    foreach ($arr as $key => $value) {
        echo $key.'&nbsp;&nbsp;===>&nbsp;&nbsp;'.$value.'<br>';
    }
?>

Output:

0  ===>  You
6  ===>  I
10  ===>  We

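Note:

The only difference between the two programs is that the first one passes a third argument, "&=", to str_word_count(), and the second one does not.

As a result, the first program treats the special characters & and = as words, while the second one skips them.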

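Now consider a case where the string contains many different special characters. Listing all of them in the third argument may be impractical.

So here is my question:

Is there a simpler way to make str_word_count() recognize all UTF-8 special characters as words, without having to put a huge number of special characters into the third argument?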

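Here is one way.

https://3v4l.org/r4ngg

As I wrote in the comments, you can use explode() and strpos() to get the words and their positions.
Using the third parameter of strpos(), the offset, ensures that you do not get the position of the wrong occurrence of a word.
$nextpos always points just past the end of the previous word, so even if the same word is repeated twice, the correct position is still reported.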

$str = "this is a very very long text with some words repeating over and over & over again. When you use Explode() you will get an array with all the words. & using strpos( haystack, needle, & most importantly offset) you should get a good array with the positions of the words.";
// Split on single spaces; punctuation stays attached to the adjacent word.
$arrWords = explode(" ", $str);
$nextpos = 0; // byte offset from which the next search starts
$arrPos = array();
for ($i = 0; $i < count($arrWords); $i++) {
    // The offset argument skips earlier occurrences of a repeated word.
    $arrPos[$i]["Position"] = strpos($str, $arrWords[$i], $nextpos);
    $arrPos[$i]["Length"] = strlen($arrWords[$i]); // length in bytes
    $arrPos[$i]["Word"] = $arrWords[$i];
    // Move past this word and the single space that follows it.
    $nextpos = $nextpos + strlen($arrWords[$i]) + 1;
}
var_dump($arrPos);
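
The same idea can also be sketched with a regular expression instead of explode() and strpos(). This is only an illustrative alternative, not what the linked snippet does: the helper name words_with_offsets() is made up for this example, and it assumes that a "word" is any run of non-whitespace characters. preg_match_all() with the PREG_OFFSET_CAPTURE flag returns each match together with its byte offset, which gives the same shape as str_word_count($str, 2).

```php
<?php
// Hypothetical helper (not a PHP built-in): mimics str_word_count($str, 2),
// but treats every run of non-whitespace characters as a word.
function words_with_offsets(string $str): array
{
    $result = [];
    // \S+ matches runs of non-whitespace; the /u modifier makes PCRE read
    // the subject as UTF-8. PREG_OFFSET_CAPTURE adds each match's byte offset.
    if (preg_match_all('/\S+/u', $str, $matches, PREG_OFFSET_CAPTURE)) {
        foreach ($matches[0] as [$word, $offset]) {
            $result[$offset] = $word; // key = byte offset, value = word
        }
    }
    return $result;
}

foreach (words_with_offsets('You & I = We') as $key => $value) {
    echo $key . ' ===> ' . $value . PHP_EOL;
}
// Prints:
// 0 ===> You
// 4 ===> &
// 6 ===> I
// 8 ===> =
// 10 ===> We
```

As with strpos(), the keys are byte offsets rather than character offsets, so they grow faster than the character count when the string contains multi-byte UTF-8 characters. Punctuation that touches a word (for example "again.") stays attached to it, just as it does with explode(" ", ...).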
