PHP - Search for keywords in a string and improve the quality and accuracy of the extracted keywords



I have the following PHP code:

<?php
$Keywords = array(
            ', JOE.' => '1',
            ', JOE' => '2',
            'JOE' => '3',
            'JOE.' => '4',
            '/JOE' => '5',
            '/JOE/' => '6',
            'JOE/.' => '7',
            ',JOE.' => '8'
    );
$Text = "JOE is JOE is JOE is JOE is JOE is JOE is JOE. Hello , JOE. Hey ,JOE. Come on , JOE. Dude,JOE/. Shut up ,JOE. What is the meaning of /JOE/? Of course, JOE";
extract_keyword ($Keywords, $Text);
function extract_keyword ($Keywords, $Text){
    mb_internal_encoding('UTF-8');
    uksort($Keywords, function ($a, $b) {
        $as = mb_strlen($a);
        $bs = mb_strlen($b);
        if ($as > $bs) {
            return -1;
        }
        else if ($bs > $as) {
            return 1;
        }
        return 0;
    });
    $Keywords_ci = array();
    foreach ($Keywords as $k => $v) {
        $Keywords_ci[$k] = $v;
    }
    $re = '/\b(?:' . join('|', array_map(function($keyword) {
        return preg_quote($keyword, '/');
    }, array_keys($Keywords))) . ')\b/i';
    $KeywordArrayKey = array();
    $KeywordArrayValue = array();
    $NewArray = array();
    preg_match_all($re, $Text, $matches);
    foreach ($matches[0] as $keyword) {
        $KeywordArrayKey[] = $keyword;
        $KeywordArrayValue[] = $Keywords_ci[$keyword];
        if(!empty($keyword) && !empty($Keywords_ci[$keyword])) {
        $NewArray[] = array($keyword => $Keywords_ci[$keyword]); 
        }
    } 
    print_r($NewArray);
    echo "<br><br>";
}

The code outputs:

Array ( 
[0] => Array ( [JOE] => 3 ) 
[1] => Array ( [JOE] => 3 ) 
[2] => Array ( [JOE] => 3 ) 
[3] => Array ( [JOE] => 3 ) 
[4] => Array ( [JOE] => 3 ) 
[5] => Array ( [JOE] => 3 ) 
[6] => Array ( [JOE] => 3 ) 
[7] => Array ( [JOE] => 3 ) 
[8] => Array ( [JOE] => 3 ) 
[9] => Array ( [JOE] => 3 ) 
[10] => Array ( [JOE] => 3 ) 
[11] => Array ( [JOE] => 3 ) 
[12] => Array ( [JOE] => 3 ) 
[13] => Array ( [, JOE] => 2 ) )

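As you can see, the problem is that the code is not precise enough to extract keywords from $Keywords such as ', JOE.' => '1' or 'JOE/.' => '7'. My goal is to distinguish accurately between '/JOE' => '5', '/JOE/' => '6', 'JOE.' => '4' and so on. Could you take a look at the code and tell me how to improve the quality/accuracy of the extracted keywords? Thanks for your help.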
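The output is consistent with the whole alternation being wrapped in `\b` word-boundary anchors (`'/\b(?:...)\b/i'`). `\b` only matches between a word character (`[A-Za-z0-9_]`) and a non-word character or the string edge. A keyword such as `', JOE.'` starts and ends with punctuation, so it can only match when a word character sits directly on both sides, which never happens in this text. That is also why the single `', JOE'` match is the one in `course, JOE`, where `e` precedes the comma. A small check of this behaviour:

```php
<?php
// \b needs a word character on one side and a non-word character
// (or the string edge) on the other.
$text = 'Hello , JOE. Hey';

// ', JOE.' is surrounded by spaces here, so neither \b can be satisfied.
var_dump(preg_match('/\b, JOE\.\b/', $text)); // int(0)

// Without the anchors the keyword is found.
var_dump(preg_match('/, JOE\./', $text)); // int(1)

// Next to a word-character edge, \b still does useful work:
// it stops 'JOE' from matching inside 'JOEY'.
var_dump(preg_match('/\bJOE\b/', 'JOEY'));   // int(0)
var_dump(preg_match('/\bJOE\b/', 'Hi JOE')); // int(1)
```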

Note 1: print_r($Keywords_ci); prints Array ( [, JOE.] => 1 [JOE/.] => 7 [,JOE.] => 8 [, JOE] => 2 [/JOE/] => 6 [JOE.] => 4 [/JOE] => 5 [JOE] => 3 ), but what I want is to echo every instance of the keywords that occur in $Text, such as '/JOE/' => '6' and ',JOE.' => '8'.

Note 2: this is the expected output of print_r($NewArray):

Array ( 
[0] => Array ( [JOE] => 3 ) 
[1] => Array ( [JOE] => 3 ) 
[2] => Array ( [JOE] => 3 ) 
[3] => Array ( [JOE] => 3 ) 
[4] => Array ( [JOE] => 3 ) 
[5] => Array ( [JOE] => 3 ) 
[6] => Array ( [JOE.] => 4 ) 
[7] => Array ( [, JOE.] => 1 ) 
[8] => Array ( [,JOE.] => 8 ) 
[9] => Array ( [, JOE.] => 1 ) 
[10] => Array ( [JOE/.] => 7 ) 
[11] => Array ( [,JOE.] => 8 ) 
[12] => Array ( [/JOE/] => 6 ) 
[13] => Array ( [, JOE] => 2 ) )
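One possible regex-based fix, sketched under a few assumptions: keep the longest-first alternation, but add `\b` only on the side of a keyword that begins or ends with a word character, instead of around the whole group. The function name `extract_keyword_fixed` is hypothetical, the `/i` flag is dropped on purpose (so every match is an exact key of `$Keywords`), and `strlen` replaces `mb_strlen` for the sort, since a keyword that contains another keyword is always longer in bytes as well.

```php
<?php
// Hypothetical variant of extract_keyword(): \b is added per keyword, and only
// on an edge that is a word character, so punctuation-edged keywords can match.
function extract_keyword_fixed(array $Keywords, string $Text): array
{
    // Longest keywords first, so ', JOE.' is tried before ', JOE' or 'JOE'
    // at the same position in the string.
    uksort($Keywords, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $parts = array();
    foreach (array_keys($Keywords) as $keyword) {
        $part = preg_quote($keyword, '/');
        // Leading \b only if the keyword starts with a word character (e.g. 'JOE.')
        if (preg_match('/^\w/', $keyword)) {
            $part = '\b' . $part;
        }
        // Trailing \b only if the keyword ends with a word character (e.g. ', JOE')
        if (preg_match('/\w$/', $keyword)) {
            $part .= '\b';
        }
        $parts[] = $part;
    }
    // Case-sensitive on purpose (assumption), so each match is an exact key.
    $re = '/(?:' . implode('|', $parts) . ')/';

    $NewArray = array();
    preg_match_all($re, $Text, $matches);
    foreach ($matches[0] as $keyword) {
        $NewArray[] = array($keyword => $Keywords[$keyword]);
    }
    return $NewArray;
}

$Keywords = array(
    ', JOE.' => '1',
    ', JOE'  => '2',
    'JOE'    => '3',
    'JOE.'   => '4',
    '/JOE'   => '5',
    '/JOE/'  => '6',
    'JOE/.'  => '7',
    ',JOE.'  => '8'
);
$Text = "JOE is JOE is JOE is JOE is JOE is JOE is JOE. Hello , JOE. Hey ,JOE. Come on , JOE. Dude,JOE/. Shut up ,JOE. What is the meaning of /JOE/? Of course, JOE";
print_r(extract_keyword_fixed($Keywords, $Text));
```

For the question's `$Keywords` and `$Text`, this should produce the same 14 entries as the expected output in Note 2. In `Dude,JOE/.` the comma is not part of any keyword, so the match starts at `JOE/.`; a `\b` still sits before `J` because the comma is a non-word character.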

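After sorting the keywords from longest to shortest, you know that each string is checked before any possible substring of it (/JOE/ before /JOE). So you can use str_replace to remove the actual matches; that way, /JOE/ is not matched again when you search for /JOE (assuming you already searched for /JOE/). Use the count parameter of str_replace to get the number of matches: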
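To see why the longest-first order matters, this small check removes `/JOE` and `/JOE/` in both orders from a fragment of the question's text:

```php
<?php
$text = 'the meaning of /JOE/?';

// Shortest first: '/JOE' eats part of '/JOE/', which is then never found.
$copy = str_replace('/JOE', '', $text, $shortCount);  // 'the meaning of /?'
$copy = str_replace('/JOE/', '', $copy, $longCount);
echo "short first: /JOE=$shortCount, /JOE/=$longCount\n"; // /JOE=1, /JOE/=0

// Longest first: '/JOE/' is found and removed, so '/JOE' no longer matches.
$copy = str_replace('/JOE/', '', $text, $longCount);  // 'the meaning of ?'
$copy = str_replace('/JOE', '', $copy, $shortCount);
echo "long first:  /JOE/=$longCount, /JOE=$shortCount\n"; // /JOE/=1, /JOE=0
```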

<?php
$Keywords = array(
            ', JOE.' => '1',
            ', JOE' => '2',
            'JOE' => '3',
            'JOE.' => '4',
            '/JOE' => '5',
            '/JOE/' => '6',
            'JOE/.' => '7',
            ',JOE.' => '8'
    );
$Text = "JOE is JOE. Hello , JOE. Hey ,JOE. Come on , JOE. Dude,JOE/. Shut up ,JOE. What is the meaning of /JOE/? Of course, JOE";
// Sort keywords from longest to shortest
uksort($Keywords, function ($a, $b) {
    return mb_strlen($b) - mb_strlen($a);
});
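$result = array();
$copy = $Text;
foreach ($Keywords as $keyword => $value) {
    // Remove every occurrence of this keyword, so shorter keywords checked
    // later cannot match inside it; $count receives the number removed
    $copy = str_replace($keyword, '', $copy, $count);
    if ($count > 0) {
        $result[$keyword] = $value;
    }
}
print_r($result);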

You can use the $count variable to get the actual number of times each string occurs.
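A minimal sketch of that idea, reusing the str_replace loop but storing `$count` per keyword instead of the keyword's value. The `$counts` array is a name chosen for this example, and `strlen` is used for the sort (byte length is enough to put superstrings before their substrings):

```php
<?php
$Keywords = array(
    ', JOE.' => '1',
    ', JOE'  => '2',
    'JOE'    => '3',
    'JOE.'   => '4',
    '/JOE'   => '5',
    '/JOE/'  => '6',
    'JOE/.'  => '7',
    ',JOE.'  => '8'
);
$Text = "JOE is JOE. Hello , JOE. Hey ,JOE. Come on , JOE. Dude,JOE/. Shut up ,JOE. What is the meaning of /JOE/? Of course, JOE";

// Longest first, as in the answer
uksort($Keywords, function ($a, $b) {
    return strlen($b) - strlen($a);
});

$counts = array();
$copy = $Text;
foreach ($Keywords as $keyword => $value) {
    // $count is overwritten on every call, so store it right away
    $copy = str_replace($keyword, '', $copy, $count);
    if ($count > 0) {
        $counts[$keyword] = $count;
    }
}
print_r($counts);
```

For this text, `', JOE.'` and `',JOE.'` are each counted twice and the other found keywords once; `/JOE` is not reported because both of its occurrences were already removed as part of `/JOE/`. One caveat: because matches are replaced with an empty string, the text on either side is joined, which can in principle create a match that was not in the original string.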
