How to trim text around a search word without counting HTML tags



I have a string that contains HTML and text, and I have a search word. I want to cut out some of the text around $searchword.

Example text:

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <sometag>At</sometag> vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

If $searchword is "vero", the output should be:

...sed diam voluptua. At <strong>vero</strong> eos et accusam et...

So I want X characters before and after the search word, not counting HTML. I don't know how to get started. I know we probably need substr and a regex, but I'm stuck.
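A minimal sketch of the substr route the question mentions, under stated assumptions: strip the tags first so they never count towards X, find the word with stripos(), then cut X characters on each side with substr(). The function name excerpt_around() and its $radius parameter are hypothetical. stripos()/substr() work on bytes, so this assumes single-byte text; for UTF-8 with non-ASCII characters, mb_stripos()/mb_substr() would be the safer choice. Note that a pure character count can cut words in half.

```php
<?php
// Hypothetical helper: returns $radius characters of plain text on each side
// of the first case-insensitive occurrence of $needle, or null if absent.
// Byte-based: assumes single-byte text (use the mb_* functions for UTF-8).
function excerpt_around(string $html, string $needle, int $radius): ?string
{
    // Step 1: drop the tags so they never count towards $radius.
    $plain = strip_tags($html);

    // Step 2: find the first occurrence, ignoring case.
    $pos = stripos($plain, $needle);
    if ($pos === false) {
        return null;
    }

    // Step 3: cut the context on both sides with substr().
    $len    = strlen($needle);
    $start  = max(0, $pos - $radius);
    $before = substr($plain, $start, $pos - $start);
    $match  = substr($plain, $pos, $len);          // keeps the text's own casing
    $after  = substr($plain, $pos + $len, $radius);

    // Only add "..." where something was actually cut off.
    $prefix = $start > 0 ? '...' : '';
    $suffix = $pos + $len + $radius < strlen($plain) ? '...' : '';

    return $prefix . $before . '<strong>' . $match . '</strong>' . $after . $suffix;
}

echo excerpt_around('<p>Lorem ipsum <b>dolor</b> sit amet</p>', 'dolor', 6), "\n";
// prints "...ipsum <strong>dolor</strong> sit a..."
```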

// The string to search in
$text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis ex.';
// The text to search for
$search_query = 'consectetur';
// The regular expression.
// preg_quote() escapes regex metacharacters (and the "/" delimiter) in the search text
// so it cannot conflict with the pattern.
// This expression matches 3 words (punctuation included) before and after the searched keyword.
$search = '/((\w+[^\w]+){3})(' . preg_quote($search_query, '/') . ')(([^\w]+\w+){3})/i';
// Find the first match of the expression and store it in $matches
if (preg_match($search, $text, $matches)) {
    // Build the desired string: [1] = words before, [3] = keyword, [4] = words after
    $result = sprintf('...%s<strong>%s</strong>%s...', $matches[1], $matches[3], $matches[4]);
    // $result is "...dolor sit amet, <strong>consectetur</strong> adipiscing elit. Duis..."
}
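One possible extension of the word-based pattern above, sketched under assumptions rather than taken from the answer: it runs strip_tags() first so tags never count as words, uses {0,n} so a keyword near the start or end of the text still matches, adds \b so the keyword is not found inside a longer word (this assumes the keyword starts and ends with word characters), and only adds "..." where text was actually cut. The name excerpt_words() is hypothetical. With $n = 4 and the question's example text it should give the output the question asks for.

```php
<?php
// Hypothetical helper: up to $n words of plain text on each side of the first
// whole-word, case-insensitive occurrence of $word, or null if absent.
function excerpt_words(string $html, string $word, int $n): ?string
{
    $plain = strip_tags($html); // tags must not count as words

    // (1) up to $n words before, (2) the keyword, (3) up to $n words after.
    // \b stops "vero" from matching inside e.g. "severo".
    $pattern = '/((?:\w+\W+){0,' . $n . '})\b(' . preg_quote($word, '/') . ')\b((?:\W+\w+){0,' . $n . '})/i';

    if (preg_match($pattern, $plain, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null;
    }

    // $m[0] = [full match, byte offset]; use it to decide where text was cut.
    [$full, $offset] = $m[0];
    $prefix = $offset > 0 ? '...' : '';
    $suffix = $offset + strlen($full) < strlen($plain) ? '...' : '';

    return $prefix . $m[1][0] . '<strong>' . $m[2][0] . '</strong>' . $m[3][0] . $suffix;
}

echo excerpt_words('Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis ex.', 'consectetur', 3), "\n";
// prints "...dolor sit amet, <strong>consectetur</strong> adipiscing elit. Duis..."
```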

Tim's solution works well, but here is a slightly different one that matches m characters before and n characters after the given word, instead of n words:

$string = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <sometag>At</sometag> vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.";
$string = strip_tags($string); // strip HTML tags
$word = 'vero';
$replace = "<strong>$word</strong>";
$before = 22; // characters to match before the word
$after = 7;   // characters to match after the word
// preg_quote() keeps regex metacharacters in $word from breaking the pattern
$quoted = preg_quote($word, '/');
if (preg_match('/.{' . $before . '}' . $quoted . '.{' . $after . '}/', $string, $matches)) {
    echo '...' . str_replace($word, $replace, $matches[0]) . '...';
}
// prints "...sed diam voluptua. At <strong>vero</strong> eos et..." for $before = 22 and $after = 7
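One limitation of fixed quantifiers such as .{22} and .{7} is that preg_match() finds nothing when the word sits closer than $before characters to the start (or $after to the end) of the text. A sketch of a more tolerant variant, assuming the same character counts: .{0,m} accepts shorter context, the s modifier lets "." cross line breaks that strip_tags() often leaves behind, and u makes "." count UTF-8 characters rather than bytes (the input must then be valid UTF-8). The name excerpt_chars() is hypothetical.

```php
<?php
// Hypothetical helper: up to $before characters before and up to $after
// characters after the first occurrence of $word (case-sensitive, like the
// answer above), or null if absent.
function excerpt_chars(string $html, string $word, int $before, int $after): ?string
{
    $plain = strip_tags($html);

    // .{0,m} tolerates short context; s = "." also matches newlines;
    // u = count UTF-8 characters instead of bytes.
    $pattern = '/(.{0,' . $before . '})(' . preg_quote($word, '/') . ')(.{0,' . $after . '})/su';

    if (preg_match($pattern, $plain, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null;
    }

    // Offsets are reported in bytes, so compare against strlen().
    [$full, $offset] = $m[0];
    $prefix = $offset > 0 ? '...' : '';
    $suffix = $offset + strlen($full) < strlen($plain) ? '...' : '';

    return $prefix . $m[1][0] . '<strong>' . $m[2][0] . '</strong>' . $m[3][0] . $suffix;
}

echo excerpt_chars('vero eos', 'vero', 22, 7), "\n";
// prints "<strong>vero</strong> eos" (the fixed-width pattern would not match here)
```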

Step 1: Remove the HTML tags. Step 2: Wrap every occurrence of the search word.

$searchword = 'vero';
$text = 'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. <sometag>At</sometag> vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.';
$plainText = strip_tags($text); // Step 1: remove the HTML tags
// Step 2: wrap every occurrence of the search word
$resultText = str_replace($searchword, '<strong>' . $searchword . '</strong>', $plainText);
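str_replace() is case-sensitive and also matches inside longer words (for example "vero" inside "severo"). If that matters, one hedged alternative for step 2 is preg_replace() with \b word boundaries, the i modifier, and $0 in the replacement, so each match keeps its original casing. The name highlight_all() is hypothetical, and \b assumes the search word starts and ends with word characters.

```php
<?php
// Hypothetical helper: strip the tags, then wrap every whole-word,
// case-insensitive occurrence of $word in <strong>, keeping its casing.
function highlight_all(string $html, string $word): string
{
    $plain   = strip_tags($html);                       // Step 1
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    // Step 2: $0 is the whole match; preg_replace() returns null on a regex error.
    return preg_replace($pattern, '<strong>$0</strong>', $plain) ?? $plain;
}

echo highlight_all('<p>Vero and vero, not severo.</p>', 'vero'), "\n";
// prints "<strong>Vero</strong> and <strong>vero</strong>, not severo."
```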
