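I'm trying to use PHP's DOMDocument and XPath to wrap all instances of certain phrases in a <span>. My logic is based on an answer to another post, but that only lets me select the first match within a node, and I need to select all matches.
Once I've modified the DOM for the first match, the next loop iteration fails with Fatal error: Uncaught Error: Call to a member function splitText() on bool, reported on the line where $after is assigned. I'm fairly sure this is caused by modifying the markup, but I haven't been able to work out why.
What am I doing wrong here?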
/**
 * Automatically wrap various forms of CCJM in a class for branding purposes
 *
 * @link https://stackoverflow.com/a/6009594/654480
 *
 * @param string $content
 * @return string
 */
function ccjm_branding_filter(string $content): string {
    if (! (is_admin() && ! wp_doing_ajax()) && $content) {
        $DOM = new DOMDocument();
        /**
         * Use internal errors to get around HTML5 warnings
         */
        libxml_use_internal_errors(true);
        /**
         * Load in the content, with proper encoding and an `<html>` wrapper required for parsing
         */
        $DOM->loadHTML("<?xml encoding='utf-8' ?><html>{$content}</html>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
        /**
         * Clear errors to get around HTML5 warnings
         */
        libxml_clear_errors();
        /**
         * Initialize XPath
         */
        $XPath = new DOMXPath($DOM);
        /**
         * Retrieve all text nodes, except those within scripts
         */
        $text = $XPath->query("//text()[not(parent::script)]");
        foreach ($text as $node) {
            /**
             * Find all matches, including offset
             */
            preg_match_all("/(C.? ?C.?(?:JM| Johnson (?:&|&|&|and) Malhotra)(?: Engineers, LTD.?|, P.?C.?)?)/i", $node->textContent, $matches, PREG_OFFSET_CAPTURE);
            /**
             * Wrap each match in appropriate span
             */
            foreach ($matches as $group) {
                foreach ($group as $key => $match) {
                    /**
                     * Determine the offset and the length of the match
                     */
                    $offset = $match[1];
                    $length = strlen($match[0]);
                    /**
                     * Isolate the match and what comes after it
                     */
                    $word = $node->splitText($offset);
                    $after = $word->splitText($length);
                    /**
                     * Create the wrapping span
                     */
                    $span = $DOM->createElement("span");
                    $span->setAttribute("class", "__brand");
                    /**
                     * Replace the word with the span, and then re-insert the word within it
                     */
                    $word->parentNode->replaceChild($span, $word);
                    $span->appendChild($word);
                    break; // it always errors after the first loop
                }
            }
        }
        /**
         * Save changes, remove unneeded tags
         */
        $content = implode(array_map([$DOM->documentElement->ownerDocument, "saveHTML"], iterator_to_array($DOM->documentElement->childNodes)));
    }
    return $content;
}
add_filter("ccjm_final_output", "ccjm_branding_filter");
Sample content (every "C.C. Johnson & Malhotra, P.C." and "CCJM" should be wrapped, but only the first one is modified successfully):
C.C. Johnson & Malhotra, P.C. (CCJM) was an integral member of a large Design Team for a 16.5-mile-long Public-Private Partnership (P3) Purple Line Project. The east-west light rail system extends from New Carrollton in PG County, MD to Bethesda in MO County, MD with 21 stations and one short tunnel. CCJM was Engineer of Record (EOR) for the design of eight (8) Bridges and design reviews for 35 transit/highway bridges and over 100 retaining walls of different lengths/types adjacent to bridges and in areas of cut/fill. CCJM designed utility structures for 42,000 LF of relocated water mains and 19,000 LF of relocated sewer mains meeting Washington Suburban Sanitary Commission (WSSC), Md Dept of Transportation (MDOT) MTA, and Local Standards.
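Edit 1: After some testing, I can see that $node->textContent changes after the first loop iteration. So I think what's happening is that $node->splitText($offset) actually modifies the node itself, which means the offsets captured for the later matches no longer apply.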
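To make that observation concrete, here is a minimal sketch of how splitText() shortens the node it is called on. The sample string and variable names are made up for illustration and are not taken from the filter above.

```php
<?php
// Build a tiny document by hand so the example does not depend on the HTML parser.
$doc = new DOMDocument();
$p = $doc->createElement('p');
$doc->appendChild($p);
$node = $doc->createTextNode('CCJM built it. CCJM designed it.');
$p->appendChild($node);

// Offsets as preg_match_all() would report them for /CCJM/ on the ORIGINAL text.
$firstOffset = 0;
$secondOffset = 15;

// Splitting at the first match moves everything from that offset onwards
// into a NEW sibling node and truncates $node in place.
$word = $node->splitText($firstOffset);
$after = $word->splitText(4); // 4 = strlen('CCJM')

// $node is now empty, so the second offset (15) no longer points inside it.
$remaining = strlen($node->data);
$stale = $secondOffset > $remaining;
```

After the first split, the rest of the text (including the second "CCJM") lives in $after, not in $node, which is why reusing the original offsets against $node fails.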
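First, I don't think foreach ($matches as $group) is correct here. If you inspect what $matches contains, it holds the same matches twice (once for the whole pattern and once for the capture group), and you presumably don't want to wrap them in spans twice. So that outer foreach loop should be removed, and the inner loop should run over $matches[0] only.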
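A quick way to see the duplication is to run preg_match_all() on a short string. This is only an illustrative sketch: the pattern and subject are simplified stand-ins for the real ones.

```php
<?php
// The whole pattern is wrapped in one capture group, as in the question.
$pattern = '/(CCJM)/';
$subject = 'CCJM and CCJM';

preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

// $matches[0] holds the full-pattern matches, $matches[1] holds capture group 1.
// Each entry is a [matched text, byte offset] pair.
$fullMatches = $matches[0];
$groupMatches = $matches[1];

// Because the group spans the entire pattern, both lists are identical.
$sameTwice = ($fullMatches === $groupMatches);
```

Iterating over $matches itself therefore visits every occurrence twice, whereas iterating over $matches[0] visits each occurrence once.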
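Second, I think your offset problem can be solved quite simply by working backwards: don't replace the matches from first to last, replace them in reverse order. That way you only ever manipulate the structure behind the current position, so whatever changes there cannot affect the positions of the earlier matches.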
/**
 * Wrap each match in appropriate span
 */
//foreach ($matches as $group) {
$group = array_reverse($matches[0]);
foreach ($group as $key => $match) {
    /**
     * Determine the offset and the length of the match
     */
    $offset = $match[1];
    $length = strlen($match[0]);
    /**
     * Isolate the match and what comes after it
     */
    $word = $node->splitText($offset);
    $after = $word->splitText($length);
    /**
     * Create the wrapping span
     */
    $span = $DOM->createElement("span");
    $span->setAttribute("class", "__brand");
    /**
     * Replace the word with the span, and then re-insert the word within it
     */
    $word->parentNode->replaceChild($span, $word);
    $span->appendChild($word);
    //break; // it always errors after the first loop
}
//}
With your sample input data, I get the following result (live example here: https://3v4l.org/kbSQ8):
<p><span class="__brand">C.C. Johnson & Malhotra, P.C.</span> (<span
class="__brand">CCJM</span>) was an integral member of a large Design Team
for a 16.5-mile-long Public-Private Partnership (P3) Purple Line Project.
The east-west light rail system extends from New Carrollton in PG County,
MD to Bethesda in MO County, MD with 21 stations and one short tunnel.
<span class="__brand">CCJM</span> was Engineer of Record (EOR) for the
design of eight (8) Bridges and design reviews for 35 transit/highway
bridges and over 100 retaining walls of different lengths/types adjacent to
bridges and in areas of cut/fill. <span class="__brand">CCJM</span>
designed utility structures for 42,000 LF of relocated water mains and
19,000 LF of relocated sewer mains meeting Washington Suburban Sanitary
Commission (WSSC), Md Dept of Transportation (MDOT) MTA, and Local
Standards.</p>
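For anyone who wants to try the reverse-order idea outside WordPress, here is a minimal self-contained sketch. The helper name wrap_matches_in_span(), the simplified /CCJM/ pattern and the sample sentence are assumptions for illustration, not part of the original filter. It also assumes ASCII-only text: as far as I know, PREG_OFFSET_CAPTURE reports byte offsets while splitText() counts characters, so multibyte content would need an offset conversion first.

```php
<?php
/**
 * Hypothetical helper: wrap every match of $pattern inside a single text node
 * in <span class="$class">, processing the matches from last to first.
 */
function wrap_matches_in_span(DOMText $node, string $pattern, string $class): int
{
    $count = preg_match_all($pattern, $node->data, $matches, PREG_OFFSET_CAPTURE);
    if (!$count) {
        return 0; // no match (0) or regex error (false)
    }
    // Only $matches[0] (full matches), reversed so earlier offsets stay valid.
    foreach (array_reverse($matches[0]) as [$text, $offset]) {
        $word = $node->splitText($offset);   // $node keeps the text before the match
        $word->splitText(strlen($text));     // $word now holds just the match
        $span = $node->ownerDocument->createElement('span');
        $span->setAttribute('class', $class);
        $word->parentNode->replaceChild($span, $word);
        $span->appendChild($word);
    }
    return $count;
}

// Usage: a hand-built paragraph with two occurrences of the phrase.
$doc = new DOMDocument();
$p = $doc->createElement('p');
$doc->appendChild($p);
$textNode = $doc->createTextNode('CCJM built it. CCJM designed it.');
$p->appendChild($textNode);

$wrapped = wrap_matches_in_span($textNode, '/CCJM/', '__brand');
$spans = $p->getElementsByTagName('span');
```

Because each split only touches text at or after the current match, $node still contains every earlier match at its original offset when the loop reaches it.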