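I'm looking for a way to dynamically wrap part of the text inside an XML node in a new element, based on a regular expression.
Consider the following example: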
<speak>The test number is 123456789, and some further block of text.</speak>
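Now suppose I have a regular expression that matches the number, and I want to selectively wrap each match in a new tag, so that the document becomes: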
<speak>The test number is <say-as interpret-as="characters">123456789</say-as>, and some further block of text.</speak>
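I thought about using DOMDocument to create the tags, but I'm not sure how to do the replacement part. Any suggestions?
DOM is the right approach. It lets you find and traverse text nodes. Run the regular expression on the content of each text node and build the replacement nodes as a document fragment.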
function wrapMatches(DOMNode $node, string $pattern, string $tagName, $tagAttributes = []) {
    $document = $node instanceof DOMDocument ? $node : $node->ownerDocument;
    $xpath = new DOMXPath($document);
    // iterate all descendant text nodes
    foreach ($xpath->evaluate('.//text()', $node) as $textNode) {
        $content = $textNode->textContent;
        $found = preg_match_all($pattern, $content, $matches, PREG_OFFSET_CAPTURE);
        $offset = 0;
        if ($found) {
            // fragments allow to treat multiple nodes as one
            $fragment = $document->createDocumentFragment();
            foreach ($matches[0] as $match) {
                list($matchContent, $matchStart) = $match;
                // add text from last match to current
                $fragment->appendChild(
                    $document->createTextNode(substr($content, $offset, $matchStart - $offset))
                );
                // add wrapper element, ...
                $wrapper = $fragment->appendChild($document->createElement($tagName));
                // ... set its attributes ...
                foreach ($tagAttributes as $attributeName => $attributeValue) {
                    $wrapper->setAttribute($attributeName, $attributeValue);
                }
                // ... and add the text content
                $wrapper->textContent = $matchContent;
                $offset = $matchStart + strlen($matchContent);
            }
            // add text after last match
            $fragment->appendChild($document->createTextNode(substr($content, $offset)));
            // replace the text node with the new fragment
            $textNode->parentNode->replaceChild($fragment, $textNode);
        }
    }
}
$xml = <<<'XML'
<speak>The test number is 123456789, and some further block of text.</speak>
XML;
$document = new DOMDocument();
$document->loadXML($xml);
wrapMatches($document, '(\d+)u', 'say-as', ['interpret-as' => 'characters']);
echo $document->saveXML();
The xsl:analyze-string instruction in XSLT 2.0 handles this conveniently. For example, you can define a rule:
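<xsl:template match="speak">
  <xsl:analyze-string select="." regex="\d+">
    <xsl:matching-substring>
      <say-as interpret-as="characters">
        <xsl:value-of select="."/>
      </say-as>
    </xsl:matching-substring>
    <!-- without this, the text between matches would be dropped -->
    <xsl:non-matching-substring>
      <xsl:value-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>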
You can use preg_replace like this:
$str = '<speak>The test number is 123456789, and some further block of text.</speak>';
echo preg_replace('/(\d+)/', '<say-as interpret-as="characters">$1</say-as>', $str);
The output will be:
<speak>The test number is <say-as interpret-as="characters">123456789</say-as>, and some further block of text.</speak>
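One caveat with running a regex over the raw XML string is that it cannot tell markup from text. The following is a minimal sketch of that failure mode, not part of the answer above; the `version="1.0"` attribute is a hypothetical addition to the sample input, used only to put digits inside the markup.

```php
<?php
// Hypothetical input: note the digits inside the attribute value.
$str = '<speak version="1.0">The test number is 123456789.</speak>';

// The pattern matches digits anywhere, so the attribute value is rewritten too.
$result = preg_replace('/(\d+)/', '<say-as interpret-as="characters">$1</say-as>', $str);

// Try to parse the result; collect libxml errors instead of emitting warnings.
$previous = libxml_use_internal_errors(true);
$document = new DOMDocument();
$wellFormed = $document->loadXML($result);
libxml_clear_errors();
libxml_use_internal_errors($previous);

var_dump($wellFormed); // bool(false): an element now sits inside an attribute value
```

If the input is known to contain digits only in text content, the string-based replacement is fine; otherwise working on DOM text nodes avoids this problem.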
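I ended up doing it the simple way, since I don't need to handle nested nodes or other XML-specific details. I just wrote a simple method that builds the tag as a string, and that is good enough.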
protected function createTag($name, $attributes = [], $content = null)
{
    $openingTag = '<' . $name;
    if ($attributes) {
        foreach ($attributes as $attribute => $value) {
            // escape the value so quotes or ampersands cannot break the markup
            $openingTag .= sprintf(' %s="%s"', $attribute, htmlspecialchars($value, ENT_QUOTES | ENT_XML1));
        }
    }
    $openingTag .= '>';
    $closingTag = '</' . $name . '>';
    // default to the first capture group of the regex
    $content = $content ?: '$1';
    return $openingTag . $content . $closingTag;
}
$tag = $this->createTag($tagName, $attributes);
$text = preg_replace($regex, $tag, $text);
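For reference, here is a self-contained sketch of the same idea that can be run on its own. It rewrites `createTag` as a plain function rather than a class method; the `$regex` and `$text` values are assumptions taken from the example in the question, and the attribute escaping is an addition for safety rather than something the approach requires.

```php
<?php
// Standalone variant of createTag(), written as a plain function for illustration.
function createTag(string $name, array $attributes = [], ?string $content = null): string
{
    $openingTag = '<' . $name;
    foreach ($attributes as $attribute => $value) {
        // Escape attribute values so quotes or ampersands cannot break the markup.
        $openingTag .= sprintf(' %s="%s"', $attribute, htmlspecialchars($value, ENT_QUOTES | ENT_XML1));
    }
    $openingTag .= '>';

    // '$1' is a preg_replace backreference to the first capture group.
    return $openingTag . ($content ?? '$1') . '</' . $name . '>';
}

// Assumed inputs, based on the example in the question.
$regex = '/(\d+)/';
$text = 'The test number is 123456789, and some further block of text.';

$tag = createTag('say-as', ['interpret-as' => 'characters']);
$wrapped = preg_replace($regex, $tag, $text);

echo $wrapped, "\n";
```

Because the tag string is used as a preg_replace replacement, the pattern needs a capture group for `$1` to refer to.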