I'm using DOMDocument and XPath.
Given the following XML:
<Description>
<CompleteText>
<DetailTxt>
<Text>
<span>Here there is some text</span>
<h2>And maybe a headline</h2>
<br/>
<span>Normal position</span>
<br/>
<span> </span>
<br/>
</Text>
</DetailTxt>
</CompleteText>
</Description>
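The node /Description/CompleteText/DetailTxt/Text contains markup that, unfortunately, is not escaped, and I can't change that. Is there any way to query its content while preserving the HTML markup?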
What I tried
Obviously, nodeValue and textContent. Both give me the content with the markup stripped.
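To make the problem concrete, here is a minimal sketch using a shortened copy of the XML above (whitespace between tags removed). It shows that both properties return only the concatenated text of the descendants; note that in PHP, `nodeValue` on an element node returns the same string as `textContent`.

```php
<?php
// Shortened version of the question's <Text> element.
$xml = '<Text><span>Here there is some text</span><h2>And maybe a headline</h2><br/></Text>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$text = $xpath->query('/Text')->item(0);

// Both properties flatten the subtree to its text; the tags are lost.
$plain = $text->textContent;
$value = $text->nodeValue;

echo $plain, "\n";
```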
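You can use the saveHTML method of DOMDocument to serialize a node as HTML. In your case, you seem to want to call it on each child node of the selected node and concatenate the strings; in the browser DOM API this would be called innerHTML.
So I wrote a function with that name to do this, and also used the ability to call PHP functions from XPath, as shown in the following snippet: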
<?php
$xml = <<<'EOD'
<Description>
<CompleteText>
<DetailTxt>
<Text>
<span>Here there is some text</span>
<h2>And maybe a headline</h2>
<br/>
<span>Normal position</span>
<br/>
<span> </span>
<br/>
</Text>
</DetailTxt>
</CompleteText>
</Description>
EOD;
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
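function innerHTML($nodeList) {
// php:function() passes the XPath node-set as an array of DOM nodes; use the first one
$node = $nodeList[0];
$html = '';
$containingDoc = $node->ownerDocument;
// Serialize each child as HTML and concatenate, like the browser's innerHTML
foreach ($node->childNodes as $child) {
$html .= $containingDoc->saveHTML($child);
}
return $html;
}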
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions("innerHTML");
$innerHTML = $xpath->evaluate('php:function("innerHTML", /Description/CompleteText/DetailTxt/Text)');
echo $innerHTML;
The output (http://sandbox.onlinephpfunctions.com/code/62a980e2d2a2485c2648e16fc647a6bd6ff5620b) is:
<span>Here there is some text</span>
<h2>And maybe a headline</h2>
<br>
<span>Normal position</span>
<br>
<span> </span>
<br>
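The php:function() route depends on registerPHPFunctions(). As a hedged alternative, the sketch below selects the node with a plain query() and builds the inner HTML in ordinary PHP code. The helper name innerHTMLOf() is hypothetical, and the input is a shortened copy of the question's XML.

```php
<?php
// Hypothetical helper: concatenate the HTML serialization of each child node.
function innerHTMLOf(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes only that node, using HTML rules
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$xml = '<Description><CompleteText><DetailTxt><Text>'
     . '<span>Here there is some text</span><h2>And maybe a headline</h2>'
     . '<br/><span>Normal position</span>'
     . '</Text></DetailTxt></CompleteText></Description>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// No PHP functions registered: XPath only locates the node.
$textNode = $xpath->query('/Description/CompleteText/DetailTxt/Text')->item(0);
$inner = innerHTMLOf($textNode);

echo $inner, "\n";
```

The behavior matches the answer's function; the difference is only that XPath is used for selection and nothing else, which avoids exposing PHP functions to XPath expressions.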
I found that using the C14N() method of DOMNode gives a good result:
http://sandbox.onlinephpfunctions.com/code/90dc915c9a43c91d31fcd47d37e89df430951b2e
<?php
$xml = <<<'EOD'
<Description>
<CompleteText>
<DetailTxt>
<Text>
<span>Here there is some text</span>
<h2>And maybe a headline</h2>
<br/>
<span>Normal position</span>
<br/>
<span> </span>
<br/>
</Text>
</DetailTxt>
</CompleteText>
</Description>
EOD;
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
function innerHTML($nodeList) {
$node = $nodeList[0];
$html = '';
$containingDoc = $node->ownerDocument;
foreach ($node->childNodes as $child) {
$html .= $containingDoc->saveHTML($child);
}
return $html;
}
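// Left over from the previous snippet: no php:function() call is made here
$xpath->registerNamespace("php", "http://php.net/xpath");
$domNodes = $xpath->query('/Description/CompleteText/DetailTxt/Text');
$domNode = $domNodes[0];
// C14N() canonicalizes the whole <Text> element, including its own start and end tags
$innerHTML = $domNode->C14N();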
echo $innerHTML;
Result:
<Text>
<span>Here there is some text</span>
<h2>And maybe a headline</h2>
<br></br>
<span>Normal position</span>
<br></br>
<span> </span>
<br></br>
</Text>
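In a way this seems shorter; what do you think? I would still need to get rid of the surrounding <Text> node, though. Also, thanks for pointing me to the PHP sandbox.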
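One way to drop the wrapper while still using C14N() is to canonicalize each child of <Text> instead of <Text> itself and concatenate the results. A minimal sketch, assuming the same document structure (shortened here, without whitespace between tags); the variable names are illustrative:

```php
<?php
$xml = '<Description><CompleteText><DetailTxt><Text>'
     . '<span>Here there is some text</span><h2>And maybe a headline</h2>'
     . '<br/><span>Normal position</span>'
     . '</Text></DetailTxt></CompleteText></Description>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$textNode = $xpath->query('/Description/CompleteText/DetailTxt/Text')->item(0);

$inner = '';
foreach ($textNode->childNodes as $child) {
    // C14N() on a single node canonicalizes only that node's subtree,
    // so the <Text> start and end tags never appear in the output
    $inner .= $child->C14N();
}

echo $inner, "\n";
```

Canonical XML always writes empty elements as a start/end pair, so this still produces <br></br> rather than <br/>.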
Update
I realized that C14N() changes the markup: <br/> becomes <br></br>.
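If the <br/> form matters, one option not taken from the thread is to serialize each child with DOMDocument::saveXML() instead. It uses XML serialization rules, so empty elements stay self-closing, and iterating over the children also leaves out the <Text> wrapper. A minimal sketch on a shortened copy of the XML:

```php
<?php
$xml = '<Description><CompleteText><DetailTxt><Text>'
     . '<span>Here there is some text</span><h2>And maybe a headline</h2>'
     . '<br/><span>Normal position</span>'
     . '</Text></DetailTxt></CompleteText></Description>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$textNode = $xpath->query('/Description/CompleteText/DetailTxt/Text')->item(0);

$inner = '';
foreach ($textNode->childNodes as $child) {
    // saveXML() with a node argument serializes just that node as XML;
    // empty elements are written in self-closing form (<br/>)
    $inner .= $doc->saveXML($child);
}

echo $inner, "\n";
```

With the question's original, indented XML, the whitespace text nodes between the tags are serialized as well, so the line breaks are kept.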