I fetch an HTML document from a URL, and I want to extract only the plain text content of its divs. The structure looks something like this:
<div class="first">
<div class="second">
Some content inside second div
<div class="third">
Some more content inside third div
</div>
</div>
</div>
When I extract the content, I want the plain text in an array, like this:
Array(
[first]=>
[second]=>Some content inside second div
[third]=>Some more content inside third div
);
I tried to do this with strip_tags, but I can't work out how to split the result and put it into an array. Any ideas would be appreciated.
<?php
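function clearArray($arr) {
    if (!is_array($arr)) {
        return false;
    }
    $newArray = []; // initialised so an all-blank input returns [] rather than null
    foreach ($arr as $element) {
        $cont = trim($element); // strip surrounding whitespace, including a stray "\r" from CRLF line endings
        if ($cont !== '') { // unlike empty(), this keeps a line that is just "0"
            $newArray[] = $cont;
        }
    }
    return $newArray;
}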
$content='<div class="first">
<div class="second">
Some content inside second div
<div class="third">
Some more content inside third div
</div>
</div>
</div>';
$strippedContent=strip_tags($content);
$content=explode("\n", $strippedContent); // split the stripped text into lines
$content=clearArray($content);
print_r($content);
This will output:
Array ( [0] => Some content inside second div [1] => Some more content inside third div )
If you are retrieving this from an external page, I strongly recommend using DOMDocument and XPath to get the elements.
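For what it's worth, here is a minimal sketch of how the DOMDocument/XPath route might look for the sample markup in the question. The function name `divTextByClass` is made up for illustration, and the sketch assumes every div of interest has a unique `class` attribute (a repeated class would overwrite the earlier entry) and that the markup is plain ASCII or otherwise in an encoding `loadHTML` detects correctly.

```php
<?php
// Sample markup from the question.
$html = '<div class="first">
<div class="second">
Some content inside second div
<div class="third">
Some more content inside third div
</div>
</div>
</div>';

// Hypothetical helper: map each div's class to the text it directly contains.
function divTextByClass(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query('//div[@class]') as $div) {
        $text = '';
        // text() matches only direct text-node children, so nested divs are left out.
        foreach ($xpath->query('text()', $div) as $node) {
            $text .= $node->nodeValue;
        }
        $result[$div->getAttribute('class')] = trim($text);
    }
    return $result;
}

print_r(divTextByClass($html));
```

Unlike the strip_tags approach, this keeps the association between each piece of text and the div it came from, which is what the array in the question asks for. The `first` key ends up as an empty string because that div holds no text of its own.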