Simple HTML DOM: anomaly when iterating



I have to parse the following HTML code:

<ul>
<li><span><input id="testing_5" type="checkbox" name="filter" value="5"></span><label for="testing_5"><div>Label 1</div><span>579<span></label></li>
<li><span><input id="testing_4" type="checkbox" name="filter" value="4"></span><label for="testing_4"><div>Label 2</div><span>356<span></label></li>
<li><span><input id="testing_3" type="checkbox" name="filter" value="3"></span><label for="testing_3"><div>Label 3</div><span>109<span></label></li>
<li><span><input id="testing_2" type="checkbox" name="filter" value="2"></span><label for="testing_2"><div>Label 4</div><span>32<span></label></li>
<li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li>
</ul>

I want to print the markup of each label, so I wrote a simple PHP script, shown below:

$scrape_obj = str_get_html('<ul><li><span><input id="testing_5" type="checkbox" name="filter" value="5"></span><label for="testing_5"><div>Label 1</div><span>579<span></label></li><li><span><input id="testing_4" type="checkbox" name="filter" value="4"></span><label for="testing_4"><div>Label 2</div><span>356<span></label></li><li><span><input id="testing_3" type="checkbox" name="filter" value="3"></span><label for="testing_3"><div>Label 3</div><span>109<span></label></li><li><span><input id="testing_2" type="checkbox" name="filter" value="2"></span><label for="testing_2"><div>Label 4</div><span>32<span></label></li><li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>');
$obj = $scrape_obj->find("label[for^='testing_']");
for($i=0; $i<count($obj); $i++) {
  echo "\n Number $i\n $obj[$i]\n\n";
}

This is the output:

 Number 0
 <label for="testing_5"><div>Label 1</div><span>579<span></label></li><li><span><input id="testing_4" type="checkbox" name="filter" value="4"></span><label for="testing_4"><div>Label 2</div><span>356<span></label></li><li><span><input id="testing_3" type="checkbox" name="filter" value="3"></span><label for="testing_3"><div>Label 3</div><span>109<span></label></li><li><span><input id="testing_2" type="checkbox" name="filter" value="2"></span><label for="testing_2"><div>Label 4</div><span>32<span></label></li><li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>

 Number 1
 <label for="testing_4"><div>Label 2</div><span>356<span></label></li><li><span><input id="testing_3" type="checkbox" name="filter" value="3"></span><label for="testing_3"><div>Label 3</div><span>109<span></label></li><li><span><input id="testing_2" type="checkbox" name="filter" value="2"></span><label for="testing_2"><div>Label 4</div><span>32<span></label></li><li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>

 Number 2
 <label for="testing_3"><div>Label 3</div><span>109<span></label></li><li><span><input id="testing_2" type="checkbox" name="filter" value="2"></span><label for="testing_2"><div>Label 4</div><span>32<span></label></li><li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>

 Number 3
 <label for="testing_2"><div>Label 4</div><span>32<span></label></li><li><span><input id="testing_1" type="checkbox" name="filter" value="1"></span><label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>

 Number 4
 <label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>

The correct output should be:

 Number 0
 <label for="testing_5"><div>Label 1</div><span>579<span></label>
 Number 1
 <label for="testing_4"><div>Label 2</div><span>356<span></label>
 Number 2
 <label for="testing_3"><div>Label 3</div><span>109<span></label>
 Number 3
 <label for="testing_2"><div>Label 4</div><span>32<span></label>
 Number 4
 <label for="testing_1"><div>Label 5</div><span>13<span></label>

How can I fix this?

Solution

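The problem is the unclosed span tags. Each count is followed by a second opening <span> instead of a closing </span>, which appears to be why every matched label runs on to the end of the input. You can repair the markup with a simple regular expression before parsing it: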

$pattern = "/<span>([0-9]+)<span>/";
$replacement = "<span>$1</span>";
$html_code = preg_replace($pattern, $replacement, $html_code);

Here $html_code contains the code to be parsed.
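As a quick check of the regex repair, here is a minimal sketch that applies it to a single label copied from the question. It uses only the built-in preg_replace(), so Simple HTML DOM is not needed to run it; the $fixed variable name is introduced here purely for illustration.

```php
<?php
// Sample input copied from the question: the count is followed by a second
// opening <span> instead of a closing </span>.
$html_code = '<label for="testing_5"><div>Label 1</div><span>579<span></label>';

// Rewrite "<span>digits<span>" as "<span>digits</span>".
$pattern = '/<span>([0-9]+)<span>/';
$replacement = '<span>$1</span>';
$fixed = preg_replace($pattern, $replacement, $html_code);

echo $fixed, "\n";
```

The repaired string can then be passed to str_get_html() in place of the original markup. Note that the pattern only matches spans whose content is purely digits, which fits the sample data but may need adjusting for other content.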

Alternatively, you can use a combination of substr() and strpos() to find where the first label block ends.

Put this inside your loop, before the echo:

$obj[$i] = substr($obj[$i],0,strpos($obj[$i],'</label>')+8);
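The same truncation can be tried on a plain string, without the parser. The sketch below wraps it in a helper, first_label_block(), which is a hypothetical name introduced here and not part of Simple HTML DOM. It also guards against the case where no closing tag is found, since strpos() returns false then and the one-liner above would cut the string at offset 8.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): keep everything up to
// and including the first closing </label> tag.
function first_label_block(string $markup): string
{
    $end = strpos($markup, '</label>');
    if ($end === false) {
        // No closing tag found: return the input unchanged.
        return $markup;
    }
    return substr($markup, 0, $end + strlen('</label>'));
}

// Overrun string taken from the "Number 4" output in the question.
$overrun = '<label for="testing_1"><div>Label 5</div><span>13<span></label></li></ul>';
echo first_label_block($overrun), "\n";
```

This trims the overrun but leaves the unclosed span in place, so it treats the symptom in the output rather than repairing the markup itself.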
