Scraping an HTML page with XPath and PHP



I am trying to scrape an HTML page with this PHP code:

<?php
ini_set('display_errors', 1);
$url = 'http://www.cittadellasalute.to.it/index.php?option=com_content&view=article&id=6786:situazione-pazienti-in-pronto-soccorso&catid=165:pronto-soccorso&Itemid=372';

//#Set CURL parameters: pay attention to the PROXY config !!!!
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '');
$data = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$greenWaitingNumber = $xpath->query('/html/body/div/div/div[4]/div[3]/section/p');

foreach( $greenWaitingNumber as $node )
{
echo "Number first green line: " .$node->nodeValue;
echo '<br>';
echo '<br>';
}

?>

Everything seems to work (there are no errors, and in my browser console I can see "200" as the return code), but nothing is printed in my HTML page.

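The problem may be the XPath /html/body/div/div/div/div[4]/div[3]/section/p, which is meant to point at the first green line in the source HTML page. That is, however, the path Firefox Firebug reports for that part of the page.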

Any suggestions or examples?

UPDATE

As Santosh Sapkota suggested in his reply, the first problem is that the text in that green box is loaded from an iframe. I found the URL of the HTML page loaded in the iframe, so I tried using it in my code, which is now:

<?php
ini_set('display_errors', 1);
$url = 'http://listeps.cittadellasalute.to.it/?id=01090101';

//#Set CURL parameters: pay attention to the PROXY config !!!!
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '');
$data = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$greenWaitingNumber = $xpath->query('/html/body/div/div/div[4]/div[3]/section/p');

foreach( $greenWaitingNumber as $node )
{
echo "Number first green line: " .$node->nodeValue;
echo '<br>';
echo '<br>';
}

?>

Unfortunately, nothing is printed in my output HTML page with this version either.

Any other suggestions or examples?

It must be a problem with your XPath. Also check whether any of the content comes from an iframe.
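
One way to test both suspicions is to run the queries against a small, known piece of markup and print how many nodes each one matches. The following is a minimal sketch, not a fix for the real page: the sample HTML, the `green` class name and the relative expression are assumptions made up for illustration, since the actual structure of the target page is not shown in the question. Only the iframe URL and the absolute path are taken from the question.

```php
<?php
// Hypothetical sample markup; the real page structure is not known here.
$html = <<<HTML
<html><body>
  <div id="wrapper">
    <iframe src="http://listeps.cittadellasalute.to.it/?id=01090101"></iframe>
    <section class="green"><p>12</p></section>
  </div>
</body></html>
HTML;

// Collect parser warnings instead of hiding them with the @ operator.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// The absolute path from the question: it only matches if every level
// of the document has exactly the expected nesting.
$absolute = $xpath->query('/html/body/div/div/div[4]/div[3]/section/p');

// A relative path anchored on an attribute (assumed class name).
$relative = $xpath->query('//section[@class="green"]/p');

// Any iframes in the page: their content is NOT part of this document.
$iframes = $xpath->query('//iframe');

echo "Absolute path matches: " . $absolute->length . "\n";
echo "Relative path matches: " . $relative->length . "\n";

foreach ($iframes as $iframe) {
    echo "Iframe source: " . $iframe->getAttribute('src') . "\n";
}

foreach ($relative as $node) {
    echo "Value: " . trim($node->nodeValue) . "\n";
}
```

If the absolute path reports zero matches against the downloaded `$data` as well, it is worth saving `$data` to a file and comparing it with what Firebug shows, because the browser inspector displays the DOM after the browser has repaired and modified it, which may differ from the raw HTML that cURL receives.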
