Data extraction using HTML DOM



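I'm having a problem with data extraction. I've seen many threads on this topic, but I couldn't find a solution that fits my case, so I'm asking for your help fixing this error.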

<?php 
    require('admin/inc/simple_html_dom.php');
    $html = file_get_contents("http://health.hamariweb.com/rawalpindi/doctors");
    $title = $html->find("div#infinite-grid-images", 0)->innertext;
    echo $title;
?>

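I want to display all of these doctors on my website. I'm just learning data extraction and have gone through a lot of tutorials, but I still have no results. Could anyone please help? :(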

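Try loading the string returned by file_get_contents() into a simple_html_dom object. file_get_contents() returns a plain string, and a string has no find() method.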

<?php 
    require('admin/inc/simple_html_dom.php');
    $html = file_get_contents("http://health.hamariweb.com/rawalpindi/doctors");
    $dom = new simple_html_dom();
    $dom->load($html);
    $title = $dom->find("#infinite-grid-images", 0)->innertext;
    echo $title;
?>

Also, simple_html_dom.php ships with a helper function, file_get_html($url), that fetches and parses the page in one step.

You can do the following:

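<?php 
    require('admin/inc/simple_html_dom.php');
    // file_get_html() returns a parsed DOM object, or false on failure
    $html = file_get_html("http://health.hamariweb.com/rawalpindi/doctors");
    if($html){
        // Query the object returned above ($html, not an undefined $dom)
        $title = $html->find("#infinite-grid-images", 0)->innertext;
        echo $title;
    }else{
        echo "Nothing found";
    }
?>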

Good luck!

cURL is your friend too.

<?php
    require('simple_html_dom.php');
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_URL => "http://health.hamariweb.com/rawalpindi/doctors",
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_FOLLOWLOCATION => 1,
        CURLOPT_ENCODING => "",
        CURLOPT_MAXREDIRS => 10,
        CURLOPT_TIMEOUT => 30,
        CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
        CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
    ));
    $file = curl_exec($curl);
    $error = curl_error($curl);
    curl_close($curl);
    $dom = new simple_html_dom();
    $dom->load($file);
    $doctorDivs = $dom->find("#infinite-grid-images", 0)->children();
    $doctors = array();
    foreach($doctorDivs as $div){
        $doctor = array();
        $doctor["image"] = "http://health.hamariweb.com/".$div->find('img', 0)->src;
        $details = $div->find('table', 1)->find("tr");
        $doctor["name"] = trim($details[0]->plaintext);
        $doctor["type"] = trim($details[1]->plaintext);
        $doctor["etc"] = trim($details[2]->plaintext);
        $doctors[] = $doctor;
    }
    echo "<pre>";
    var_dump($doctors);
?>

You can decide what to do with the data.
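As one possible way to use that data, here is a minimal sketch that turns an array shaped like $doctors into an HTML list. The function name render_doctors and the sample values are made up for illustration; only the array keys (image, name, type, etc) come from the code above.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): renders an array shaped
// like $doctors as an HTML list.
function render_doctors(array $doctors): string
{
    $items = '';
    foreach ($doctors as $doctor) {
        // Escape everything that came from the remote page before printing it.
        $image = htmlspecialchars($doctor['image'] ?? '', ENT_QUOTES, 'UTF-8');
        $name  = htmlspecialchars($doctor['name'] ?? '', ENT_QUOTES, 'UTF-8');
        $type  = htmlspecialchars($doctor['type'] ?? '', ENT_QUOTES, 'UTF-8');
        $etc   = htmlspecialchars($doctor['etc'] ?? '', ENT_QUOTES, 'UTF-8');
        $items .= "<li><img src=\"{$image}\" alt=\"{$name}\"> "
                . "<strong>{$name}</strong> ({$type}): {$etc}</li>\n";
    }
    return "<ul>\n{$items}</ul>\n";
}

// Made-up sample data in the same shape as $doctors.
$sample = array(
    array(
        'image' => 'http://example.com/a.jpg',
        'name'  => 'Dr. Example',
        'type'  => 'Dentist',
        'etc'   => 'Rawalpindi',
    ),
);
echo render_doctors($sample);
```

Escaping with htmlspecialchars() matters here because the text was scraped from a third-party page and should not be trusted as markup.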

The site you are trying to scrape returns an HTTP 500 error when the request has no User-Agent header. To get around that, you can use cURL, e.g.:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://health.hamariweb.com/rawalpindi/doctors");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:53.0) Gecko/20100101 Firefox/53.0");
$html = curl_exec($ch);
curl_close($ch);
# your code ...
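Since the failure described above shows up as an HTTP status rather than as a cURL error, it may help to check both before parsing. The following is only a sketch: fetch_html is a hypothetical helper name, and the timeout values are arbitrary assumptions.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): fetches a page with the
// given User-Agent and returns null instead of an error page.
function fetch_html(string $url, string $userAgent): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 10, // assumed value
        CURLOPT_TIMEOUT        => 30, // assumed value
        CURLOPT_USERAGENT      => $userAgent,
    ));
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_error($ch);

    // Transport-level failure (DNS, refused connection, timeout, ...).
    if ($body === false) {
        error_log("cURL error: " . $error);
        return null;
    }
    // The server answered, but with an error status such as 500.
    if ($status >= 400) {
        error_log("HTTP status " . $status);
        return null;
    }
    return $body;
}
```

A caller would pass the URL and a browser-like User-Agent string, check the result for null, and only then hand the returned string to $dom->load().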
