PHP simple_html_dom: timeout has no effect with load_file/file_get_contents

I am using simple_html_dom to parse HTML. Here is the core of my code:

set_time_limit(10000);
foreach ($urlList as $url) {
    ini_set('default_socket_timeout', 5);
    $context = stream_context_create(
        array(
            'http'=>array(
                'method' => 'GET', 
                'timeout' => 5
            ),
        )
    );
    $shd->load_file($url, false, $context);
    var_dump(0);
    $html = $shd->find("table");
    ...
}

However, the timeout has no effect on load_file(). The script only stops once it exceeds the 10000 seconds allowed by set_time_limit(10000).

I want load_file to give up and move on to the next URL when the current one takes longer than 5 seconds. Is there a way to do this?

In the end, I used curl to fetch the content and then passed it to simple_html_dom for parsing.

function get_html_by_curl($url, $timeout = 5) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Limit both the connection phase and the whole transfer to $timeout seconds.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    $html = curl_exec($ch);
    // curl_exec() returns false on transport errors, including timeouts.
    if (false === $html) {
        return false;
    }
    // Treat any non-200 response as a failure.
    if (200 != curl_getinfo($ch, CURLINFO_HTTP_CODE)) {
        return false;
    }
    return $html;
}

// Try once, then retry up to 3 more times if the fetch failed.
$content = get_html_by_curl('http://www.google.com', 5);
$i = 0;
while ($i < 3 && !$content) {
    $content = get_html_by_curl('http://www.google.com', 5);
    $i++;
}
// Only hand the page to simple_html_dom if a fetch succeeded.
if (false !== $content) {
    $shd->load($content);
}
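
To apply the same idea to the whole URL list from the question, the retry logic can be pulled out of the top-level script. The following is only a minimal sketch: fetch_with_retries() and fetch_all() are hypothetical helper names that are not part of simple_html_dom or curl, and the fetcher is passed in as a callable so that get_html_by_curl() (or any other function that returns the HTML string or false) can be plugged in.

```php
<?php
// Hypothetical helper (not part of any library): call $fetch($url) up to
// $maxAttempts times and return the first result that is not false.
function fetch_with_retries(callable $fetch, $url, $maxAttempts = 3) {
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $html = $fetch($url);
        if (false !== $html) {
            return $html;
        }
    }
    return false; // every attempt failed or timed out
}

// Hypothetical driver: fetch each URL in turn and skip the ones that keep
// failing, so one slow host cannot stall the whole run.
// Returns an array mapping each successful URL to its HTML.
function fetch_all(callable $fetch, array $urlList, $maxAttempts = 3) {
    $pages = array();
    foreach ($urlList as $url) {
        $html = fetch_with_retries($fetch, $url, $maxAttempts);
        if (false === $html) {
            continue; // give up on this URL and move on to the next one
        }
        $pages[$url] = $html;
    }
    return $pages;
}
```

Assuming get_html_by_curl() is defined as above, calling fetch_all('get_html_by_curl', $urlList) returns only the pages that were fetched successfully, and each one can then be passed to $shd->load(). With a long URL list it may be better to parse each page inside the loop instead of collecting them all in memory first.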
