Simple PHP web crawler returns a Simple HTML DOM error

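I have a PHP script that returns the links on a web page. I am getting a 500 Internal Server Error, and the messages below are what my server log says. I had a friend try the same code on his server and it seems to run fine. Can someone help me debug my problem? The warning says something about a wrapper being disabled. I checked line 1081 but do not see allow_url_fopen there.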

PHP Warning: file_get_contents(): http:// wrapper is disabled in the server configuration by allow_url_fopen=0 in /hermes/bosweb/web066/b669/ipg.streamversetv/simple_html_dom.php on line 1081

PHP Warning: file_get_contents(http://www.dota2lounge.com/): failed to open stream: no suitable wrapper could be found in /hermes/bosweb/web066/b669/ipg.streamversetv/simple_html_dom.php on line 1081

PHP Fatal error: Call to a member function find() on a non-object in /hermes/bosweb/web066/b669/ipg.streamversetv/sim

<?php
 include_once('simple_html_dom.php');
 $target_url = 'http://www.dota2lounge.com/';
 $html = new simple_html_dom();
 $html->load_file($target_url);
  foreach($html->find('a') as $link){
    echo $link->href.'<br />';
  }
?>

Download the latest simple_html_dom.php (download link).

Open simple_html_dom.php in your favorite editor and add this code near the top of the file (it can go right after <?php):

function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);     
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

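Find the line that starts with function file_get_html($url..... For me it is line 71, but you can also use your editor's search (search for file_get_html).
Then edit this line (a few lines below function file_get_html):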

$contents = file_get_contents($url, $use_include_path, $context, $offset);

to this:

$contents = file_get_contents_curl($url);

Then use file_get_html instead of load_file. It will work for you without editing php.ini.
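If you would rather not patch the library file, a similar effect can probably be had by fetching the page with cURL yourself and handing the markup to str_get_html(), which ships with Simple HTML DOM. The following is only a sketch under some assumptions: the cURL extension is available on the host, simple_html_dom.php is already included, and the helper names fetch_page and extract_links are made up for illustration and are not part of the library.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): fetch a URL with cURL.
// Returns the response body as a string, or null when the request fails.
function fetch_page($url, $timeoutSeconds = 10)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    $body = curl_exec($ch);                          // false on failure
    return is_string($body) ? $body : null;
}

// Hypothetical helper: collect the href of every <a> element in an HTML string.
// Assumes simple_html_dom.php has been included so that str_get_html() exists.
function extract_links($markup)
{
    $dom = str_get_html($markup);
    if (!$dom) {
        // Nothing usable was parsed, so do not call find() on it.
        return array();
    }
    $links = array();
    foreach ($dom->find('a') as $anchor) {
        $links[] = $anchor->href;
    }
    return $links;
}

// Example usage (uncomment once simple_html_dom.php is on the include path):
// include_once('simple_html_dom.php');
// $page = fetch_page('http://www.dota2lounge.com/');
// if ($page === null) {
//     echo 'Download failed';
// } else {
//     foreach (extract_links($page) as $href) {
//         echo $href . '<br />';
//     }
// }
```

Checking the result before calling find() also avoids the "Call to a member function find() on a non-object" fatal error when the download fails.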

You need to set the allow_url_fopen PHP setting to 1 to allow fopen() to be used with URLs.

Reference: PHP: Runtime Configuration
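To confirm what the server is actually configured with, a small diagnostic script along these lines may help. It is a sketch only: the function name url_fopen_enabled is made up for illustration. Note that allow_url_fopen can only be changed in php.ini or the server configuration (for example with a line such as allow_url_fopen = On), not with ini_set() at runtime, so on shared hosting you may have to ask the provider.

```php
<?php
// Hypothetical helper: report whether URL wrappers (http://, ftp://, ...) are
// enabled for file_get_contents() and fopen().
function url_fopen_enabled()
{
    // ini_get() returns the setting as a string such as "1", "0", "On" or "".
    // FILTER_VALIDATE_BOOLEAN maps "1", "on", "true", "yes" to true, else false.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

echo 'allow_url_fopen: ', (url_fopen_enabled() ? 'enabled' : 'disabled'), PHP_EOL;

// If URL wrappers are off, cURL is the usual alternative, so check for it too.
echo 'cURL extension: ', (extension_loaded('curl') ? 'loaded' : 'not loaded'), PHP_EOL;
```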

Edit:
One more thing worth checking: have you tried loading the page like this?

<?php
    include_once('simple_html_dom.php');
    $html = file_get_html('http://www.dota2lounge.com/');
    foreach($html->find('a') as $link)
    {
        echo $link->href.'<br />';
    }
?>
