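Retrieving the contents of a div from an external site

I am trying to retrieve the contents of a div from an external site using PHP and XPath.

This is an excerpt from the page, showing the relevant code. Note: I tried adding everything, including an @ on the class and an a at the end of my query; after that, I use saveHTML() to get it. See my test: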

By the way:

This is my XPath: //*[@id="post-15991"]/div[4]/div[1]
This is the URL: https://wordpress.org/plugins/wp-job-manager/

See the code below:

<?PHP
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$dom = new DOMDocument();
@$dom->loadHTMLFile($url);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
$link = $dom->saveHTML($elements->item(0));
echo $link;
?>

Output: the output is empty.
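One way to narrow down an empty result is to check whether the query matched anything before calling saveHTML(). The following is only a minimal sketch, assuming the markup is already available as a string; the helper name firstMatchHtml and the sample markup are made up for illustration and are not the real wordpress.org page.

```php
<?php
// Hypothetical helper: returns the HTML of the first node matched by $query,
// or null when the query matches nothing (or the expression is invalid).
function firstMatchHtml(string $html, string $query): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // buffer parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;                    // nothing matched, so the XPath is the suspect
    }
    return $dom->saveHTML($nodes->item(0));
}

// Made-up sample markup, not the real page.
$sample = '<div id="post-1"><div class="meta">Version: 1.0</div></div>';

var_dump(firstMatchHtml($sample, '//*[@id="post-1"]/div[1]'));  // the inner div as a string
var_dump(firstMatchHtml($sample, '//*[@id="post-2"]/div[1]'));  // NULL: the id does not exist
```

If the second case is what happens with the real page, the ID or the div positions in the XPath probably do not match the markup that PHP receives.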

Background:

How I obtained the XPath, using Google Chrome: I have a web page from which I want to get some data:

https://wordpress.org/plugins/wp-job-manager/
https://wordpress.org/plugins/participants-database/
https://wordpress.org/plugins/amazon-link/
https://wordpress.org/plugins/simple-membership/
https://wordpress.org/plugins/scrapeazon/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:

For example, see the following: view-source:https://wordpress.org/plugins/wp-job-manager/

Version: 1.29.3

Last updated: 5 days ago

Active installations: 100,000+

<li>
Requires WordPress Version:<strong>4.3.1</strong>                </li>
<li>Tested up to: <strong>4.9.2</strong></li>
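List items of this shape can be turned into label/value pairs with a relative XPath per item. This is a sketch under assumptions: the helper name extractListFields is hypothetical, and the surrounding ul wrapper is added here only so the excerpt parses as a list.

```php
<?php
// Hypothetical helper: turns each <li>Label: <strong>value</strong></li>
// into a label => value pair.
function extractListFields(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // buffer parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $fields = [];
    foreach ($xpath->query('//li[strong]') as $li) {
        // Label: the text before <strong>, whitespace collapsed, trailing colon dropped.
        $label = rtrim($xpath->evaluate('normalize-space(text()[1])', $li), ':');
        // Value: the text inside the first <strong>.
        $value = trim($xpath->evaluate('string(strong[1])', $li));
        $fields[$label] = $value;
    }
    return $fields;
}

// The excerpt from the page source; the <ul> wrapper is an assumption.
$excerpt = <<<'HTML'
<ul>
<li>
Requires WordPress Version:<strong>4.3.1</strong>                </li>
<li>Tested up to: <strong>4.9.2</strong></li>
</ul>
HTML;

print_r(extractListFields($excerpt));
```

Matching on the li and strong elements rather than on fixed positions keeps the extraction independent of how many div wrappers sit above the list.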

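Background: I need this data for all of my favorite plugins and want to save it in a database or a spreadsheet. So there are about 70 pages to scrape.

See the example list here, with the full XPath: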

//*[@id="post-15991"]/div[4]/div[1]

And for the Job Board Manager plugin:

//*[@id="post-519"]/div[4]/div[1]/ul/li[1]
//*[@id="post-519"]/div[4]/div[1]/ul/li[2]
//*[@id="post-519"]/div[4]/div[1]/ul/li[3]
//*[@id="post-519"]/div[4]/div[1]/ul/li[7]
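The two sets of paths above differ in their post ID (post-15991 versus post-519), so a query with a hard-coded ID cannot be reused across plugins. One possible alternative, sketched here with made-up markup and a hypothetical helper named fieldByLabel, is to match any element whose id starts with "post-" and to find the list item by its label text instead of its position.

```php
<?php
// Hypothetical helper: finds the <strong> value of the list item whose text
// starts with $label, inside any element whose id starts with "post-".
// Assumes $label contains no double quotes.
function fieldByLabel(string $html, string $label): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // buffer parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $query = sprintf(
        '//*[starts-with(@id, "post-")]//li[starts-with(normalize-space(.), "%s")]/strong',
        $label
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;                    // label not present on this page
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up markup for two plugins with different post IDs and nesting depth.
$pageA = '<div id="post-15991"><ul><li>Tested up to: <strong>4.9.2</strong></li></ul></div>';
$pageB = '<div id="post-519"><div><ul><li>Tested up to: <strong>4.8</strong></li></ul></div></div>';

echo fieldByLabel($pageA, 'Tested up to'), "\n";
echo fieldByLabel($pageB, 'Tested up to'), "\n";
```

The same query works for both pages even though the ID and the nesting differ, which is the property needed when looping over many plugin pages.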

I used the method from this question: Is there a way to get the XPath in Google Chrome?

Right-click the item whose XPath you are trying to find and choose "Inspect".
Right-click the highlighted node in the Elements panel.
Go to Copy > Copy XPath.

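You are calling .loadHTMLFile, which expects a file path (a URL is only accepted when PHP's URL stream wrappers are enabled). If you turn warnings on, you will see the following: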

E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Attribute class redefined in https://wordpress.org/plugins/wp-job-manager/, line: 190 -- at line 5

E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag header invalid in https://wordpress.org/plugins/wp-job-manager/, line: 201 -- at line 5

E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag nav invalid in https://wordpress.org/plugins/wp-job-manager/, line: 205 -- at line 5

E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag main invalid in https://wordpress.org/plugins/wp-job-manager/, line: 224 -- at line 5
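Warnings like these come from the HTML parser and can be collected rather than silenced with "@". A minimal sketch, assuming the markup is already in a string; the helper name loadHtmlCollectingErrors and the sample markup are made up, and which messages appear depends on the installed libxml version.

```php
<?php
// Hypothetical helper: parses $html and returns the document together with
// whatever libxml reported, instead of hiding everything behind "@".
function loadHtmlCollectingErrors(string $html): array
{
    $previous = libxml_use_internal_errors(true);  // buffer errors instead of emitting warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $errors = libxml_get_errors();                 // array of LibXMLError objects
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the earlier setting
    return [$dom, $errors];
}

// Made-up sample using HTML5 elements like those named in the warnings.
[$dom, $errors] = loadHtmlCollectingErrors('<header><nav>Menu</nav></header><main>Body</main>');

foreach ($errors as $error) {
    // Each entry carries the message and the line it was raised on.
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
echo $dom->getElementsByTagName('main')->length, "\n";
```

The document is still built even when the parser complains about tags it does not recognise, so these particular warnings do not by themselves explain an empty result.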

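Use .loadHTML instead, and pass it the page's markup rather than the URL:

$url = 'https://wordpress.org/plugins/wp-job-manager/';
$dom = new DOMDocument();
// loadHTML() parses a string of markup, so fetch the page body first
@$dom->loadHTML(file_get_contents($url));
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
$link = $dom->saveHTML($elements->item(0));
echo $link;

If the URL string itself is passed to loadHTML(), the URL is parsed as if it were the markup, and the result will be little more than the URL text:

https://wordpress.org/plugins/wp-job-manager/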
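Since the question mentions roughly 70 plugin pages and a spreadsheet as the target, the per-page steps could be wrapped in a loop that writes one CSV row per URL. This is only a sketch under assumptions: writePluginCsv, $fetch and $extract are hypothetical names, and the stand-in page data below exists only so the example runs without network access.

```php
<?php
// Hypothetical sketch: $fetch maps a URL to its HTML (or false on failure),
// $extract maps HTML to a label => value array; both are supplied by the caller.
function writePluginCsv($out, array $urls, array $columns, callable $fetch, callable $extract): int
{
    fputcsv($out, array_merge(['URL'], $columns), ',', '"', '\\');  // header row
    $written = 0;
    foreach ($urls as $url) {
        $html = $fetch($url);
        if ($html === false || $html === '') {
            continue;                          // skip pages that could not be fetched
        }
        $fields = $extract($html);
        $row = [$url];
        foreach ($columns as $column) {
            $row[] = $fields[$column] ?? '';   // leave missing fields blank
        }
        fputcsv($out, $row, ',', '"', '\\');
        $written++;
    }
    return $written;
}

// Stand-ins so the sketch runs offline; real code would fetch and parse HTML.
$fakePages = [
    'https://wordpress.org/plugins/wp-job-manager/' => 'Version=1.29.3',
];
$fetch = fn(string $url) => $fakePages[$url] ?? false;
$extract = function (string $page): array {
    [$label, $value] = explode('=', $page, 2);
    return [$label => $value];
};

$out = fopen('php://memory', 'w+');
$count = writePluginCsv($out, array_keys($fakePages), ['Version', 'Tested up to'], $fetch, $extract);
rewind($out);
echo stream_get_contents($out);
fclose($out);
```

In real use, $fetch might be file_get_contents, $extract an XPath-based function, and $out a handle from fopen() on a .csv file; keeping them as parameters makes the loop testable without hitting wordpress.org.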
