Jsoup html.select() does not capture <h3> headings

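I am trying to capture the h1, h2 and h3 tags from the following HTML pages, but h3 elements are returned only for the first URL, not for the second.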

URL (returns H3) = https://docs.paloaltonetworks.com/prisma/prisma-access/prisma-access-panorama-release-notes/prisma-access-about/features-in-prisma-access

URL (does not return H3) = https://docs.paloaltonetworks.com/pan-os/10-2/pan-os-admin/authentication/configure-multi-factor-authentication/configure-mfa-between-rsa-securid-and-firewall

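import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

String url = "https://docs.paloaltonetworks.com/pan-os/10-2/pan-os-admin/authentication/configure-multi-factor-authentication/configure-mfa-between-rsa-securid-and-firewall";
try {
    // Fetch the page and parse the HTML exactly as the server returns it
    Document html = Jsoup.connect(url).userAgent("Mozilla").get();
    // Select every h1, h2 and h3 element in document order
    Elements hTags = html.select("h1,h2,h3");
    System.out.println(hTags);
} catch (IOException e) {
    System.out.println("In exception " + e);
    throw new RuntimeException(e);
}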

If I look at the page source of both HTML files, the H3 headings are not there; however, when I inspect the pages, both show H3 headings. Any help is appreciated.

When I download the plain HTML (for example via "View page source"), I cannot find any H3 headings either.

But when I use the developer tools (F12 in Firefox), I can find the H3 headings.

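In other words, the H3 headings are added dynamically after the page has loaded. JSoup does not execute the scripts that load this additional content, so you will not get those elements.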
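To see this behavior without touching the network, here is a minimal sketch. The HTML string, the class name StaticParseDemo and the helper countHeadings are made up for illustration; they are not taken from the Palo Alto pages. The inline script would insert an h3 in a browser, but JSoup treats the script body as plain data and never runs it.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticParseDemo {
    // Hypothetical page: the <h3> only exists after a browser runs the script.
    static final String HTML =
        "<html><body>"
      + "<h1>Title</h1>"
      + "<h2>Section</h2>"
      + "<div id=\"content\"></div>"
      + "<script>"
      + "document.getElementById('content').innerHTML = '<h3>Loaded later</h3>';"
      + "</script>"
      + "</body></html>";

    // Parses the HTML as-is (no script execution) and counts the matches.
    static int countHeadings(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println("h1,h2 found: " + countHeadings(HTML, "h1,h2")); // 2
        System.out.println("h3 found: " + countHeadings(HTML, "h3"));       // 0
    }
}
```

The h1 and h2 are part of the static markup, so they are found; the h3 appears only inside the script text, so select("h3") comes back empty.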

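To summarize, quoting the linked question: JSoup is an HTML parser, so it knows nothing about content that scripts add to the HTML after the page has loaded. That question also points to this discussion: Is there a way to embed a browser in Java?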
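One possible way to get a browser to do the rendering is Selenium WebDriver. This is my own suggestion, not something the linked discussion prescribes. The sketch below assumes Selenium 4, a local Chrome installation and JSoup on the classpath; the class name RenderedHeadings, the helper headingTexts and the 10-second timeout are illustrative choices. It lets the browser run the scripts, waits for an h3 to appear, then hands the rendered HTML to JSoup.

```java
import java.time.Duration;
import java.util.List;
import org.jsoup.Jsoup;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class RenderedHeadings {
    // Extracts the text of all h1, h2 and h3 elements from already rendered HTML.
    static List<String> headingTexts(String renderedHtml) {
        return Jsoup.parse(renderedHtml).select("h1,h2,h3").eachText();
    }

    public static void main(String[] args) {
        String url = "https://docs.paloaltonetworks.com/pan-os/10-2/pan-os-admin/authentication/configure-multi-factor-authentication/configure-mfa-between-rsa-securid-and-firewall";

        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new"); // run Chrome without a window
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get(url);
            // Assumption: 10 seconds is enough for the scripts to insert an h3.
            new WebDriverWait(driver, Duration.ofSeconds(10))
                .until(ExpectedConditions.presenceOfElementLocated(By.tagName("h3")));
            // getPageSource() returns the DOM after script execution.
            headingTexts(driver.getPageSource()).forEach(System.out::println);
        } finally {
            driver.quit(); // always release the browser process
        }
    }
}
```

Keeping the JSoup part in headingTexts means the original select("h1,h2,h3") logic stays unchanged; only the source of the HTML differs.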
