jsoup .attr() method does not extract data from valid HTML



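I have a problem with the .attr method: it does not work for any attribute except "class". I am trying to extract the "alt" attribute to get the shop names, but it does not work. I tried the same with "src" and "data-original", and nothing was printed.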

This is the whole method I use to extract the data.

public List<String> getShops() {
    Elements elements = document.select(".store-logo");
    System.out.println(elements.html());
    for (Element image : elements) {
        System.out.println(image.attr("alt"));
    }
    return null;
}

To make sure I am not working with an empty document, I printed the full HTML of all the selected elements, which looks like this:

<img src="//image.ceneostatic.pl/imageschain/data/shops_s/20853/logo.jpg;data/custom_images/590/custom_image.png" alt="nalepsze.pl">
<img src="/content/img/icons/pix-empty.png" alt="allegro.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/20136/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="avans.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/18601/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="proshop.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/29068/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="g2a.com" data-original="//image.ceneostatic.pl/imageschain/data/shops/23040/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="fotosoft.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/3914/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="techsat24.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/5666/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="imperiumpc.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/12579/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="domsary.eu" data-original="//image.ceneostatic.pl/imageschain/data/shops/4725/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="net-s.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/3653/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="sferis.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/4614/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="morele.net" data-original="//image.ceneostatic.pl/imageschain/data/shops/379/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="zakupy.vip" data-original="//image.ceneostatic.pl/imageschain/data/shops/29402/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="fotoelektro.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/1671/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="3kropki.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/357/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="electro.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/16202/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="allegro.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/20136/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="avans.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/18601/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">
<img src="/content/img/icons/pix-empty.png" alt="proshop.pl" data-original="//image.ceneostatic.pl/imageschain/data/shops/29068/logo.jpg;data/custom_images/585/custom_image.png" class="js_lazy">

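The data in this dump is correct, but the next step, the for-each loop, does not work: I get an empty string for every element, which is strange because I can extract the "class" attribute.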

I would greatly appreciate any hints on this topic.

The jsoup version is 1.11.3.

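In your example you select the class "store-logo", but in the attached HTML none of the img elements has that class. When the class name is replaced with "js_lazy", the code extracts the alt attributes.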
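
A possible explanation, offered as a guess since the full page is not shown: Elements.html() returns the combined inner HTML of the matched elements, so the printed img tags are probably children of the ".store-logo" elements rather than the matched elements themselves. Calling attr("alt") on the wrapper then returns an empty string because the wrapper has no such attribute. The sketch below selects the nested images with ".store-logo img", which would also pick up the first image that lacks the "js_lazy" class. The wrapping div markup and the ShopNames class name are assumptions made up for the illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ShopNames {

    // Collects the alt text of every <img> nested inside a .store-logo element.
    static List<String> getShops(Document document) {
        List<String> shops = new ArrayList<>();
        for (Element image : document.select(".store-logo img")) {
            shops.add(image.attr("alt"));
        }
        return shops;
    }

    public static void main(String[] args) {
        // Hypothetical markup: the <div class="store-logo"> wrapper is an assumption,
        // the <img> tags are taken from the question.
        String html =
                "<div class=\"store-logo\">"
              + "<img src=\"//image.ceneostatic.pl/imageschain/data/shops_s/20853/logo.jpg\" alt=\"nalepsze.pl\">"
              + "</div>"
              + "<div class=\"store-logo\">"
              + "<img src=\"/content/img/icons/pix-empty.png\" alt=\"allegro.pl\" class=\"js_lazy\">"
              + "</div>";
        Document document = Jsoup.parse(html);

        // The wrapper itself has no alt attribute, so attr() yields "".
        System.out.println(document.select(".store-logo").first().attr("alt").isEmpty()); // prints true

        System.out.println(getShops(document)); // prints [nalepsze.pl, allegro.pl]
    }
}
```

If the images on the real page are not nested this way, the selector needs to be adjusted to match the actual structure.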
