Unable to identify the correct CSS selector to scrape with Mechanize

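I have built a web scraper that successfully pulls the content I need from the web page I'm looking at. The goal is to extract the URL of a specific image associated with each of the coffees found at a particular URL.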

The rake task I defined to do the scraping is as follows:

mechanize = Mechanize.new
mechanize.get(url) do |page|
  page.links_with(:href => /products/).each do |link|
    coffee_page = link.click
    bean = Bean.new
    bean.acidity = coffee_page.css('[data-id="acidity"]').text.strip.gsub("acidity ","")
    bean.elevation = coffee_page.css('[data-id="elevation"]').text.strip.gsub("elevation ","")
    bean.roaster_id = "2"
    bean.harvest_season = coffee_page.css('[data-id="harvest"]').text.strip.gsub("harvest ","")
    bean.price = coffee_page.css('.price-wrap').text.gsub("$","")
    bean.roast_profile = coffee_page.css('[data-id="roast"]').text.strip.gsub("roast ","")
    bean.processing_type = coffee_page.css('[data-id="process"]').text.strip.gsub("process ","")
    bean.cultivar = coffee_page.css('[data-id="cultivar"]').text.strip.gsub("cultivar ","")
    bean.flavor_profiles = coffee_page.css('.price-wrap+ p').text.strip
    bean.country_of_origin = coffee_page.css('#pdp-order h1').text.strip
    bean.image_url = coffee_page.css('img data-featured-product-image').attr('src')
    if bean.country_of_origin == "Origin Set" || bean.country_of_origin == "Gift Card (online use only)"
      bean.destroy
    else
      ap bean
    end
  end
end

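All the information I need is on the page. I'm looking for the image URL shown below, but for every individual coffee_page linked from the source page. The selector needs to be generic enough to pull this image source, but nothing else. I've tried many different CSS selectors, but everything returns nil or blank.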

<img src="//cdn.shopify.com/s/files/1/2220/0129/products/ceremony-product-gummy-bears_480x480.jpg?v=1551455589" alt="Burundi Kiryama" data-product-featured-image style="display:none">

My coffee_page is here: https://shop.ceryreycoffee.com/products/burundi-kiryama

You need to change

bean.image_url = coffee_page.css('img data-featured-product-image').attr('src')

to

bean.image_url = coffee_page.css('#mobile-only>img').attr('src')

Whenever you can, use a nearby identifier to locate the element you want to access.
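One detail worth checking once the selector matches: the src in the question's markup is protocol-relative (it starts with //), so the stored value has no scheme. If you want an absolute URL, a small helper along these lines may help. The helper name absolute_image_url and the shop.example.com page URL are hypothetical placeholders; in the rake task you could probably pass coffee_page.uri instead.

```ruby
require 'uri'

# Hypothetical helper: resolves a scraped src against the URL of the page
# it was found on. Returns nil when no src was found.
def absolute_image_url(src, page_url)
  return nil if src.nil? || src.strip.empty?

  URI.join(page_url.to_s, src.strip).to_s
end

# Placeholder page URL, used only for this illustration.
page_url = 'https://shop.example.com/products/burundi-kiryama'
src = '//cdn.shopify.com/s/files/1/2220/0129/products/ceremony-product-gummy-bears_480x480.jpg?v=1551455589'

puts absolute_image_url(src, page_url)
```

URI.join takes the scheme from the page URL and the host and path from the src, so the same helper also handles ordinary relative paths such as /images/foo.jpg.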
