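How can I use Nokogiri to scrape the product names and prices from the page http://www.tradus.com/t-shirts-tees-reebok-puma-fifa-teesort/t/7682?Type=polo+neck, and how can I scrape every product in that category across all of its pages? Below is the code I use to get the price, but the price comes back wrapped in HTML tags, and the code only works for one page.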
require 'nokogiri'
require 'open-uri'
url = "http://www.tradus.com/t-shirts-tees-reebok-puma-fifa-teesort/t/7682?Type=polo+neck"
# URI.open is the open-uri entry point on current Ruby versions.
doc = Nokogiri::HTML(URI.open(url))
doc.css(".prodListing-item").each do |dv|
  product_name = dv.at_css('.prod-name').text unless dv.at_css(".prod-name").nil?
  # to_s serializes the node as markup, which is why the price prints with HTML tags.
  product_price = dv.at_css('.price-info span span:nth-child(2)').to_s
  puts product_name + product_price
end
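The following code resolved the issue. It calls .text instead of .to_s to get the price without markup, and it requests successive pages through the page parameter until a page returns no products.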
require 'nokogiri'
require 'open-uri'
number = 1
while true
  # Request one listing page at a time through the page query parameter.
  url = "http://www.tradus.com/t-shirts-tees-reebok-puma-fifa-teesort/t/7682?Type=polo+neck&page=#{number}"
  doc = Nokogiri::HTML(URI.open(url))
  products = doc.css(".prodListing-item")
  # A page with no products means we have gone past the last page.
  break if products.empty?
  products.each do |item|
    name_node = item.at_css('.prod-name')
    price_node = item.at_css('.price-info span span:nth-child(2)')
    # Skip items missing either node so the string concatenation never sees nil.
    next if name_node.nil? || price_node.nil?
    puts name_node.text + "<==========>" + price_node.text
  end
  puts "page#{number}"
  number += 1
end
puts "exit of the while loop"
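The pagination logic can also be separated from the network request, which makes it possible to try it without hitting the site. What follows is only a minimal sketch under some assumptions: the helper names page_url and scrape_all_pages and the max_pages safety cap are hypothetical and not part of the original code, and the block passed to scrape_all_pages stands in for the Nokogiri request, returning the list of items found on a given page number.

```ruby
require 'uri'

# Hypothetical helper: builds the listing URL for a page number.
# URI.encode_www_form escapes the query, so "polo neck" becomes "polo+neck".
def page_url(number)
  base = "http://www.tradus.com/t-shirts-tees-reebok-puma-fifa-teesort/t/7682"
  "#{base}?#{URI.encode_www_form('Type' => 'polo neck', 'page' => number)}"
end

# Hypothetical helper: collects items page by page until a page comes back
# empty or the max_pages cap (an assumed safety limit) is reached.
# The block receives a page number and returns an array of items.
def scrape_all_pages(max_pages: 100)
  results = []
  (1..max_pages).each do |number|
    items = yield(number)
    break if items.nil? || items.empty?
    results.concat(items)
  end
  results
end

# Stand-in data instead of a live request, so the sketch runs offline.
fake_pages = {
  1 => ["Polo A<==========>499", "Polo B<==========>599"],
  2 => ["Polo C<==========>699"]
}

all_items = scrape_all_pages { |number| fake_pages.fetch(number, []) }
puts all_items
puts page_url(2)
```

In real use, the block would fetch page_url(number) with Nokogiri and return the name and price strings for each .prodListing-item, as in the loop above.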