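Is it possible to search across multiple domains with Nokogiri? I know you can run multiple XPath/CSS searches against a single domain/page, but can you search across several domains?
For example, I want to scrape http://www.asus.com/Notebooks_Ultrabooks/S56CA/#specifications and http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications
My code: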
require 'nokogiri'
require 'open-uri'
require 'spreadsheet'
doc = Nokogiri::HTML(URI.open("http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications"))
#Grab our product specifications
data = doc.css('div#specifications div#spec-area ul.product-spec li')
#Modify our data
lines = data.map(&:text)
#Create the Spreadsheet
Spreadsheet.client_encoding = 'UTF-8'
book = Spreadsheet::Workbook.new
sheet1 = book.create_worksheet
sheet1.name = 'My First Worksheet'
#Output our data to the Spreadsheet
lines.each_with_index do |line, i|
  sheet1[i, 0] = line
end
book.write 'C:/Users/Barry/Desktop/output.xls'
> Nokogiri has no concept of URLs; it only knows about a String or IO stream of XML or HTML. You're confusing the purpose of OpenURI with the purpose of Nokogiri.
If you want to read from multiple sites, simply iterate over the URLs and pass the current URL to OpenURI to open the page:
%w[
http://www.asus.com/Notebooks_Ultrabooks/S56CA/#specifications
http://www.asus.com/Notebooks_Ultrabooks/ASUS_TAICHI_21/#specifications
].each do |url|
  doc = Nokogiri::HTML(URI.open(url))
  # do something with the document...
end
OpenURI reads the page and passes its contents to Nokogiri for parsing. Nokogiri still only sees one page at a time, because that is all OpenURI hands it.