Web page scraped with Nokogiri returns no data



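I'm trying to scrape a list of projects from the UK oil portal, but my code returns no data. What I want to end up with is an array of project titles.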

class Entry
  def initialize(title)
    @title = title
  end
  attr_reader :title
end
def index
  @projects=Project.all
  require 'open-uri'
  require 'nokogiri'
  doc = Nokogiri::HTML(open("https://itportal.decc.gov.uk/pathfinder/currentprojectsindex.html"))
  entries = doc.css('.operator-container')
  @entries = []
  entries.each do |row|
    title = row.css('.setoutForm').text
    @entries << Entry.new(title)
  end
end

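The link you posted contains no data. The page you see is a frameset, and each frame is loaded from its own URL. You want to parse the left frame, so edit your code to open the left frame's URL: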

  doc = Nokogiri::HTML(open('https://itportal.decc.gov.uk/eng/fox/path/PATH_REPORTS/current-projects-index'))

Each project is on its own page, so you need to open every one of them. For example, the first is:

project_file = open(entries.first.css('a').attribute('href').value)       
project_doc = Nokogiri::HTML(project_file)

The "setoutForm" class picks up a lot of text. For example:

> project_doc.css('.setoutForm').text
=> "\n            \n              Field Type\n              Location\n              Water De
pth (m)\n              First Production\n              Contact\n            \n            \n
              Oil\n              2/15\n              155m\n              Q3/2018\n          
    \n                John Gill\n                Business Development Manager\n             
   jgill@alphapetroleum.com\n                01483 307204\n              \n            \n   
       \n            \n              Project Summary\n            \n            \n          
    \n                The Cheviot discovery is located in blocks 2/10a, 2/15a and 3/11b. \n 
               \n                Reserves are approximately 46mmbbls oil.\n                
\n                A Field Development Plan has been submitted and technically approved. The c
oncept is for a leased FPSA with 18+ subsea wells. Oil export will be via tanker offloading.
\n                \n              \n            \n          "

But the title isn't in that text. If you want the title, scrape this part of the page:

<div class="field-header" foxid="eu1KcH_d4qniAjiN">Cheviot</div>

You can use this CSS selector:

> project_doc.css('.operator-container .field-header').text
=> "Cheviot"

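Write this code one step at a time. Unless you single-step, it's hard to find out where your code goes wrong. For example, I used Nokogiri's command-line tool to open an interactive Ruby shell on the left frame's URL:
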
nokogiri https://itportal.decc.gov.uk/eng/fox/path/PATH_REPORTS/current-projects-index
