Using Rails, Nokogiri and XPath to parse an XML sheet for attributes and elements



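I'm trying to use Nokogiri in a Rails 4.2.0 environment to parse a data sheet of classes. I want to go through each course, store its @catalog_nbr and @subject attributes, and list the first instructor. The code below just produces empty arrays. I believe the problem is related to how I'm using the .each method, but I can't figure it out!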

require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML( open("https://courseroster.reg.cornell.edu/courses/roster/SP15/CS/xml/") )
doc.xpath("//course").each do
  num = doc.xpath("./@catalog_nbr").text
  subject = doc.xpath("./@subject").text
  instructor = doc.xpath("./sections/section/meeting/instructors/instructor")[1].text
  Course.create(:subject => subject, :number => num, :instructor => instructor)
end

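Try this. After selecting the document, we need to loop over each row in it, so the block has to take a parameter; let's call each row `row`. Every relative XPath is then evaluated against `row` rather than against `doc`. Next, assign a default value if the value is empty. Read this for more information.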

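doc.xpath("//course").each do |row|
  # .text returns "" (truthy in Ruby) when nothing matches, so use ActiveSupport's .presence
  num = row.xpath("./@catalog_nbr").text.presence || "N/A"
  subject = row.xpath("./@subject").text.presence || "N/A"
  # at_xpath returns the first match or nil; NodeSet#[1] would be the second node
  instructor = row.at_xpath("./sections/section/meeting/instructors/instructor").try(:text).presence || "N/A"
  Course.create(:subject => subject, :number => num, :instructor => instructor)
end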

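Here is a working solution. Note that the XML file you linked to always has a catalog number and a subject for every course, so none of the || "N/A" fallbacks are strictly necessary (though it may be good to play it safe):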

require 'nokogiri'
require 'open-uri'
doc = Nokogiri.XML( open("https://courseroster.reg.cornell.edu/courses/roster/SP15/CS/xml/") )
doc.xpath("/courses/course").each do |course|
  num  = course["catalog_nbr"] || "N/A"  # in case it doesn't exist
  subj = course["subject"]     || "N/A"  # in case it doesn't exist
  inst = (course.at("sections/section/meeting/instructors/instructor/text()") || "N/A").to_s
  data = { subject:subj, number:num, instructor:inst }
  p data
end
#=> {:subject=>"CS", :number=>"1110", :instructor=>"Van Loan,C (cfv3)"}
#=> {:subject=>"CS", :number=>"1112", :instructor=>"Fan,K (kdf4)"}
#=> {:subject=>"CS", :number=>"1130", :instructor=>"Frey,C (ccf27)"}
#=> {:subject=>"CS", :number=>"1130", :instructor=>"Frey,C (ccf27)"}
#=> {:subject=>"CS", :number=>"1132", :instructor=>"Fan,K (kdf4)"}
#=> etc.
