ruby on rails - HTML data parsing problem in Nokogiri



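I have a plain HTML file.

I am using Ruby 1.8.7 and need to extract the PO (order) numbers and the tracking numbers. Some entries have no tracking number, and in those cases I need to put 'nil'.

So far I have not been able to find a correct solution.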

<html>
  <head>
  </head>
  <body>
    <div>***NOTE*** <br> ITems<br><br>
    Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br> TROXLER RD<br><br>India<br>
    Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br><br><br>
    <br>
    Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br>TROXLER RD<br><br>India<br><br>Shipped Via : UPS    Track It : <a href= ab.com> 1Z2559690357791340</a><br><font face="COURIER" size="2" color="black"><br>
    <br>
    Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br>TROXLER RD<br><br>India<br>
    <br>
    Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br> TROXLER RD<br><br>India<br><br>Shipped Via : UPS    Track It : <a href= ab.com> 1Z2559690357791340</a><br><font face="COURIER" size="2" color="black"><br>
  </body>
</html>

My code looks like this:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

PAGE_URL = "a.html"
page = Nokogiri::HTML(open(PAGE_URL))
data = page.css("body").text
po_numbers = data.scan(/Invoice Number : \[\d+\] PO Number : \[(\d+)\]/).flatten
tracking_numbers = page.css("a").text.split
[["PO Number", "Tracking Number"]].concat(po_numbers.zip(tracking_numbers))
puts po_numbers
puts tracking_numbers

=> po_numbers = ["7894562", "7894562", "7894562","7894562","7894562"]
=> tracking_numbers = ["1Z2559690357791340", "1Z2559690357791340"]
=> po_numbers.zip(tracking_numbers)
=> [["7894562", "1Z2559691257791340"], ["7894562", "1Z2559690357791340"], ["7894562", "1Z2559690357791340"], ["7894562", "nil"], ["7894562", "nil"]]
What I want is:
=> [["7894562", "1Z2559691257791340"], ["7894562", "nil"], ["7894562", "1Z2559690357791340"], ["7894562", "nil"], ["7894562", "1Z2559690357791340"]]

I suggest using a Hash when storing po_numbers and tracking_numbers, so that each PO number is associated with its own tracking number.
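One way to keep each PO number tied to its own tracking number is to split the text into one chunk per invoice and look for both values inside the same chunk. The following is only a minimal sketch under some assumptions: `pair_po_with_tracking` is a hypothetical helper name, `text` is assumed to be what `page.css("body").text` returns, and the sample string is a shortened stand-in for the HTML in the question. Because the sample repeats the same PO number, a Hash keyed by PO number would keep only one entry, so this sketch returns an array of pairs instead.

```ruby
# Hypothetical helper (not part of Nokogiri): pairs each PO number with the
# tracking number found in the same record, or nil when the record has none.
# `text` is assumed to be the page's body text, e.g. page.css("body").text.
def pair_po_with_tracking(text)
  # Split just before every "Invoice Number" so each chunk holds one record.
  records = text.split(/(?=Invoice Number)/)

  pairs = records.map do |record|
    po = record[/PO Number : \[(\d+)\]/, 1]
    next unless po # skip leading text such as the "***NOTE***" header

    tracking = record[/Track It :\s*(\w+)/, 1] # nil when no tracking number
    [po, tracking]
  end

  pairs.compact # drop the skipped (nil) chunks
end

# Shortened stand-in for the body text of the HTML in the question.
sample = <<TEXT
***NOTE*** ITems
Invoice Number : [982157] PO Number : [7894562] Shipped To: HOHNE
Invoice Number : [982157] PO Number : [7894562] Shipped To: HOHNE Shipped Via : UPS    Track It :  1Z2559690357791340
Invoice Number : [982157] PO Number : [7894562] Shipped To: HOHNE
TEXT

pairs = pair_po_with_tracking(sample)
p pairs
# => [["7894562", nil], ["7894562", "1Z2559690357791340"], ["7894562", nil]]
```

If a Hash is still preferred, each pair could be stored as `{ :po => po, :tracking => tracking }` inside the array, which keeps duplicate PO numbers apart.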
