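I have a plain HTML file.
I am using Ruby 1.8.7 and need to record the order (PO) numbers and tracking numbers. Some records have no tracking number, and in that case I need to put 'nil'.
So far I have not been able to get a correct solution.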
<html>
<head>
</head>
<body>
<div>***NOTE*** <br> ITems<br><br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br> TROXLER RD<br><br>India<br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br><br><br>
<br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br>TROXLER RD<br><br>India<br><br>Shipped Via : UPS Track It : <a href= ab.com> 1Z2559690357791340</a><br><font face="COURIER" size="2" color="black"><br>
<br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br>TROXLER RD<br><br>India<br>
<br>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br> TROXLER RD<br><br>India<br><br>Shipped Via : UPS Track It : <a href= ab.com> 1Z2559690357791340</a><br><font face="COURIER" size="2" color="black"><br>
</body>
</html>
My code looks like this:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
PAGE_URL = "a.html"
page = Nokogiri::HTML(open(PAGE_URL))
data = page.css("body").text
po_numbers = data.scan(/Invoice Number : \[\d+\] PO Number : \[(\d+)\]/).flatten
tracking_numbers = page.css("a").text.split
[["PO Number", "Tracking Number"]].concat(po_numbers.zip(tracking_numbers))
puts po_numbers
puts tracking_numbers
=> po_numbers = ["7894562", "7894562", "7894562","7894562","7894562"]
=> tracking_numbers = ["1Z2559690357791340", "1Z2559690357791340"]
=> po_numbers.zip(tracking_numbers)
=> [["7894562", "1Z2559691257791340"], ["7894562", "1Z2559690357791340"], ["7894562", "1Z2559690357791340"], ["7894562", "nil"], ["7894562", "nil"]]
What I want is:
=> [["7894562", "1Z2559691257791340"], ["7894562", "nil"], ["7894562", "1Z2559690357791340"], ["7894562", "nil"], ["7894562", "1Z2559690357791340"]]
I suggest using a Hash when storing po_numbers and tracking_numbers, so that each entry in po_numbers is associated with its own entry in tracking_numbers.
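One way to keep each PO number tied to its own tracking number is to work on one record at a time instead of collecting two separate lists and zipping them. The following is only a minimal sketch under a few assumptions: the `html` string is a shortened, hypothetical sample with distinct PO numbers (not the file from the question), `extract_pairs` is an illustrative helper name, and every record is assumed to start with "Invoice Number". It uses plain regular expressions rather than Nokogiri, and it returns an array of pairs rather than a Hash, because the sample in the question repeats the same PO number and a Hash keyed by PO number would keep only one entry for it.

```ruby
# Hypothetical sample that mirrors the structure of the HTML in the question.
html = <<-HTML
<div>
Invoice Number : [982157] PO Number : [7894562] <br>Shipped To:<br>HOHNE<br><br>
Invoice Number : [982157] PO Number : [7894563] <br>Shipped To:<br>HOHNE<br>Shipped Via : UPS Track It : <a href= ab.com> 1Z2559690357791340</a><br>
Invoice Number : [982157] PO Number : [7894564] <br>Shipped To:<br>HOHNE<br>
</div>
HTML

# Illustrative helper (not part of any library): returns one
# [po_number, tracking_number_or_nil] pair per record.
def extract_pairs(html)
  # Each match captures the PO number and everything up to the next
  # "Invoice Number" (or the end of the input) as that record's body.
  records = html.scan(/Invoice Number\s*:\s*\[\d+\]\s*PO Number\s*:\s*\[(\d+)\](.*?)(?=Invoice Number\s*:|\z)/m)

  records.map do |po, body|
    # Look for a tracking link inside this record only; nil when absent.
    tracking = body[/<a\b[^>]*>\s*([^<\s]+)\s*<\/a>/, 1]
    [po, tracking]
  end
end

pairs = extract_pairs(html)
# pairs is [["7894562", nil], ["7894563", "1Z2559690357791340"], ["7894564", nil]]
pairs.each { |po, tracking| puts "#{po} => #{tracking.inspect}" }
```

If the literal string 'nil' is wanted instead of Ruby's `nil`, the pair can be built as `[po, tracking || "nil"]`. When the PO numbers are known to be unique, the pairs can also be copied into a Hash with a simple loop, for example `h = {}; pairs.each { |po, t| h[po] = t }`.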