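I'm trying to get the results of my XPath queries written to new rows, without success. I want the output placed row by row in column A, rather than in row 1 across columns A, B, C.
My full code is at https://gist.github.com/3205801
Would it be better to use Axlsx or CSV as the standard?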
require 'nokogiri'
require 'open-uri'
require 'csv'

# Set encoding options to remove nasty trademark symbols
encoding_options = {
  :invalid => :replace,           # Replace invalid byte sequences
  :undef => :replace,             # Replace anything not defined in ASCII
  :replace => '',                 # Use a blank for those replacements
  :universal_newline => true      # Always break lines with \n
}
# URI.open needs Ruby 2.5+; older versions used Kernel#open from open-uri
doc = Nokogiri::HTML(URI.open("http://h10010.www1.hp.com/wwpc/ie/en/ho/WF06b/321957-321957-3329742-89318-89318-5186820-5231694.html?dnr=1"))
# Replace each <br> with a ;
doc.css('br').each { |br| br.replace ';' }
clues = Array.new
clues << 'Operating system'
clues << 'Processors'
CSV.open("output.csv", "wb") do |csv|
  # 1. Output the clues header
  # 2. Scrape the output/force encoding to remove special characters
  csv << clues
  csv << clues.map { |clue| doc.at("//td[text()='#{clue}']/following-sibling::td").text.strip.encode(Encoding.find('ASCII'), **encoding_options) }
end
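For reference, the effect of these encoding options can be checked in isolation, without scraping anything. This is only a minimal sketch: the `sample` string is a made-up stand-in for a scraped cell, not text taken from the HP page.

```ruby
# Same options as in the script, written with the newer hash syntax.
encoding_options = {
  invalid: :replace,        # replace invalid byte sequences
  undef: :replace,          # replace characters that ASCII cannot represent
  replace: "",              # use an empty string for those replacements
  universal_newline: true   # normalize line breaks to \n
}

# Hypothetical sample text containing registered/trademark symbols.
sample = "Windows\u00AE 7 Professional\u2122"

# The options are passed as keyword arguments (note the double splat).
ascii = sample.encode(Encoding::US_ASCII, **encoding_options)

puts ascii
puts ascii.encoding
```

Passing the options with `**` keeps the call working on Ruby 3, where a trailing hash is no longer treated as keyword arguments.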
I'm not sure I understand the question, but I think you want the data like this:
header1,value1
header2,value2
header3,value3
rather than:
header1,header2,header3
value1,value2,value3
If so, you can do:
CSV.open("output.csv", "wb") do |csv|
  clues.each do |one_clue|
    xpath = "//td[text()='#{one_clue}']/following-sibling::td"
    value = doc.at(xpath).text.strip.encode(Encoding.find('ASCII'), **encoding_options)
    # csv << expects an array; each array becomes one row: header,value
    csv << [one_clue, value]
  end
end
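The key point is that every array appended with `csv <<` becomes exactly one row. The sketch below shows the different layouts side by side using only the standard `csv` library. The `values` array is hypothetical placeholder data standing in for the scraped cells, and `CSV.generate` is used instead of `CSV.open` so that nothing is written to disk.

```ruby
require "csv"

# Hypothetical data standing in for the scraped values.
clues  = ["Operating system", "Processors"]
values = ["Windows 7", "Intel Core i5"]

# Layout from the question: one row of headers, one row of values.
wide = CSV.generate do |csv|
  csv << clues
  csv << values
end

# One row per clue: header in column A, value in column B.
tall = CSV.generate do |csv|
  clues.zip(values).each { |clue, value| csv << [clue, value] }
end

# Everything in column A: wrap each item in a one-element array.
column_a = CSV.generate do |csv|
  clues.zip(values).flatten.each { |cell| csv << [cell] }
end

puts wide, tall, column_a
```

If the goal really is a single column, the last variant is the one to adapt; otherwise the header,value layout is usually easier to read back later.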