Structuring Nokogiri output that has no HTML tags



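I have Ruby going to a website, iterating through a list of campaigns, and scraping specific data from each page. My problem now is getting that data out of the structure Nokogiri gives me and outputting it in a readable form.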

require 'watir'
require 'nokogiri'

campaign_list = [1042360, 1042386, 1042365, 992307]
browser = Watir::Browser.new :chrome
browser.goto '<redacted>'
browser.text_field(:id => 'email').set '<redacted>'
browser.text_field(:id => 'password').set '<redacted>'
browser.send_keys :enter
file = File.new('hourlysales.csv', 'w')
data = {}
campaign_list.each do |campaign|
  browser.goto "<redacted>"
  if browser.text.include? "Application Error"
    puts "Error loading page, I recommend restarting script"
    # Possibly automatic restart of script
  else
    # Strip the HTML markup and keep only the page's text
    hourly_data = Nokogiri::HTML.parse(browser.html).text
    # file.write data
    puts hourly_data
  end
end
file.close

This is the output I get:

{"views":[[17,145],[18,165],[19,99],[20,71],[21,31],[22,26],[23,10],[0,15],[1,1],[2,18],[3,19],[4,35],[5,47],[6,44],[7,67],[8,179],[9,141],[10,112],[11,95],[12,46],[13,82],[14,79],[15,70],[16,103]],"orders":[[17,10],[18,9],[19,5],[20,1],[21,1],[22,0],[23,0],[0,1],[1,0],[2,1],[3,0],[4,1],[5,2],[6,1],[7,5],[8,11],[9,6],[10,5],[11,3],[12,1],[13,2],[14,4],[15,6],[16,7]],"conversion_rates":[0.06870229007633588,0.05442176870748299,0.050505050505050504,0.014084507042253521,0.03225806451612903,0.0,0.0,0.06666666666666667,0.0,0.05555555555555555,0.0,0.02857142857142857,0.0425531914893617,0.022727272727272728,0.07462686567164178,0.06134969325153374,0.0425531914893617,0.044642857142857144,0.031578947368421054,0.021739130434782608,0.024390243902439025,0.05063291139240506,0.08571428571428572,0.06741573033707865]}

That represents { views: [[hour, # of views], [hour, # of views], etc.] }. Orders work the same way. I don't need the conversion rates.
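Because the page text is JSON, Ruby's standard json library can parse it directly, and each list of [hour, count] pairs converts to a hash with Array#to_h. A minimal sketch, assuming the page text is exactly the JSON shown above (shortened here to the first three hours; sample, views and orders are illustrative names):

```ruby
require "json"

# Shortened copy of the page text shown above (first three hours only)
sample = '{"views":[[17,145],[18,165],[19,99]],' \
         '"orders":[[17,10],[18,9],[19,5]],' \
         '"conversion_rates":[0.06870229007633588,0.05442176870748299,0.050505050505050504]}'

parsed = JSON.parse(sample)

# Each value is an array of [hour, count] pairs; Array#to_h turns it into
# a {hour => count} hash. "conversion_rates" is simply never read.
views  = parsed["views"].to_h
orders = parsed["orders"].to_h

views.sort.each do |hour, count|
  puts "#{hour}:00  #{count} views, #{orders.fetch(hour, 0)} orders"
end
```

JSON.parse keeps the hours as Integers, so the hash keys sort numerically.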

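I also need to add up the values for each key, so that after doing this for 5 pages I have one key for each hour of the day, along with the total number of views for that hour. I've tried several each loops but can't get anywhere.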
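One way to total the hours across pages is Hash#merge with a block: when an hour appears in both hashes, the block decides the merged value, here the sum of the two counts. This is a sketch that assumes each page has already been parsed into a {hour => count} hash; pages and total_views are hypothetical names and the numbers are made up for illustration:

```ruby
# Hypothetical per-page {hour => views} hashes, one per campaign page
# (illustrative numbers, not real data)
pages = [
  { 17 => 145, 18 => 165, 19 => 99 },
  { 17 => 20,  18 => 5,   19 => 1 },
  { 17 => 3,   20 => 7 }
]

# Fold each page into the running totals; when an hour exists in both
# hashes, merge's block adds the two counts together.
total_views = pages.reduce({}) do |totals, page|
  totals.merge(page) { |_hour, running, count| running + count }
end

total_views.sort.each { |hour, views| puts "#{hour}\t#{views}" }
```

Hours that appear on only one page (20 here) are carried over unchanged, so no default value is needed.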

Thanks for any help you can give.

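It looks like the output (which, from your code, I assume is the contents of hourly_data) is JSON. In that case it's easy to parse it and add up the numbers. Something like this: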

require "json" # at the top of your script
# ...
def sum_hours_values(data, hours_values=nil)
  # Start with an empty hash that automatically initializes missing keys to `0`
  hours_values ||= Hash.new {|hsh,hour| hsh[hour] = 0 }
  # Iterate through the [hour, value] arrays, adding `value` to the running
  # count for that `hour`, and return `hours_values`
  data.each_with_object(hours_values) do |(hour, value), hsh|
    hsh[hour] += value
  end
end
# ... Watir/Nokogiri stuff here...
# Initialize these so they persist outside the loop
hours_views = hours_orders = nil
campaign_list.each do |campaign|
  browser.goto "<redacted>"
  if browser.text.include? "Application Error"
    # ...
  else
    hourly_data = Nokogiri::HTML.parse(browser.html).text
    hourly_data_parsed = JSON.parse(hourly_data)
    # Pass each running total back in so it accumulates across campaigns
    hours_views = sum_hours_values(hourly_data_parsed["views"], hours_views)
    hours_orders = sum_hours_values(hourly_data_parsed["orders"], hours_orders)
  end
end
puts "Views by hour:"
puts hours_views.sort.map {|hour_views| "%2i\t%4i" % hour_views }
puts "Orders by hour:"
puts hours_orders.sort.map {|hour_orders| "%2i\t%4i" % hour_orders }
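The question's script already opens hourlysales.csv for writing. A minimal sketch of turning the two totals into one CSV row per hour with Ruby's csv library, assuming hours_views and hours_orders are the hashes built by the loop in the answer; the stand-in values and the column names below are made up for illustration:

```ruby
require "csv"

# Stand-ins for the totals produced by the loop above (illustrative values)
hours_views  = { 18 => 170, 17 => 168 }
hours_orders = { 17 => 12 }

# One row per hour seen in either hash. fetch ignores a hash's default proc,
# so looking up a missing hour returns 0 without adding a new key.
all_hours = (hours_views.keys | hours_orders.keys).sort
csv_text = CSV.generate do |csv|
  csv << %w[hour views orders]
  all_hours.each do |hour|
    csv << [hour, hours_views.fetch(hour, 0), hours_orders.fetch(hour, 0)]
  end
end

puts csv_text
# To save it instead: File.write("hourlysales.csv", csv_text)
```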

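Note: there's a really nice recursive version of sum_hours_values that I haven't included, because the iterative version is clearer to most Ruby programmers. If you're interested in recursion, I'll leave it to you as an exercise. ;)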
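For anyone who wants to try the exercise, here is one possible recursive formulation; it is not necessarily the version the answer has in mind. The name sum_hours_values_rec is hypothetical. It takes the same arguments as sum_hours_values (including nil for the first call), takes the first [hour, value] pair off the list, adds it to the running totals, and recurses on the remainder:

```ruby
# Hypothetical recursive variant of sum_hours_values (same arguments and return value)
def sum_hours_values_rec(data, hours_values = nil)
  # Hash.new(0) returns 0 for missing hours, like the default proc in the original
  hours_values ||= Hash.new(0)
  return hours_values if data.empty?

  # Split off the first [hour, value] pair; rest holds the remaining pairs
  (hour, value), *rest = data
  hours_values[hour] += value
  sum_hours_values_rec(rest, hours_values)
end

totals = sum_hours_values_rec([[17, 145], [18, 165]])
totals = sum_hours_values_rec([[17, 10], [19, 5]], totals)
totals.sort.each { |hour, count| puts "%2i\t%4i" % [hour, count] }
```

Ruby does not eliminate tail calls by default, so this recurses once per pair; that is harmless for 24 hours of data, but the iterative version is the safer choice for long lists.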
