How do I resolve an HTTP 500 error when web scraping with Mechanize in Ruby?



I want to retrieve my driving licence number, issue_date, and expiry_date from this website (https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp). When I try to fetch them, I get this error: Mechanize::ResponseCodeError: 500 => Net::HTTPInternalServerError for https://sarathi.nic.in:8443/nrportal/sarathi/DlDetRequest.jsp -- unhandled response.

Here is the code I wrote:

require 'mechanize'
require 'logger'
require 'nokogiri'
require 'open-uri'
require 'openssl'
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
agent = Mechanize.new
agent.log = Logger.new "mech.log"
agent.user_agent_alias = 'Mac Safari 4'
Mechanize.new.get("https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp")  
page=agent.get('https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp')  # opening home page.
page = agent.page.links.find { |l| l.text == 'Status of Licence' }.click         # click the link.
page.forms_with(:name=>"dlform").first.field_with(:name=>"dlform:DLNumber").value="TN38 20120001119" #user input to text field.
page.form_with(:name=>"dlform").field_with(:name=>"javax.faces.ViewState").value="SUBMIT"  #submit button value assigning.
page.form(:name=>"dlform",:action=>"/nrportal/sarathi/DlDetRequest.jsp") #to specify the form i need.
agent.cookie_jar.clear!
gg=agent.submit page.forms.last  #submitting my form

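It doesn't work because you clear the cookies before submitting the form, which discards all the input data you provided. I was able to get it working simply by removing that call: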

...
page.forms_with(:name=>"dlform").first.field_with(:name=>"dlform:DLNumber").value="TN38 20120001119" #user input to text field
form = page.form(:name=>"dlform",:action=>"/nrportal/sarathi/DlDetRequest.jsp")
gg = agent.submit form, form.buttons.first

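Note that you don't need to set the value of the submit button yourself; instead, pass the submit button to agent.submit when you submit the form.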
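To illustrate why clearing the cookie jar can turn a working submission into a 500, here is a minimal offline sketch. It does not use Mechanize or contact the real site: ToyServer and ToyAgent are hypothetical stand-ins, and the assumption that the server keys the form state on a JSESSIONID session cookie is mine, not something confirmed for sarathi.nic.in.

```ruby
require 'securerandom'

# Hypothetical stand-in for a server that keeps form state in a session.
class ToyServer
  def initialize
    @sessions = {}
  end

  # GET: hand out a session id as a cookie (assumed behaviour).
  def get
    id = SecureRandom.hex(8)
    @sessions[id] = true
    { status: 200, set_cookie: { 'JSESSIONID' => id } }
  end

  # POST: succeed only when the request carries a known session cookie.
  def post(cookies, params)
    return { status: 500 } unless @sessions.key?(cookies['JSESSIONID'])
    { status: 200, body: "details for #{params['dlform:DLNumber']}" }
  end
end

# Hypothetical minimal client whose cookie jar is a plain Hash.
class ToyAgent
  attr_reader :cookie_jar

  def initialize(server)
    @server = server
    @cookie_jar = {}
  end

  # Fetch the page and remember any cookies the server sets.
  def get
    response = @server.get
    @cookie_jar.merge!(response[:set_cookie])
    response
  end

  # Send the form fields together with the stored cookies.
  def submit(params)
    @server.post(@cookie_jar, params)
  end
end

server = ToyServer.new
params = { 'dlform:DLNumber' => 'TN38 20120001119' }

# Case 1: cookies are kept between the GET and the POST.
kept = ToyAgent.new(server)
kept.get
status_with_cookies = kept.submit(params)[:status]

# Case 2: the jar is emptied first, analogous to agent.cookie_jar.clear!
cleared = ToyAgent.new(server)
cleared.get
cleared.cookie_jar.clear
status_without_cookies = cleared.submit(params)[:status]

puts "with cookies:    #{status_with_cookies}"
puts "cookies cleared: #{status_without_cookies}"
```

In this toy model the only difference between the two runs is the emptied jar, which mirrors the single line removed from the code in the question.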
