NoMethodError in Mechanize



With mechanize 2.7.3 and ruby 2.3.0dev, running this code:

require 'mechanize'
agent = Mechanize.new
agent.keep_alive = false
agent.open_timeout = 2
agent.read_timeout = 2
agent.ignore_bad_chunking = true
agent.gzip_enabled = false
url = 'http:%5C%5Cwww.scouts.org.uk'
agent.head(url)

gives me this NoMethodError:

~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:648:in `resolve': undefined method `length' for nil:NilClass (NoMethodError)
from ~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:223:in `fetch'
from ~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize.rb:459:in `head'

Is this a bug in mechanize, or am I doing something wrong? If so, how can I fix it?

Edit: This URL is obviously bad, but I am reading many URLs from a file, and some of them may be malformed.

Edit 2: Suppose I have a file like this one: http://pastie.org/9934756. I need to get the head of every valid URL and ignore the rest.

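If the URL was simply mistyped, try this instead: url = 'http://scouts.org.uk'. Note that %5C is a percent-encoded backslash, so 'http:%5C%5Cwww.scouts.org.uk' contains no // and therefore has no host part.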

Your target site redirects and uses a meta refresh. Update your code to set these options:

require 'mechanize'
agent = Mechanize.new
agent.keep_alive = false
agent.follow_meta_refresh = true
agent.redirect_ok = true
agent.open_timeout = 10
agent.read_timeout = 10
agent.ignore_bad_chunking = true
agent.gzip_enabled = false
url = 'http:%5C%5Cwww.scouts.org.uk'
begin
  page_head = agent.head(url)
rescue StandardError => exception  # StandardError covers NoMethodError without also trapping Interrupt or SystemExit
  puts "Caught exception: #{exception.message}"
end

Result:

=> #Caught exception: undefined method `length' for nil:NilClass
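
For the many-URLs case from the question's second edit, the same rescue can be applied per URL so that one bad entry does not stop the whole run. The following is only a minimal sketch: `collect_heads` is a hypothetical helper name, not part of Mechanize, and the fetch step is passed in as a callable so the loop itself does not depend on the network.

```ruby
# Hypothetical helper (not a Mechanize API): calls `fetcher` once per URL
# and keeps going when a single URL raises.
def collect_heads(urls, fetcher)
  heads = {}     # url => whatever the fetcher returned
  failures = {}  # url => short description of the error
  urls.each do |raw|
    url = raw.strip          # drop the trailing newline from file lines
    next if url.empty?       # skip blank lines
    begin
      heads[url] = fetcher.call(url)
    rescue StandardError => e
      # NoMethodError and RuntimeError subclasses both end up here
      failures[url] = "#{e.class}: #{e.message}"
    end
  end
  [heads, failures]
end
```

With the agent configured above, this could be called as `heads, failures = collect_heads(File.readlines('urls.txt'), ->(u) { agent.head(u) })`, where `urls.txt` is an assumed file name. Returning the failures separately makes it easy to review which lines of the file were skipped.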

You can add a method like this to check whether a URL is valid:

require 'uri'
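def valid?(url)
  uri = URI.parse(url)
  # '+' for anything that parses as an HTTP(S) URI, '-' for other schemes
  if uri.kind_of?(URI::HTTP)
    puts '+'
  else
    puts '-'
  end
rescue URI::InvalidURIError
  # raised for strings URI.parse cannot parse at all
  puts 'false '
end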
['http://web.de',
'http://web.de/',
'http:%5c%5cweb.de',
'http:web.de',
'foo://web.de',
'http://we b.de',
'http://|web.de'].each { |i|
    valid?(i)
}

Result:

+
+
+
+
-
false
false
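
Note that the check above prints + for 'http:%5c%5cweb.de' and 'http:web.de': both parse as HTTP URIs but have no host, which is the same shape as the URL in the question. A slightly stricter check can also require a host. This is a sketch only, and `http_url_with_host?` is a hypothetical name chosen for illustration:

```ruby
require 'uri'

# Hypothetical helper: true only for http/https URLs that parse cleanly
# AND carry a non-empty host part.
def http_url_with_host?(url)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so https URLs pass too
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  false
end

http_url_with_host?('http://web.de')      # => true
http_url_with_host?('http:%5c%5cweb.de')  # => false (parses, but host is nil)
```

Filtering the list with this before calling agent.head should keep hostless URLs away from Mechanize, while the rescue around the request still handles anything that fails at fetch time.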
