Using mechanize 2.7.3 and Ruby 2.3.0dev, running this code:
require 'mechanize'
agent = Mechanize.new
agent.keep_alive = false
agent.open_timeout = 2
agent.read_timeout = 2
agent.ignore_bad_chunking = true
agent.gzip_enabled = false
url = 'http:%5C%5Cwww.scouts.org.uk'
agent.head(url)
gives me this NoMethodError:
~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:648:in `resolve': undefined method `length' for nil:NilClass (NoMethodError)
from ~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:223:in `fetch'
from ~/.rvm/gems/ruby-head/gems/mechanize-2.7.3/lib/mechanize.rb:459:in `head'
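Is this a bug in mechanize, or am I doing something wrong? If so, how do I fix it?
Edit: The URL is obviously bad, but I am reading a lot of URLs from a file, and some of them may be malformed.
Edit 2: Suppose I have a file like this: http://pastie.org/9934756. I need to get the head of every correct URL and ignore the others.
If the URL was simply mistyped, try this instead: url = 'http://scouts.org.uk'
Your target site redirects and uses a meta refresh. Update your code to include these settings: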
require 'mechanize'
agent = Mechanize.new
agent.keep_alive = false
agent.follow_meta_refresh = true
agent.redirect_ok = true
agent.open_timeout = 10
agent.read_timeout = 10
agent.ignore_bad_chunking = true
agent.gzip_enabled = false
url = 'http:%5C%5Cwww.scouts.org.uk'
begin
  page_head = agent.head(url)
rescue StandardError => exception
  # NoMethodError is a StandardError, so the malformed URL is caught here
  puts "Caught exception: #{exception.message}"
end
Result:
#=> Caught exception: undefined method `length' for nil:NilClass
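For the file of URLs mentioned in the question's second edit, the same rescue can be applied per URL so that one bad entry does not stop the run. The following is only a sketch: `collect_heads` is a hypothetical helper, not part of mechanize, and it takes the fetch step as a block so that it does not depend on any particular agent configuration.

```ruby
# Hypothetical helper (an assumption for illustration, not a mechanize API):
# yields each URL to the given block, keeps the block's return value for
# URLs that succeed, and records the error message for URLs that raise.
def collect_heads(urls)
  results = {}
  failures = {}
  urls.each do |url|
    begin
      results[url] = yield(url)
    rescue StandardError => e
      # Skip this URL but remember why it failed
      failures[url] = e.message
    end
  end
  [results, failures]
end

# Assumed usage with the agent configured above (needs network access):
#   heads, skipped = collect_heads(urls) { |u| agent.head(u) }
```

Returning the failures alongside the results makes it possible to log or review the skipped URLs afterwards instead of silently dropping them.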
You can add this method to check whether a URL is valid:
require 'uri'
def valid?(url)
  uri = URI.parse(url)
  if uri.kind_of?(URI::HTTP)
    puts '+'
  else
    puts '-'
  end
rescue URI::InvalidURIError
  puts 'false '
end
['http://web.de',
 'http://web.de/',
 'http:%5c%5cweb.de',
 'http:web.de',
 'foo://web.de',
 'http://we b.de',
 'http://|web.de'].each { |i|
  valid?(i)
}
+
+
+
+
-
false
false
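Note that the check above still prints + for 'http:%5c%5cweb.de' and 'http:web.de', because URI.parse accepts them as HTTP URIs without a host. A stricter variant could also require a host, as sketched below. `fetchable_url?` is a hypothetical name, and whether a missing host is the exact trigger of the NoMethodError inside mechanize is an assumption, not something confirmed against its source.

```ruby
require 'uri'

# Hypothetical helper (not part of mechanize or the uri library):
# true only for http/https URLs that parse cleanly and carry a host.
def fetchable_url?(url)
  uri = URI.parse(url)
  # URI::HTTPS is a subclass of URI::HTTP, so https URLs pass as well
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  false
end

urls = ['http://web.de',
        'http:%5c%5cweb.de',
        'http:web.de',
        'foo://web.de',
        'http://we b.de']

# Keep only the URLs that are worth passing on to agent.head
good = urls.select { |u| fetchable_url?(u) }
```

Filtering first and still wrapping each request in a rescue covers both cases: URLs that are malformed and URLs that are well formed but fail at request time.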