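I'm trying to download all of my class notes from Coursera. Since I'm learning Ruby, I figured that downloading all of their PDFs for future use would be good practice. Unfortunately, I'm getting an exception saying that Ruby cannot connect for some reason. Here is my code: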
require 'net/http'
module Coursera
  class Downloader
    attr_accessor :page_url
    attr_accessor :destination_directory
    attr_accessor :cookie
    def initialize(page_url,dest,cookie)
      @page_url=page_url
      @destination_directory = dest
      @cookie=cookie
    end
    def download
      puts @page_url
      request = Net::HTTP::Get.new(@page_url)
      puts @cookie.encoding
      request['Cookie']=@cookie
      # the line below is where the exception is thrown
      res = Net::HTTP.start(@page_url.hostname, use_ssl=true,@page_url.port) {|http|
        http.request(request)
      }
      html_page = res.body
      pattern = /http[^"]+.pdf/
      i=0
      while (match = pattern.match(html_page,i)) != nil do
        # 0 is the entire string.
        url_string = match[0]
        # make sure that 'i' is updated
        i = match.begin(0)+1
        # we want just the name of the file.
        j = url_string.rindex("/")
        filename = url_string[j+1..url_string.length]
        destination = @destination_directory+"\\"+filename
        # I want to download that resource to that file.
        uri = URI(url_string)
        res = Net::HTTP.get_response(uri)
        # write that body to the file
        f=File.new(destination,mode="w")
        f.print(res.body)
      end
    end
  end
end
page_url_string = 'https://class.coursera.org/datasci-002/lecture'
puts page_url_string.encoding
dest='C:\Users\michael\training material\data_science'
page_url=URI(page_url_string)
# I copied this from my browser's developer tools; I'm omitting it since
# it's long and has my session key in it
cookie="..."
downloader = Coursera::Downloader.new(page_url,dest,cookie)
downloader.download
When I run it, the following is written to the console:
Fast Debugger (ruby-debug-ide 0.4.22, debase 0.0.9) listens on 127.0.0.1:65485
UTF-8
https://class.coursera.org/datasci-002/lecture
UTF-8
Uncaught exception: A socket operation was attempted to an unreachable network. - connect(2)
C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:878:in `initialize'
C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:878:in `open'
C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:878:in `block in connect'
C:/Ruby200-x64/lib/ruby/2.0.0/timeout.rb:52:in `timeout'
C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:877:in `connect'
C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:862:in `do_start'
C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:851:in `start'
C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:582:in `start'
C:/Users/michael/Documents/Aptana Studio 3 Workspace/practice/CourseraDownloader.rb:20:in `download'
C:/Users/michael/Documents/Aptana Studio 3 Workspace/practice/CourseraDownloader.rb:52:in `<top (required)>'
C:/Ruby200-x64/bin/rdebug-ide:23:in `load'
C:/Ruby200-x64/bin/rdebug-ide:23:in `<main>'
C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:878:in `initialize': A socket operation was attempted to an unreachable network. - connect(2) (Errno::ENETUNREACH)
from C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:878:in `open'
from C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:878:in `block in connect'
from C:/Ruby200-x64/lib/ruby/2.0.0/timeout.rb:52:in `timeout'
from C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:877:in `connect'
from C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:862:in `do_start'
from C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:851:in `start'
from C:/Ruby200-x64/lib/ruby/2.0.0/net/http.rb:582:in `start'
from C:/Users/michael/Documents/Aptana Studio 3 Workspace/practice/CourseraDownloader.rb:20:in `download'
from C:/Users/michael/Documents/Aptana Studio 3 Workspace/practice/CourseraDownloader.rb:52:in `<top (required)>'
from C:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/ruby-debug-ide-0.4.22/lib/ruby-debug-ide.rb:86:in `debug_load'
from C:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/ruby-debug-ide-0.4.22/lib/ruby-debug-ide.rb:86:in `debug_program'
from C:/Ruby200-x64/lib/ruby/gems/2.0.0/gems/ruby-debug-ide-0.4.22/bin/rdebug-ide:110:in `<top (required)>'
from C:/Ruby200-x64/bin/rdebug-ide:23:in `load'
from C:/Ruby200-x64/bin/rdebug-ide:23:in `<main>'
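I wrote all of the HTTP code by following the instructions here. As far as I can tell, I followed them closely.
I'm using Windows 7, ruby 2.0.0p481, and Aptana Studio 3. When I copy the URL into my browser, it goes straight to the page without any problem. When I look at the request headers for that URL in my browser, I don't see anything else that I might be missing. I also tried setting the Host and Referer request headers, and it made no difference.
I'm out of ideas. I have already searched Stack Overflow for similar questions, but that didn't help. Please tell me what I'm missing.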
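One detail in the `Net::HTTP.start` call above may be worth a closer look. In Ruby, writing `use_ssl=true` inside an argument list is an ordinary local-variable assignment whose value (`true`) is passed positionally; it is not a named option. According to the Net::HTTP documentation, the positional parameters after the address are the port and then the proxy address, so `true` would land in the port slot and the real port in the proxy-address slot. The sketch below illustrates this under those assumptions. The helper names `positional_args`, `build_https_client`, and `fetch_page` are made up for illustration and are not part of any library; only `fetch_page` would open a connection, and it is not called here.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: returns its positional arguments unchanged so we can
# see what a method actually receives.
def positional_args(*args)
  args
end

# `use_ssl = true` is a local-variable assignment; the method simply
# receives the value `true` in the second position.
received = positional_args('class.coursera.org', use_ssl = true, 443)
p received # prints ["class.coursera.org", true, 443]

# Hypothetical helper: builds (but does not start) a client for the URI,
# setting the port and the TLS flag explicitly.
def build_https_client(uri)
  http = Net::HTTP.new(uri.hostname, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http
end

# Hypothetical helper: performs the GET, passing use_ssl in an options hash
# after the port. Needs network access, so it is only defined here.
def fetch_page(uri, cookie)
  request = Net::HTTP::Get.new(uri)
  request['Cookie'] = cookie
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

client = build_https_client(URI('https://class.coursera.org/datasci-002/lecture'))
puts "#{client.address}:#{client.port} ssl=#{client.use_ssl?}"
```

If this is what is happening, the connection would be attempted against an unintended address, which would be consistent with a "network unreachable" error even though the browser reaches the site normally.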
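I got the same error message in another project, and the problem there was that my machine really could not connect to the IP/port. Have you tried connecting with curl? If it works in your browser, the browser may be using a proxy or something similar to actually get there. Testing the URL with curl is what solved it for me.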
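The same reachability check can also be sketched in Ruby instead of curl. This is a minimal sketch, assuming Ruby 2.0 or later; `reachable?` is a hypothetical helper name, and the host and port in the usage comment are taken from the URL in the question. It only tries to open a TCP connection and reports whether that succeeded, so it says nothing about TLS or about what a proxy would do.

```ruby
require 'socket'

# Hypothetical helper: returns true if a TCP connection to host:port can be
# opened within `timeout` seconds, false otherwise.
def reachable?(host, port, timeout = 5)
  # With a block, Socket.tcp closes the socket and returns the block's value.
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Covers refused, unreachable, and timed-out connections as well as
  # name-resolution failures.
  false
end

# Example usage (needs network access):
#   puts reachable?('class.coursera.org', 443)
```

If this returns false while the browser loads the page, a proxy is one plausible explanation. Net::HTTP accepts a proxy address and proxy port as the positional arguments that follow the port.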