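Before actually committing to any downloads, I'm trying to go through a list of image URLs and get some basic information about each one:
- Does the image exist? (solved by using response.code?)
- Do I already have the image? (maybe by looking at the type and size?)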
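A HEAD request returns only the status line and headers. That answers the first question directly through the status code. It answers the second only partly: the `Content-Type` and `Content-Length` headers give the type and byte size, if the server sends them. Pixel dimensions are not in the headers and require reading at least part of the image body. A minimal sketch, where `describe_head_response` is a hypothetical helper and the responses are built by hand so it runs offline:

```ruby
require 'net/http'

# Hypothetical helper: summarize what a HEAD response says about an image.
# Returns nil when the image does not exist (any non-2xx status).
def describe_head_response(response)
  return nil unless response.is_a?(Net::HTTPSuccess)

  {
    type: response.content_type,   # e.g. "image/jpeg"; nil if the header is absent
    size: response.content_length  # byte count as an Integer; nil if absent
  }
end

# Offline usage: construct responses directly instead of hitting the network.
ok = Net::HTTPOK.new('1.1', '200', 'OK')
ok['Content-Type'] = 'image/jpeg'
ok['Content-Length'] = '48213'
missing = Net::HTTPNotFound.new('1.1', '404', 'Not Found')

info = describe_head_response(ok)
puts info[:type]                               # prints image/jpeg
puts info[:size]                               # prints 48213
puts describe_head_response(missing).inspect   # prints nil
```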
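My script checks a large list every day (about 1300 lines), and each line has 30-40 image URLs. My @photo_urls variable lets me keep track of what has already been downloaded. I'd really like to be able to use it as a hash later (instead of the array in the example code), so that I can iterate over it afterwards and do the actual downloads.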
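On switching @photo_urls from an array to a hash: a Hash keyed by the SHA-1 digest replaces the linear scan of `Array#include?` with a hash lookup, and each digest can carry a value, such as the original URL and a status for the later download pass. A minimal sketch; the `PhotoTracker` class, its method names, and the `:pending`/`:downloaded` statuses are illustrative choices, not part of the original script:

```ruby
require 'digest/sha1'

# Hypothetical tracker: digest => { url:, status: } for each URL seen.
class PhotoTracker
  def initialize
    @photo_urls = {}
  end

  # Returns the digest if the URL is new, or nil if it was already seen.
  def remember(url)
    hashed = Digest::SHA1.hexdigest(url)
    return nil if @photo_urls.key?(hashed)

    @photo_urls[hashed] = { url: url, status: :pending }
    hashed
  end

  def mark(hashed, status)
    @photo_urls[hashed][:status] = status
  end

  # URLs still waiting for the download pass, in insertion order.
  def pending_urls
    @photo_urls.values.select { |v| v[:status] == :pending }.map { |v| v[:url] }
  end
end

tracker = PhotoTracker.new
first = tracker.remember('http://example.com/a.jpg')
again = tracker.remember('http://example.com/a.jpg')   # duplicate, returns nil
tracker.remember('http://example.com/b.jpg')
tracker.mark(first, :downloaded)
puts tracker.pending_urls.inspect   # prints ["http://example.com/b.jpg"]
```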
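Now my problem (apart from being new to Ruby) is that Net::HTTP::Pipeline only accepts an array of Net::HTTPRequest objects. The net-http-pipeline documentation says the response objects come back in the same order as the corresponding request objects went in. The trouble is that, apart from that order, I have no way to associate a request with its response, and I don't know how to get the relative ordinal position inside the block. I assume I could just keep a counter variable, but how do I access a hash by ordinal position?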
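Since Ruby 1.9, a Hash enumerates in insertion order, so `requests.keys` and `requests.values` line up index by index. Taking the keys as an array before the pipeline runs and advancing a counter inside the block is enough to recover the hash key for each response. In this sketch, `fake_pipeline` is a hypothetical offline stand-in for `http.pipeline`; it models only the documented property that responses arrive in request order, and the string "requests" are placeholders for `Net::HTTP::Head` objects:

```ruby
# Hypothetical stand-in for http.pipeline: yields one response per request,
# in the same order the requests were given.
def fake_pipeline(requests)
  requests.each { |req| yield "response to #{req}" }
end

requests = {
  'digest-a' => 'HEAD /a.jpg',
  'digest-b' => 'HEAD /b.jpg',
  'digest-c' => 'HEAD /c.jpg'
}

keys = requests.keys   # same order as requests.values
index = 0
results = {}

fake_pipeline(requests.values) do |response|
  key = keys[index]    # hash key of the request that produced this response
  results[key] = response
  index += 1
end

puts results['digest-b']   # prints: response to HEAD /b.jpg
```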
    require 'net/http/pipeline'   # from the net-http-pipeline gem
    require 'digest/sha1'
    require 'uri'

    # `uri` and `photo_urls` come from the surrounding script. One pipelined
    # connection only reaches uri.host, so every photo URL must be on that host.
    Net::HTTP.start uri.host do |http|
      # Init HTTP requests hash
      requests = {}
      photo_urls.each do |photo_url|
        # make sure we don't process the same image again.
        hashed = Digest::SHA1.hexdigest(photo_url)
        next if @photo_urls.include? hashed
        @photo_urls << hashed
        # build a HEAD request (request_uri keeps any query string and
        # turns an empty path into "/"), set the user agent, store it in the hash
        my_uri = URI.parse(photo_url)
        request = Net::HTTP::Head.new(my_uri.request_uri)
        request['User-Agent'] = "My Downloader"
        requests[hashed] = request
      end
      # process requests (send array of values - ie. requests) in a pipeline.
      http.pipeline requests.values do |response|
        if response.code == "200"
          # any way to reference the hash here so I can decide whether
          # I want to do anything later?
        end
      end
    end
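Finally, if there's a simpler way to do this, please feel free to suggest it.
Thanks!
Make requests an array instead of a hash, and shift each request off the front of the array as its response comes in: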
    Net::HTTP.start uri.host do |http|
      # Init HTTP requests array
      requests = []
      photo_urls.each do |photo_url|
        # make sure we don't process the same image again.
        hashed = Digest::SHA1.hexdigest(photo_url)
        next if @photo_urls.include? hashed
        @photo_urls << hashed
        # build a HEAD request, set the user agent and store it in the array
        my_uri = URI.parse(photo_url)
        request = Net::HTTP::Head.new(my_uri.request_uri)
        request['User-Agent'] = "My Downloader"
        requests << request
      end
      # Pipeline a copy, so shifting `requests` below does not alter the array handed to the pipeline.
      http.pipeline requests.dup do |response|
        # responses arrive in request order, so the oldest outstanding request is the one answered
        request = requests.shift
        if response.code == "200"
          # Do whatever checking with request
        end
      end
    end
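Shifting requests recovers the request object but not the SHA-1 key from the question, and `request.path` holds only the path, not the host. One variation on this answer is to queue `[key, request]` pairs, pass only the requests to the pipeline, and shift one pair per response. In this sketch, `fake_pipeline` is a hypothetical offline stand-in for `http.pipeline` that yields responses in request order, and the strings stand in for real request and response objects:

```ruby
# Hypothetical stand-in for http.pipeline: yields responses in request order.
def fake_pipeline(requests)
  requests.each { |req| yield "200 for #{req}" }
end

# Queue of [digest, request] pairs, built in the same loop as the requests.
pending = [
  ['digest-a', 'HEAD /a.jpg'],
  ['digest-b', 'HEAD /b.jpg']
]

statuses = {}
# map builds a new array, so shifting `pending` does not touch what the pipeline iterates
fake_pipeline(pending.map(&:last)) do |response|
  hashed, _request = pending.shift   # oldest outstanding pair answers first
  statuses[hashed] = response
end

puts statuses['digest-a']   # prints: 200 for HEAD /a.jpg
```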