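I want to download images from a URL, for example http://trinity.e-stile.ru/, and save them to a directory such as "C:\pickaxe\pictures". It is important that the solution uses Nokogiri.
I have read similar questions on this site, but I could not work out how the answers work, and I do not understand the algorithm.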
I wrote code that fetches the page, parses it, and collects the elements with the "img" tag into the links object:
require 'nokogiri'
require 'open-uri'

PAGE_URL="http://trinity.e-stile.ru/"
page=Nokogiri::HTML(open(PAGE_URL))   #parsing into object
links=page.css("img")                 #object with html code with img tag
puts links.length                     # it is 24 images on this url
puts
links.each{|i| puts i }               #it looks like: <img border="0" alt="" src="/images/kroliku.jpg">
puts
puts
links.each{|link| puts link['src'] }  #/images/kroliku.jpg
Once I have the HTML, which method should I use to save the pictures?
How do I write the images into a directory on disk?
I changed the code, but it raises an error:
/home/action/.parts/packages/ruby2.1/2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `initialize': getaddrinfo: Name or service not known (SocketError)
Here is the current code:
require 'nokogiri'
require 'open-uri'
require 'net/http'

LOCATION = 'pics'
if !File.exist? LOCATION # create the folder if it does not exist
  require 'fileutils'
  FileUtils.mkpath LOCATION
end

#PAGE_URL = "http://ruby.bastardsbook.com/files/hello-webpage.html"
#PAGE_URL="http://trinity.e-stile.ru/"
PAGE_URL="http://www.youtube.com/"
page=Nokogiri::HTML(open(PAGE_URL))
links=page.css("img")
links.each{|link|
  Net::HTTP.start(PAGE_URL) do |http|
    localname = link.gsub /.*\//, '' # keep the filename only
    resp = http.get link['src']
    open("#{LOCATION}/#{localname}", "wb") do |file|
      file.write resp.body
    end
  end
}
You are almost done. All that is left is to store the files. Let's do it.
require 'fileutils'
require 'net/http'
require 'uri'

LOCATION = 'C:\pickaxe\pictures'
FileUtils.mkpath LOCATION unless File.exist? LOCATION # create the folder if it does not exist

.... # your code with nokogiri etc.

host = URI(PAGE_URL).host # Net::HTTP.start expects a host name, not a full URL
links.each do |link|
  Net::HTTP.start(host) do |http|
    localname = File.basename(link['src']) # keep the filename only
    resp = http.get link['src'] # assumes src is a site-relative path such as /images/kroliku.jpg
    File.open("#{LOCATION}/#{localname}", "wb") do |file|
      file.write resp.body
    end
  end
end
That's it.
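The `getaddrinfo: Name or service not known` error quoted in the question is most likely what happens when a full URL such as "http://www.youtube.com/" is passed where `Net::HTTP.start` expects a bare host name. A minimal sketch of splitting a URL into the pieces `Net::HTTP.start` takes is shown below; `host_and_port` is a hypothetical helper name, not something from the original code.

```ruby
require 'uri'

# Hypothetical helper: split a page URL into the host name and port
# that Net::HTTP.start expects as its first two arguments.
def host_and_port(url)
  uri = URI.parse(url)
  [uri.host, uri.port] # host has no scheme and no trailing slash
end

host, port = host_and_port("http://trinity.e-stile.ru/")
puts host
puts port
```

With these values, `Net::HTTP.start(host, port)` would resolve the host name instead of trying to look up the whole URL as if it were a host.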
The corrected version:
require 'nokogiri'
require 'open-uri'
require 'fileutils'

LOCATION = 'pics'
FileUtils.mkpath LOCATION unless File.exist? LOCATION # create the folder if it does not exist

#PAGE_URL="http://trinity.e-stile.ru/"
PAGE_URL="http://www.youtube.com/"

# On Ruby 3.0 and later, call URI.open instead of open for URLs.
page=Nokogiri::HTML(open(PAGE_URL))
links=page.css("img")
links.each{|link|
  uri = URI.join(PAGE_URL, link['src']).to_s # make the URI absolute
  localname=File.basename(link['src'])       # keep the filename only
  File.open("#{LOCATION}/#{localname}",'wb') { |f| f.write(open(uri).read) }
}
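The two steps that do the real work here are resolving each `src` against the page URL and picking a local file name. The sketch below isolates just those steps so they can be tried without any network access. `image_target` is a hypothetical helper, and taking the file name from the URL path (so that a query string such as "?v=2" is dropped) is an assumption that goes slightly beyond the code above.

```ruby
require 'uri'

# Hypothetical helper: resolve an <img> src against the page URL and
# choose a local path for it. Returns [absolute_url, local_path].
def image_target(page_url, src, dir)
  absolute = URI.join(page_url, src)            # works for relative and absolute src values
  name = File.basename(absolute.path)           # use the path only, so a query string is ignored
  name = 'index' if name.empty? || name == '/'  # fallback when the URL has no file name
  [absolute.to_s, File.join(dir, name)]
end

url, path = image_target("http://trinity.e-stile.ru/", "/images/kroliku.jpg", "pics")
puts url   # http://trinity.e-stile.ru/images/kroliku.jpg
puts path  # pics/kroliku.jpg
```

Inside the download loop, it would also be worth skipping any `img` element whose `link['src']` is `nil`, since `URI.join` raises on a `nil` argument.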