I have this code in my index action and I'd really like to move it into a model, but I'm a bit confused about how to do that.
Original code
def index
  urls = %w[http://cltampa.com/blogs/potlikker http://cltampa.com/blogs/artbreaker http://cltampa.com/blogs/politicalanimals http://cltampa.com/blogs/earbuds http://cltampa.com/blogs/dailyloaf http://cltampa.com/blogs/bedpost]
  @final_images = []
  @final_urls = []
  urls.each do |url|
    blog = Nokogiri::HTML(open(url))
    images = blog.xpath('//*[@class="postBody"]/div[1]//img/@src')
    images.each do |image|
      @final_images << image
    end
    story_path = blog.xpath('//*[@class="postTitle"]/a/@href')
    story_path.each do |path|
      @final_urls << path
    end
  end
end
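I tested this code in my model and it works well for a single URL; I'm just not sure how to handle all of the URLs the way the original code does.
New code
Model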
class Photocloud < ActiveRecord::Base
  attr_reader :url, :data
  def initialize(url)
    @url = url
  end
  def data
    @data ||= Nokogiri::HTML(open(url))
  end
  def get_elements(path)
    data.xpath(path)
  end
end
Controller
def index
  @scraper = Photocloud.new('http://cltampa.com/blogs/artbreaker')
  @photos = @scraper.get_elements('//*[@class="postBody"]/div[1]//img/@src')
  @story_urls = @scraper.get_elements('//*[@class="postBody"]/div[1]//img/@src')
end
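My main problem is how to initialize the model with multiple URLs and loop over them the way the original code does. I've tried different things but feel like I've hit a wall. I'll eventually need to save the results to the database, but I want to get this working first. Any help is greatly appreciated.
Updated controller (WIP)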
def index
  start_urls = %w[http://cltampa.com/blogs/potlikker
                  http://cltampa.com/blogs/artbreaker
                  http://cltampa.com/blogs/politicalanimals
                  http://cltampa.com/blogs/earbuds
                  http://cltampa.com/blogs/dailyloaf
                  http://cltampa.com/blogs/bedpost]
  @scraper = Photocloud.new(start_urls)
  @images =
  @paths =
end
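I need some help with this part...
It looks like you aren't persisting the scraped images and paths to the database, so Photocloud doesn't need to inherit from ActiveRecord::Base; it can be just a plain old Ruby object (PORO):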
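require 'nokogiri'
require 'open-uri'

class Photocloud
  attr_reader :start_urls
  attr_accessor :images, :paths

  def initialize(start_urls)
    @start_urls = start_urls
    @images = []
    @paths = []
  end

  # Fetch every start URL and collect its image sources and story links.
  def scrape
    start_urls.each do |start_url|
      # URI.open is provided by open-uri (Ruby 2.5+); older Rubies use open(start_url)
      blog = Nokogiri::HTML(URI.open(start_url))
      scrape_images(blog)
      scrape_paths(blog)
    end
  end

  private

  def scrape_images(blog)
    # A distinct local name avoids shadowing the images accessor
    found_images = blog.xpath('//*[@class="postBody"]/div[1]//img/@src')
    found_images.each do |image|
      images << image
    end
  end

  def scrape_paths(blog)
    story_paths = blog.xpath('//*[@class="postTitle"]/a/@href')
    story_paths.each do |path|
      paths << path
    end
  end
end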
In the controller:
scraper = Photocloud.new(start_urls)
scraper.scrape
@images = scraper.images
@paths = scraper.paths
Of course, this is just one of the possible ways to structure the code.
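As an illustration of another possible structure, here is a minimal sketch in which the page loader is passed in, so the looping logic can be tried without touching the network. The `fetcher:` keyword, the `DEFAULT_FETCHER` and XPath constants, and the `collect` helper are hypothetical names introduced for this example only. The default loader assumes Nokogiri and open-uri are available, and the results are stored as strings rather than Nokogiri nodes.

```ruby
# Sketch of a scraper PORO whose page loader is injectable.
class Photocloud
  IMAGE_XPATH = '//*[@class="postBody"]/div[1]//img/@src'
  PATH_XPATH  = '//*[@class="postTitle"]/a/@href'

  # Default loader (assumption): fetch and parse a page with Nokogiri.
  # The requires are deferred so the class itself loads without either library.
  DEFAULT_FETCHER = lambda do |url|
    require 'nokogiri'
    require 'open-uri'
    Nokogiri::HTML(URI.open(url))
  end

  attr_reader :start_urls, :images, :paths

  def initialize(start_urls, fetcher: DEFAULT_FETCHER)
    @start_urls = start_urls
    @fetcher = fetcher
    @images = []
    @paths = []
  end

  # Visit every start URL and accumulate matches; returns self for chaining.
  def scrape
    start_urls.each do |start_url|
      blog = @fetcher.call(start_url)
      collect(blog, IMAGE_XPATH, @images)
      collect(blog, PATH_XPATH, @paths)
    end
    self
  end

  private

  # Append the string form of every node matching xpath to target.
  def collect(blog, xpath, target)
    blog.xpath(xpath).each { |node| target << node.to_s }
  end
end
```

With this shape the controller could still call `Photocloud.new(start_urls).scrape` and read `images` and `paths`, while a test could pass any object that responds to `call` and returns something with an `xpath` method.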