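I'm looking for the best way to solve the following structural/logic problem in Ruby:
A website needs to be crawled completely, collecting the titles from every page.
But:
- The tree structure of the site is unknown (how many "levels", "branches", etc.)
- The code should be "DRY" (= "Don't Repeat Yourself")
The following (simplified) example is, of course, very stupid: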
url = some_root_url
@title_collection = Array.new
go_to_page(url)
@title_collection << find_all_titles_on_page
urls = find_all_urls_on_page
urls.each do |url|
  go_to_page(url)
  @title_collection << find_all_titles_on_page
  urls = find_all_urls_on_page
  urls.each do |url|
    go_to_page(url)
    @title_collection << find_all_titles_on_page
    urls = find_all_urls_on_page
    urls.each do |url|
      go_to_page(url)
      @title_collection << find_all_titles_on_page
      urls = find_all_urls_on_page
      urls.each do |url|
        go_to_page(url)
        @title_collection << find_all_titles_on_page
        urls = find_all_urls_on_page
        urls.each do |url|
          go_to_page(url)
          @title_collection << find_all_titles_on_page
          urls = find_all_urls_on_page
          [...]
        end
      end
    end
  end
end
So how would you accomplish this flexibly and efficiently in a "DRY" way?
Many thanks!
Tom

Recursion is your friend:
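def walk_tree(url)
  # Use the same instance variable as in the question; create it if missing.
  @title_collection ||= []
  go_to_page(url)
  @title_collection << find_all_titles_on_page
  urls = find_all_urls_on_page
  urls.each do |child_url|
    # Recurse into each child page. The titles accumulate in
    # @title_collection, so the return value is not appended again.
    walk_tree(child_url)
  end
  @title_collection
end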
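
One thing the recursive version above does not handle is pages that link back to each other, which would make it recurse forever on most real sites. Here is a minimal, self-contained sketch of the same idea with a visited set. The `SITE` hash is a made-up in-memory stand-in for `go_to_page`, `find_all_titles_on_page` and `find_all_urls_on_page` (none of which are defined in the question), so treat it as an illustration rather than a finished crawler:

```ruby
require 'set'

# Hypothetical in-memory "site": each URL maps to its titles and outgoing links.
# In a real crawler these would come from fetching and parsing the page.
SITE = {
  '/'    => { titles: ['Home'], links: ['/a', '/b'] },
  '/a'   => { titles: ['A'],    links: ['/a/1', '/'] }, # links back to the root
  '/b'   => { titles: ['B'],    links: ['/a'] },
  '/a/1' => { titles: ['A1'],   links: [] }
}.freeze

# Depth-first walk that returns a flat array of titles.
# `visited` is shared across the recursion so each URL is processed only once.
def walk_tree(url, visited = Set.new)
  # Set#add? returns nil when the element is already present.
  return [] unless visited.add?(url)

  page = SITE.fetch(url, { titles: [], links: [] })
  titles = page[:titles].dup
  page[:links].each do |child_url|
    titles.concat(walk_tree(child_url, visited))
  end
  titles
end

titles = walk_tree('/')
puts titles.inspect
```

Returning the titles from each call and merging them with `concat` keeps the result flat and avoids shared mutable state. On a very deep site, plain recursion can still exhaust the stack, in which case the same logic can be written as a loop over a queue of pending URLs.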