Script saves a series of pages and then tries to combine them, but only one ends up in the combined file



This is my code:

require "open-uri"
base_url = "http://en.wikipedia.org/wiki"
(1..5).each do |x|
  # sets up the url
  full_url = base_url + "/" + x.to_s
  # reads the url
  read_page = open(full_url).read
  # saves the contents to a file and closes it
  local_file = "my_copy_of-" + x.to_s + ".html"
  file = open(local_file,"w")
  file.write(read_page)
  file.close
  # open a file to store all entrys in
  combined_numbers = open("numbers.html", "w")
  entrys = open(local_file, "r")
  combined_numbers.write(entrys.read)
  entrys.close
  combined_numbers.close
end

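As you can see, it basically grabs the contents of Wikipedia articles 1 through 5 and then tries to combine them into a single file called numbers.html.

It gets the first part right. The second part, however, seems to write only the contents of the fifth article, even though the write happens inside the loop.

I can't see where I'm going wrong, though. Any help?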

You chose the wrong mode when opening the summary file. "w" overwrites an existing file, while "a" appends to it.

So use this to make your code work:

combined_numbers = open("numbers.html", "a")

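Otherwise, on every pass through the loop, the contents of numbers.html are overwritten with the current article, so only the last one survives.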
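
To see the difference between the two modes without touching the network, here is a minimal sketch. The helper name write_chunks and the file names are made up for illustration; it simply reopens the same file once per chunk, the way the loop in the question does.

```ruby
require "tmpdir"

# Hypothetical helper (not from the question): writes each chunk in its own
# open/close cycle, mimicking what happens on every pass of the loop,
# then returns the final contents of the file.
def write_chunks(path, chunks, mode)
  chunks.each do |chunk|
    File.open(path, mode) { |f| f.write(chunk) }
  end
  File.read(path)
end

Dir.mktmpdir do |dir|
  chunks = ["one\n", "two\n", "three\n"]

  # "w" truncates the file on every open, so only the last chunk survives.
  overwritten = write_chunks(File.join(dir, "w.txt"), chunks, "w")
  # "a" keeps the existing contents and adds each chunk at the end.
  appended = write_chunks(File.join(dir, "a.txt"), chunks, "a")

  puts overwritten.inspect # => "three\n"
  puts appended.inspect    # => "one\ntwo\nthree\n"
end
```

The "w" result is exactly the symptom described in the question: only the last item written is left in the file.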


Also, I think you should write the contents of read_page to numbers.html directly, rather than reading them back from the file you have just written.

require "open-uri"
(1..5).each do |x|
  # set up and read url
  # (on Ruby 3.0+ open-uri no longer patches Kernel#open; use URI.open(url) there)
  url = "http://en.wikipedia.org/wiki/#{x}"
  article = open(url).read
  # save the current article to its own file
  # (IO.write is only available from 1.9.x; on 1.8.x use open here too)
  IO.write("my_copy_of-#{x}.html", article)
  # add current article to summary file
  open("numbers.html", "a") do |f|
    f.write(article)
  end
end
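
One thing to keep in mind with "a": if you run the script a second time, the articles are appended to the old numbers.html again. A possible variation, sketched here under my own assumptions rather than taken from the code above, is to open the combined file once before the loop. Then "w" is fine, because the file is truncated only once per run. The names save_and_combine and fake_fetch are hypothetical, and fake_fetch stands in for the open-uri call so the sketch runs offline.

```ruby
require "tmpdir"

# Hypothetical helper: `ids` are the article numbers, `fetch` is any callable
# that returns the page body for an id (in the real script: open-uri).
def save_and_combine(ids, dir, fetch)
  combined_path = File.join(dir, "numbers.html")
  # "w" is safe here because the file is opened once, outside the loop.
  File.open(combined_path, "w") do |combined|
    ids.each do |id|
      article = fetch.call(id)
      # keep an individual copy of each article
      File.open(File.join(dir, "my_copy_of-#{id}.html"), "w") { |f| f.write(article) }
      # and add it to the combined file
      combined.write(article)
    end
  end
  combined_path
end

# Stand-in for the network call (assumption for illustration only).
fake_fetch = ->(id) { "<p>article #{id}</p>\n" }

Dir.mktmpdir do |dir|
  path = save_and_combine(1..3, dir, fake_fetch)
  puts File.read(path)
end
```

Running it twice against the same directory leaves numbers.html with one copy of each article, not two.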
