How to retrieve all user comments from a website

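I want all the user comments from this site: http://www.consumercomplaints.in/?search=chevrolet

The problem is that the comments are only partially displayed. To see a full comment I have to click the title above it, and I have to repeat this for every comment.

The other problem is that the comments span many pages.

So I want to store all the full comments from the site above in an Excel sheet. Is this possible? I am thinking of using crawler4j and jericho along with Eclipse.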

My visit method code:

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import edu.uci.ics.crawler4j.crawler.Page;
    import edu.uci.ics.crawler4j.parser.HtmlParseData;

    @Override
    public void visit(Page page) {
        String url = page.getWebURL().getURL();
        System.out.println("URL: " + url);

        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
            String html = htmlParseData.getHtml();
            // Set<WebURL> links = htmlParseData.getOutgoingUrls();
            // String text = htmlParseData.getText();

            // This path must name a file, not a directory, for FileWriter to open it.
            String crawlerOutputPath = "/DA Project/HTML Source/";
            File outputFile = new File(crawlerOutputPath);

            // FileWriter creates the file if it does not exist; true = append.
            // try-with-resources closes the writer even if write() throws.
            try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile, true))) {
                writer.write(html); // write the page HTML exactly once
            } catch (IOException e) {
                System.out.println("IOException: " + e.getMessage());
                e.printStackTrace();
            }
            System.out.println("Html length: " + html.length());
        }
    }

Thanks in advance. Any help would be appreciated.

Yes, it is possible.

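1. Start crawling at your search page (http://www.consumercomplaints.in/?search=chevrolet).
2. Use the visitPage method of crawler4j to follow only the comment pages and the subsequent result pages.
3. Take the HTML content from crawler4j and pass it to jericho.
4. Filter out the content you want to store and write it to some kind of .csv or .xls file (I would prefer .csv).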
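
As a rough illustration of the link filtering and the .csv output, here is a minimal sketch using only the Java standard library. The class name CommentExport, its method names, and both URL patterns are my own assumptions, not part of crawler4j or jericho; in particular, I have not verified how the site builds its result-page and comment-page URLs, so check the patterns against real links first. In crawler4j the link filter would typically be called from your shouldVisit override, and the jericho extraction step is left out here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Pattern;

/** Hypothetical helper; none of these names come from crawler4j or jericho. */
class CommentExport {

    // ASSUMPTION: result pages keep the "?search=chevrolet" query,
    // optionally followed by further parameters such as a page number.
    private static final Pattern RESULT_PAGE = Pattern.compile(
            "^https?://www\\.consumercomplaints\\.in/\\?search=chevrolet(&.*)?$");

    // ASSUMPTION: full-comment pages end in "-c<digits>.html".
    private static final Pattern COMMENT_PAGE = Pattern.compile(
            "^https?://www\\.consumercomplaints\\.in/.*-c\\d+\\.html$");

    static boolean isResultPage(String url) {
        return RESULT_PAGE.matcher(url).matches();
    }

    static boolean isCommentPage(String url) {
        return COMMENT_PAGE.matcher(url).matches();
    }

    /** True for the only two kinds of page the crawler should follow. */
    static boolean shouldFollow(String url) {
        return isResultPage(url) || isCommentPage(url);
    }

    /** Quotes a field when it contains a comma, quote, or line break (RFC 4180 style). */
    static String toCsvField(String value) {
        if (value == null) {
            return "";
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        String escaped = value.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Builds one CSV row: page URL, comment title, full comment text. */
    static String toCsvRow(String url, String title, String comment) {
        return toCsvField(url) + "," + toCsvField(title) + "," + toCsvField(comment);
    }

    /**
     * Appends one row to the CSV file, creating the file if needed.
     * Synchronized because crawler4j runs several crawler threads at once.
     */
    static synchronized void appendRow(Path csvFile, String url, String title,
                                       String comment) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(csvFile,
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            writer.write(toCsvRow(url, title, comment));
            writer.newLine();
        }
    }
}
```

Inside visit you would then call isCommentPage on the page URL and, for matching pages, pass the title and comment text that jericho extracted to appendRow. Excel opens the resulting .csv file directly, which is why I would not bother with .xls.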

Hope this helps.
