Scrapy can't scrape the comments on a vnexpress article page

I'm new to Scrapy and Python. I'm trying to get the comments from the following URL, but the result is always empty: http://vnexpress.net/tin-tuc/oto-xe-may/toyota-camry-2016-dinh-loi-tui-khi-khong-bung-3386676.html

Here is my code:

import logging

from scrapy.spiders import Spider
from tutorial.items import TutorialItem


class TutorialSpider(Spider):
    name = "vnexpress"
    allowed_domains = ["vnexpress.net"]
    start_urls = [
        "http://vnexpress.net/tin-tuc/oto-xe-may/toyota-camry-2016-dinh-loi-tui-khi-khong-bung-3386676.html"
    ]

    def parse(self, response):
        # response.xpath() works directly; no need to wrap the response in a Selector
        comment_list = response.xpath('//div[@class="comment_item"]')
        logging.info("TOTAL COMMENT : %d", len(comment_list))
        items = []
        # Number the comments starting from 1
        for comment_id, comment in enumerate(comment_list, start=1):
            item = TutorialItem()
            item['id'] = comment_id
            item['mainId'] = 0
            # The leading "." keeps each XPath relative to the current comment node;
            # a bare "//" would search the whole document on every iteration
            item['user'] = comment.xpath('.//span[@class="left txt_666 txt_11"]/b').extract()
            item['time'] = 'N/A'
            item['content'] = comment.xpath('.//p[@class="full_content"]').extract()
            item['like'] = comment.xpath('.//span[@class="txt_666 txt_11 right block_like_web"]/a[@class="txt_666 txt_11 total_like"]').extract()
            items.append(item)
        return items

Thanks for reading.

It looks like the comments are loaded into the page by some JavaScript code.

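Scrapy does not execute JavaScript on the page; it only downloads the HTML. Try opening the page in your browser with JavaScript disabled, and you should see the page the way Scrapy sees it.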
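The same check can be done in code. The sketch below uses only the Python standard library and counts the elements that carry a given class in an HTML string. Both HTML snippets are hypothetical stand-ins (they are not copied from vnexpress.net); they only illustrate the difference between the downloaded HTML and the HTML after JavaScript has run. Inside `scrapy shell`, calling `view(response)` gives a similar picture by opening the downloaded response in a browser.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # The class attribute is a space-separated list of tokens
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_elements_with_class(html, class_name):
    """Return how many elements in `html` carry `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical HTML as downloaded, before any JavaScript has run
raw_html = (
    '<html><body><div id="comments"></div>'
    '<script src="load_comments.js"></script></body></html>'
)

# Hypothetical HTML after a browser has executed the JavaScript
rendered_html = (
    '<html><body><div id="comments">'
    '<div class="comment_item"><p class="full_content">First</p></div>'
    '<div class="comment_item"><p class="full_content">Second</p></div>'
    '</div></body></html>'
)

print(count_elements_with_class(raw_html, "comment_item"))       # 0
print(count_elements_with_class(rendered_html, "comment_item"))  # 2
```

If the count is zero for the HTML that was actually downloaded, the XPath in the spider is not necessarily wrong; the nodes may simply not be in the response.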

You have a few options:

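- Reverse-engineer how the comments are loaded into the page, using the "Network" tab of your browser's developer tools (it is probably an XHR call that loads HTML or JSON data).
- Use a (headless) browser to render the page (selenium, casper.js, splash…).
  - For example, you may want to try this page with Splash (one of the JavaScript rendering options for web scraping). This is the HTML you get back from Splash (it contains your comments): http://pastebin.com/njgCsM9w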
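As a rough sketch of the Splash option: Splash exposes an HTTP API, and its `render.html` endpoint returns the page's HTML after JavaScript has run. The helper below only builds the request URL. The helper name `splash_render_url`, the Splash address `http://localhost:8050`, and the 2-second `wait` are assumptions for illustration; adjust them to your own setup.

```python
from urllib.parse import urlencode


def splash_render_url(page_url, splash_base="http://localhost:8050", wait=2.0):
    """Build a URL for Splash's render.html endpoint.

    Assumes a Splash instance is reachable at `splash_base`. `wait` is the
    number of seconds Splash waits after page load before returning the HTML,
    which gives the comment-loading JavaScript time to run.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


article_url = (
    "http://vnexpress.net/tin-tuc/oto-xe-may/"
    "toyota-camry-2016-dinh-loi-tui-khi-khong-bung-3386676.html"
)
rendered_url = splash_render_url(article_url)
print(rendered_url)
```

Fetching `rendered_url` instead of the article URL should return HTML with the comments included, so the XPath expressions from the question can be run against it unchanged.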
