
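I am using Python.org version 2.7 64-bit on Windows Vista 64-bit. I have successfully used a recursive web scraper built with Scrapy to parse all of the text from Wikipedia articles. However, when I apply the same code to the website referenced in the code, it does not return any body text: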

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.item import Item
from scrapy.spider import BaseSpider
from scrapy import log
from scrapy.cmdline import execute
from scrapy.utils.markup import remove_tags
import time

class ExampleSpider(CrawlSpider):
    name = "goal3"
    allowed_domains = [""]
    start_urls = [""]
    download_delay = 1
    rules = [Rule(SgmlLinkExtractor(allow=()), follow=True, callback='parse_item')]
    #rules = [Rule(SgmlLinkExtractor(allow=()), 
             #Rule(SgmlLinkExtractor(allow=()), callback='parse_item')
    #rules = [
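    def parse_item(self, response):
        self.log('A response from %s just arrived!' % response.url)
        # normalize-space(//title) evaluates to a single string, so this
        # selector list has one entry and the loop body runs once per page.
        titles = response.selector.xpath("normalize-space(//title)")
        for title in titles:
            # Collect every <p> element, strip the markup, and print the text.
            body = response.xpath('//p').extract()
            body2 = "".join(body)
            print remove_tags(body2).encode('utf-8')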

A simplified version of the spider that extracts and prints the paragraph text:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.utils.markup import remove_tags

class ExampleSpider(CrawlSpider):
    name = "goal3"
    allowed_domains = [""]
    start_urls = [""]
    download_delay = 1
    rules = [Rule(SgmlLinkExtractor(allow=()), follow=True, callback='parse_item')]
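    def parse_item(self, response):
        # Extract every <p> element on the page as a raw HTML string.
        paragraphs = response.selector.xpath("//p").extract()
        # Strip the markup from each paragraph and encode it for printing (Python 2).
        text = "".join(remove_tags(paragraph).encode('utf-8') for paragraph in paragraphs)
        print text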
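
The tag-stripping step in the spider above can be tried out without running a crawl. The following is only a minimal sketch: `paragraphs_to_text` and `TAG_RE` are hypothetical names introduced here, and the regular expression is a rough stand-in for Scrapy's `remove_tags`, not its actual implementation. Unlike the spider, it joins paragraphs with a newline and skips empty ones, which is a design choice of this sketch rather than something the original code does.

```python
import re

# Rough stand-in for remove_tags (assumption): drop anything that looks like a tag.
TAG_RE = re.compile(r"<[^>]+>")


def paragraphs_to_text(paragraphs):
    """Strip markup from extracted <p> fragments and join them with newlines."""
    # Remove tags and surrounding whitespace from each fragment.
    cleaned = (TAG_RE.sub("", fragment).strip() for fragment in paragraphs)
    # Skip fragments that are empty once the markup is gone.
    return "\n".join(text for text in cleaned if text)


if __name__ == "__main__":
    sample = ["<p>Hello <b>world</b></p>", "<p> </p>", "<p>Bye</p>"]
    print(paragraphs_to_text(sample))
```

Inside `parse_item`, the list returned by `response.xpath('//p').extract()` could be passed to such a helper in place of the inline join.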


"There is no budget, there is money. We are in a very strong financial position. We can make big signings." Music to the ears of Manchester United fans as vice-chairman Ed Woodward confirmed the club can make big-money acquisitions in this very transfer window. In a bid to return to the summit of England’s top tier, Woodward has effectively given the green light to a spending spree that has supporters rubbing their hands with glee. Ander Herrara and Luke Shaw have arrived for a combined £59m already this summer and the carousel through the Old Trafford entrance door shows no sign of slowing down. Ángel Di María, Mats Hummels and Daley Blind, amongst others, have all been linked with a move to United, while reports suggesting midfield pitbull Arturo Vidal is set to join Louis van Gaal’s side refuse to die down.  "I’m still on holiday at the moment. Can I say I’m staying at Juve? I don’t know. On Monday I’ll talk to (Juventus manager, Massimili
 Contact Us | About Us | Glossary | Privacy Policy | WhoScored Ratings
            Copyright © 2014
