Scrapy XPath returns an empty list in the Scrapy shell



I'm trying to get the articles on this page in the Scrapy shell with the following XPath command:

n = response.xpath('//article[contains(@class, "post-block post-block--image")]/header/h2/a/text()').getall()
n
[]

The command returns 0 articles instead of 18. Yet when I try

//article[contains(@class, "post-block post-block--image")]/header/h2/a/text()

in the Chrome inspector, I can see all of them. How can I get the articles in the Scrapy shell?
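One way to check what is going on is to count the matching `<article>` elements in the raw HTML Scrapy downloaded (`response.text`), not in the DOM Chrome shows after its scripts have run. The sketch below uses only the standard library's `html.parser`. The helper names `ArticleCounter` and `count_articles` are hypothetical, and the class tokens are taken from the question's XPath. It also matches class tokens as a set. That matters because `contains(@class, "post-block post-block--image")` is a plain substring test on the whole attribute, so it would miss elements whose classes appear in a different order.

```python
from html.parser import HTMLParser

# Class tokens taken from the question's XPath; matched as a set, so the
# order in which they appear in the class attribute does not matter.
REQUIRED_CLASSES = {"post-block", "post-block--image"}


class ArticleCounter(HTMLParser):
    """Counts <article> elements whose class attribute has all required tokens."""

    def __init__(self, required=REQUIRED_CLASSES):
        super().__init__()
        self.required = set(required)
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "article":
            return
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        classes = set((dict(attrs).get("class") or "").split())
        if self.required <= classes:
            self.count += 1


def count_articles(html: str) -> int:
    """Return how many matching <article> elements appear in the given HTML."""
    parser = ArticleCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

In the shell, `count_articles(response.text)` returning 0 while Chrome shows 18 articles suggests they are inserted by JavaScript after the page loads rather than served in the initial HTML.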

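The articles are most likely added to the page by JavaScript after it loads, so they are not in the HTML that Scrapy downloads. Chrome's inspector shows the DOM after the scripts have run. You can request the JSON endpoint that the page itself calls instead: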

scrapy shell
In [1]: url = 'https://techcrunch.com/wp-json/tc/v1/magazine?page=1&_embed=true&_envelope=true&categories=20429&cachePrevention=0'
In [2]: headers = {
...: "Accept": "*/*",
...: "Accept-Encoding": "gzip, deflate, br",
...: "Accept-Language": "en-US,en;q=0.5",
...: "Cache-Control": "no-cache",
...: "Connection": "keep-alive",
...: "Content-Type": "application/json; charset=utf-8",
...: "DNT": "1",
...: "Host": "techcrunch.com",
...: "Pragma": "no-cache",
...: "Referer": "https://techcrunch.com/startups/",
...: "Sec-Fetch-Dest": "empty",
...: "Sec-Fetch-Mode": "cors",
...: "Sec-Fetch-Site": "same-origin",
...: "Sec-GPC": "1",
...: "TE": "trailers",
...: "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
...: "X-KL-Ajax-Request": "Ajax_Request",
...: "X-TC-EC-Auth-Token": "",
...: "X-TC-UUID": ""
...: }
In [3]: req = scrapy.Request(url=url, headers=headers)
In [4]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://techcrunch.com/wp-json/tc/v1/magazine?page=1&_embed=true&_envelope=true&categories=20429&cachePrevention=0> (referer: https://techcrunch.com/startups/)
In [5]: view(response)
Out[5]: True
In [6]: body = response.json()['body']
In [7]: for b in body:
...:     print(b['slug'])
...:
how-to-claim-a-student-discount-for-techcrunch
chimes-chris-britt-and-menlo-ventures-shawn-carolan-to-talk-fintech-on-techcrunch-live
graphwear-closes-20-5m-series-b-for-a-needle-free-nanotech-powered-glucose-monitor
investors-share-how-infrastructure-as-code-is-taking-over-devops
informaticas-ipo-will-test-public-markets-appetite-for-slower-growing-tech-offerings
index-sequoia-and-canvas-investors-weigh-in-on-how-to-raise-your-first-dollars
lawpath-gets-7-5m-aud-to-become-the-asia-pacifics-legalzoom
equity-monday-byjus-raises-more-money-somehow-as-tech-stocks-fall
stories-as-a-service-storyteller-lets-anyone-add-stories-to-their-own-apps-or-website
rich-and-worried-about-the-world-put-your-money-where-your-concern-is
made-of-air-a-maker-of-carbon-negative-thermoplastics-locks-in-5-8m
insurtech-stable-raises-46-5m-in-greycroft-led-round-to-help-businesses-manage-volatile-commodity-prices
as-apple-messes-with-attribution-what-does-growth-marketing-look-like-in-2021
yc-grads-wasp-land-1-5m-seed-to-help-developers-build-web-apps-faster
ladder-raises-100m-on-a-900m-valuation-for-a-platform-selling-flexible-term-life-insurance
devops-market-demand-drives-quick-series-c-turnaround-for-esper
elevate-launches-its-approach-to-managing-pre-tax-benefits-with-12m-series-a
to-the-market-takes-on-funding-to-create-ethical-sustainable-work-environments-for-women
indian-edtech-giant-byjus-valued-at-18-billion-in-new-funding
komunidad-a-philippines-based-environmental-intelligence-platform-lands-seed-round
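To reuse this outside the shell, for example in a spider, the request URL can be rebuilt from its query parameters and the slugs extracted from the decoded JSON. The following is a minimal standard-library sketch. `build_magazine_url` and `extract_slugs` are hypothetical helper names. The endpoint and parameter values are copied from the session above. Fetching later pages by incrementing `page` is an assumption based only on the parameter's name.

```python
from urllib.parse import urlencode

# Endpoint and query parameters copied from the request in the shell session.
BASE_URL = "https://techcrunch.com/wp-json/tc/v1/magazine"


def build_magazine_url(page: int = 1, category: int = 20429) -> str:
    """Build the JSON API URL for one page of a category listing.

    Assumption: changing `page` returns later pages of the same listing;
    this is inferred from the parameter name, not verified.
    """
    params = {
        "page": page,
        "_embed": "true",
        "_envelope": "true",
        "categories": category,
        "cachePrevention": 0,
    }
    # urlencode keeps the dict's insertion order.
    return f"{BASE_URL}?{urlencode(params)}"


def extract_slugs(payload: dict) -> list:
    """Return the slug of every post in an enveloped response (payload['body'])."""
    return [post["slug"] for post in payload.get("body", [])]
```

In a Scrapy callback, `extract_slugs(response.json())` would give the same list as the loop above, and `build_magazine_url()` reproduces the exact URL used in `In [1]`.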
