
Scrapy: How to get all the paragraphs that come after a heading

Keywords: paragraph, heading, python, scrapy, web-crawler
Updated: 2023-09-20
Original English title: scrapy: How to get all the paragraphs which come after a heading?

I want to extract the text of the <p> tags that come after headings.

<html>
<head>
<title>My Page</title>
</head>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
<h2>My Second Heading</h2>
<p>My Second paragraph.</p>
<h3>My Third Heading</h3>
<a> There might be something else in middle </a>
<p>My Third paragraph.</p>
<p>My fourth paragraph.</p>
<p>My fifth paragraph.</p>
<p>My sixth paragraph.</p>
</body>
</html>

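That is, I want the text of the first <p> after each heading, as shown below, and to ignore paragraphs that don't have a heading of their own: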

["My first paragraph.", "My Second paragraph.", "My Third paragraph."]

This expression:

response.xpath("//*[starts-with(name(), 'h')]/following-sibling::p[1]/text()").getall()

will return:

['My first paragraph.', 'My Second paragraph.', 'My Third paragraph.']
