
How can I exclude a specific tag (<br>) from Response.xpath?

Updated: 2023-09-21

Below is some sample source HTML. I want to get a string (or a list) as the result.

<font class="news">
<table border="0" cellspacing="0" cellpadding="0" align="right">
<tr>
<td style="padding-left:10px; padding-bottom:5px;">
<a href="../1.jpg" target="_blank" onfocus='this.blur()'>
<img src="../pic1/small_16239927831.jpg" width="300" >
</a>
</td>
</tr>
</table>
AAA<br><br>
BBB<br><br>
CCC<br>
</font>

I can get some results with this:

response.xpath('//font[@class="news"]/text()')

or

response.xpath('//font[@class="news"]/text()').extract()

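However, the result contains many \n or \n\t entries, and I only want to get "AAA BBB CCC" or ['AAA', 'BBB', 'CCC'].

I also tried normalize-space(), but it did not work. How can I exclude these newline and tab characters?

['AAA', '\n\t\t', '\n\n\t\t', 'BBB', '\n\t\t', 'CCC', '\n\t']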

This XPath:

normalize-space(//font[@class='news'])

gives the following result:

AAA BBB CCC
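
If a list is wanted instead of a single string, one possible approach is to clean the extracted text nodes in Python. The sketch below is only an illustration: `clean_text_nodes` is a hypothetical helper name, and `raw` is a stand-in for the list returned by `.extract()`, written here with the whitespace entries as \n and \t escapes (an assumption about what the question's output contained).

```python
def clean_text_nodes(texts):
    """Strip each extracted text node and drop the ones that are only whitespace."""
    stripped = (t.strip() for t in texts)
    return [t for t in stripped if t]


# Stand-in (assumed) for the list returned by
# response.xpath('//font[@class="news"]/text()').extract()
raw = ['AAA', '\n\t\t', '\n\n\t\t', 'BBB', '\n\t\t', 'CCC', '\n\t']

items = clean_text_nodes(raw)
joined = ' '.join(items)

print(items)   # ['AAA', 'BBB', 'CCC']
print(joined)  # AAA BBB CCC
```

In a spider, the normalize-space() expression above would typically be evaluated with something like response.xpath("normalize-space(//font[@class='news'])").get(). It works on the font element because the string value of an element is the concatenation of all its descendant text nodes. Applied to a set of text() nodes instead, XPath 1.0 converts only the first node to a string, which may be why the earlier attempt did not give the expected result.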
