How to select only table rows that contain specific content

I'm scraping an email that contains many table rows, some of which I want to exclude. The rows I need look exactly like this:

<tr>
  <td class="quantity"> ANYTHING BUT EMPTY </td>
  <td class="description"> ANYTHING BUT EMPTY </td>
  <td class="price"> ANYTHING BUT EMPTY </td>
</tr>

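None of the table rows has a class or id. There are also unwanted <table> rows that contain cells with these classes, but some of those cells have no value. So I need only the rows that have cells of all three classes, with all three cells non-empty. I'm not sure of the syntax for doing this: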

body = Nokogiri::HTML(email)
wanted_rows = body.css('tr').select{ NOT SURE HOW TO ENCAPSULATE LOGIC HERE }

This is quite simple with XPath:

wanted_rows = body.xpath('//tr[td[(@class = "quantity") and normalize-space()]
  and td[(@class = "description") and normalize-space()]
  and td[(@class = "price") and normalize-space()]]')

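The normalize-space() calls are effectively the same as normalize-space(.) != "". Used as a boolean inside a predicate, a string is true only when it is non-empty, so each call checks that the current node (the td) contains something other than whitespace.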
