Quite simply, can you do a conditional scrape? I want an <a>
tag within a parent element, and if a <span> inside that parent wraps the <a>
(so the span, rather than the parent, directly holds the <a>), I still want
to drill into the span to reach the <a>.
Hopefully this example provides enough detail.
<tr>
<td>1989</td>
<td>
<i>
<a href="/wiki/Always_(1989_film)" title="Always (1989 film)">Always</a>
</i>
</td>
<td>Pete Sandich</td>
</tr>
I can access the <a> with the following:
all_links = doca.search('//tr//td//i//a[@href]')
What I want to know is whether I can also add a condition, so that if there is
a span around the <a>, the search accounts for it:
<tr>
<td>1989</td>
<td>
<i>
<span>
<a href="/wiki/Always_(1989_film)" title="Always (1989 film)">Always</a>
</span>
</i>
</td>
<td>Pete Sandich</td>
</tr>
So is there a way to grab the <a> conditionally, something like this:
all_links = doca.search('//tr//td//i//?span//a[@href]')
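where ?span is a conditional step: if a span exists, descend into that level and then into the link.
If there is no span, skip it and go straight to the link.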
Thanks in advance, any help is much appreciated!
Shane
Here we go:
require 'nokogiri'
doc = Nokogiri::HTML::Document.parse <<-eot
<tr>
<td>1989</td>
<td>
<i>
<span>
<a href='/wiki2/Always_(1989_film)' title='Always (1989 film)'>Always</a>
</span>
</i>
</td>
<td>
<i>
<a href='/wiki1/Always_(1989_film)' title='Always (1989 film)'>Always</a>
</i>
</td>
<td>Pete Sandich</td>
</tr>
eot
# This XPath expression selects an <a> only when its parent element is a <span>
node = doc.xpath("//tr//i//a[name(./..)='span']")
p node.size # => 1
p node.map{ |n| n['href'] } # => ["/wiki2/Always_(1989_film)"]