Nokogiri: returning all data between two tags



I'm working on a project that scrapes data from https://platinumgod.co.uk/ and I'm having trouble accessing all the <p> tags between two elements.

Here is the HTML:

<li class="textbox" data-tid="42.5" data-cid="42" data-sid="263" style="display: inline-block;">
<a>
<div onclick="" class="item reb-itm-new re-itm263"></div>
<span>
<p class="item-title">Clear Rune</p>
<p class="r-itemid">ItemID: 263</p>
<p class="pickup">"Rune mimic"</p>
<p class="quality">Quality: 2</p>
<p>When used, copies the effect of the Rune or Soul stone you are holding (like the Blank Card)</p>
<p>Drops a random rune on the floor when picked up</p>
<p>The recharge time of this item depends on the Rune/Soul Stone held:</p>
<p>1 room: Soul of Lazarus</p>
<p>2 rooms: Rune of Ansuz, Rune of Berkano, Rune of Hagalaz, Soul of Cain</p>
<p>3 rooms: Rune of Algiz, Blank Rune, Soul of Magdalene, Soul of Judas, Soul of ???, Soul of the Lost</p>
<p>4 rooms: Rune of Ehwaz, Rune of Perthro, Black Rune, Soul of Isaac, Soul of Eve, Soul of Eden, Soul of the Forgotten, Soul of Jacob and Esau</p>
<p>6 rooms: Rune of Dagaz, Soul of Samson, Soul of Azazel, Soul of Apollyon, Soul of Bethany</p>
<p>12 rooms: Rune of Jera, Soul of Lilith, Soul of the Keeper</p>
<ul>
<p>Type: Active</p>
<p>Recharge time: Varies</p>
<p>Item Pool: Secret Room, Crane Game</p>
</ul>
<p class="tags">* Secret Room</p>
</span>
</a>
</li>

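What I want is to return all the <p> tags between <p class="quality"> (not including that tag) and the first <ul>.

I've tried several solutions I found on forums, and I had only partial success with the following code, which I took from one of the answers (not going to lie, I have a hard time understanding what is going on here). I'm iterating because the HTML contains several items that need to be scraped: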

items = html.at(".repentanceitems-container").css("li.textbox").each do |item|
use = item.xpath(".//a/span/p[5]/following-sibling::p[count(.//a/span/p[6]/preceding-sibling::p)= 
count(.//a/span/p[6]/preceding-sibling::p)]")
end

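However, this only returns the first <p> tag after <p class="quality">. I suspect the cause is something simple that I'm missing because I don't understand the code. I can also access the first <p> element I want to include and the <ul> where the range needs to end, but I'm not sure exactly how to use this information: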

# First line of item use
start = item.xpath('.//a/span/p[5]')
# ul tag
ending = item.xpath('.//a/span/ul[1]')

Any help with this would be much appreciated!

How about:

require "nokogiri"
html = '<li class="textbox" data-tid="42.5" data-cid="42" data-sid="263" style="display: inline-block;"> <a> <div onclick="" class="item reb-itm-new re-itm263"></div> <span> <p class="item-title">Clear Rune</p> <p class="r-itemid">ItemID: 263</p> <p class="pickup">"Rune mimic"</p> <p class="quality">Quality: 2</p> <p>When used, copies the effect of the Rune or Soul stone you are holding (like the Blank Card)</p> <p>Drops a random rune on the floor when picked up</p> <p>The recharge time of this item depends on the Rune/Soul Stone held:</p> <p>1 room: Soul of Lazarus</p> <p>2 rooms: Rune of Ansuz, Rune of Berkano, Rune of Hagalaz, Soul of Cain</p> <p>3 rooms: Rune of Algiz, Blank Rune, Soul of Magdalene, Soul of Judas, Soul of ???, Soul of the Lost</p> <p>4 rooms: Rune of Ehwaz, Rune of Perthro, Black Rune, Soul of Isaac, Soul of Eve, Soul of Eden, Soul of the Forgotten, Soul of Jacob and Esau</p> <p>6 rooms: Rune of Dagaz, Soul of Samson, Soul of Azazel, Soul of Apollyon, Soul of Bethany</p> <p>12 rooms: Rune of Jera, Soul of Lilith, Soul of the Keeper</p> <ul> <p>Type: Active</p> <p>Recharge time: Varies</p> <p>Item Pool: Secret Room, Crane Game</p> </ul> <p class="tags">* Secret Room</p> </span> </a> </li>'
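puts Nokogiri::HTML(html).css(".quality ~ p:not(.tags)").map {|e| e.text}

The ~ syntax (the general sibling combinator) selects only the siblings that come after the matched element, so .quality itself is not part of the result. The <p> tags nested inside the <ul> are not siblings of .quality either, so they are not matched. I'm assuming .tags is the only other class that needs to be omitted after .quality; if there are other elements to exclude, you'll need a :not for them as well, or you can detect and skip them manually in an .each loop, unless someone knows a cleverer trick.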

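You might want to look at this draft tutorial on nokogiri.org, which explains several approaches.

Taking the third (and most general) approach, here is some code that does what you want: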

class CSSSection
  def self.item_section(item)
    document = item.document
    start_tag = item.at_css("p.quality")
    end_tag = item.at_css("ul")
    # grab siblings that follow the start tag
    following_siblings_query = "#{start_tag.path}/following-sibling::*"
    following_siblings = document.xpath(following_siblings_query)
    # grab siblings that precede the end tag
    preceding_siblings_query = "#{end_tag.path}/preceding-sibling::*"
    preceding_siblings = document.xpath(preceding_siblings_query)
    following_siblings & preceding_siblings # xpath intersection
  end
end
doc = Nokogiri::HTML4(html)
li_nodes = doc.css("li") # whatever the query is to get the relevant "li" elements
data = li_nodes.map do |li_node|
  CSSSection.item_section(li_node)
end
puts data.first
# => <p>When used, copies the effect of the Rune or Soul stone you are holding (like the Blank Card)</p>
#    <p>Drops a random rune on the floor when picked up</p>
#    <p>The recharge time of this item depends on the Rune/Soul Stone held:</p>
#    <p>1 room: Soul of Lazarus</p>
#    <p>2 rooms: Rune of Ansuz, Rune of Berkano, Rune of Hagalaz, Soul of Cain</p>
#    <p>3 rooms: Rune of Algiz, Blank Rune, Soul of Magdalene, Soul of Judas, Soul of ???, Soul of the Lost</p>
#    <p>4 rooms: Rune of Ehwaz, Rune of Perthro, Black Rune, Soul of Isaac, Soul of Eve, Soul of Eden, Soul of the Forgotten, Soul of Jacob and Esau</p>
#    <p>6 rooms: Rune of Dagaz, Soul of Samson, Soul of Azazel, Soul of Apollyon, Soul of Bethany</p>
#    <p>12 rooms: Rune of Jera, Soul of Lilith, Soul of the Keeper</p>
