Nokogiri and XPath: nested loops over a node set



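I'm trying to loop over each td element within each tr element, but the inner loop below isn't working. It looks to me as though the XPath pattern "*/td" returns no results. I expected to see the data inside the td tags printed to stdout. I'm using Nokogiri.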

I'm pasting this into my Rails console:

require 'nokogiri'
f = File.open("public/index.html")
doc = Nokogiri::HTML(f)
f.close
doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
  puts "row= " + row.to_s
  row.xpath('*/td').each do |td|
    puts "td= " + td
  end
end

Here is the console output:

row= <tr id="208894">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td>
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td>
<td headers="WhoIsOnDutyTableLevel1:header:3">0</td>
</tr>
row= <tr id="207792">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td>
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td>
<td headers="WhoIsOnDutyTableLevel1:header:3">5</td>
</tr>
=> 0

Here is the HTML I'm parsing:

<table class="duty-report-level1" id="WhoIsOnDutyTableLevel1">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1" class="duty-report-lt-header">c</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1">
<table class="duty-report-level2" id="WhoIsOnDutyTableLevel2">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1">Group Name</th><th id="WhoIsOnDutyTableLevel1:header:2">Group Time Zone</th><th id="WhoIsOnDutyTableLevel1:header:3">Default Devices</th><th id="WhoIsOnDutyTableLevel1:header:4">Supervisors</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/GroupDetails.do;jsessionid=17gaw4aw5pv8s?_data=TJZuNquzHUgWcre8AVcKpAFRUsezgPKzbHn7hwtTf9Ei0C2PJ8QYcKIy8OkorCWT8HDTAzkon1ls%0D%0AefuHC1N%2F0SLQLY8nxBhwesdd7Zeg6NzvCfuzRqLg5g%3D%3D" name="team1" id="team1" class="details">Team 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2" class="centered-text">US/Pacific</td><td headers="WhoIsOnDutyTableLevel1:header:3" class="centered-text"><img src="/static/images/icon_boolean_false.png" alt="No" border="0"></td><td headers="WhoIsOnDutyTableLevel1:header:4">
<values>
</values><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z7AnuRhH67H6AixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="mgr1" id="mgr1" class="details">Mgr 1</a>
<br>




</td>
</tr>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="4">
<table class="duty-report-level3" id="WhoIsOnDutyTableLevel3">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1" class="th-left">a</th><th id="WhoIsOnDutyTableLevel1:header:2" class="">b</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="2">
<table class="duty-report-level4" id="WhoIsOnDutyTableLevel4">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1">Recipient</th><th id="WhoIsOnDutyTableLevel1:header:2">Category</th><th id="WhoIsOnDutyTableLevel1:header:3">Escalation</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr id="208894">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">0</td>
</tr>
<tr id="207792">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">5</td>
</tr>


</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>

You need to make one small change to the XPath:

doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
  # puts "row= " + row.to_s
  row.xpath('./td').each do |td|
    puts "td= " + td.text
  end
end

Which outputs:

td= User 1
td= PERSON
td= 0
td= User 2
td= PERSON
td= 5

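Using ./td as the XPath for the cells basically means "look for td elements directly below this node". The original */td looked one level further down, for td elements inside the children of the row, and there are none.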

Personally, unless you absolutely need XPath, I'd recommend using CSS selectors. They're more readable and usually simpler:

doc.search('#WhoIsOnDutyTableLevel4 tbody tr').each do |row|
  row.search('td').each do |td|
    puts "td= " + td.text
  end
end

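I'd recommend using search instead of css or xpath, and at instead of at_css or at_xpath. Both search and at accept either kind of selector, so there's no real magic in picking one form over the other; you just have two methods to remember.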

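The XPath expression in the inner loop is evaluated relative to each tr, so you want td (which selects the td children of the context tr) rather than */td (which selects td grandchildren).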

Full code:

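doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
  puts "row= " + row.to_s
  # td is relative to the current row and selects its direct td children
  row.xpath('td').each do |td|
    # td is a node, not a String, so take its text before concatenating
    puts "td= " + td.text
  end
end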
