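I'm trying to loop through the <td> elements within each <tr> element, but the inner loop below isn't working. It looks like the XPath pattern "*/td" isn't returning anything. I would expect to see the data inside the <td> tags printed to stdout. I'm using Nokogiri.
I'm pasting this into my Rails console: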
require 'nokogiri'
f = File.open("public/index.html")
doc = Nokogiri::HTML(f)
f.close
doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
  puts "row= " + row.to_s
  row.xpath('*/td').each do |td|
    puts "td= " + td
  end
end
Here is the console output:
row= <tr id="208894">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td>
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td>
<td headers="WhoIsOnDutyTableLevel1:header:3">0</td>
</tr>
row= <tr id="207792">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td>
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td>
<td headers="WhoIsOnDutyTableLevel1:header:3">5</td>
</tr>
=> 0
Here is the HTML I'm parsing:
<table class="duty-report-level1" id="WhoIsOnDutyTableLevel1">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1" class="duty-report-lt-header">c</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1">
<table class="duty-report-level2" id="WhoIsOnDutyTableLevel2">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1">Group Name</th><th id="WhoIsOnDutyTableLevel1:header:2">Group Time Zone</th><th id="WhoIsOnDutyTableLevel1:header:3">Default Devices</th><th id="WhoIsOnDutyTableLevel1:header:4">Supervisors</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/GroupDetails.do;jsessionid=17gaw4aw5pv8s?_data=TJZuNquzHUgWcre8AVcKpAFRUsezgPKzbHn7hwtTf9Ei0C2PJ8QYcKIy8OkorCWT8HDTAzkon1ls%0D%0AefuHC1N%2F0SLQLY8nxBhwesdd7Zeg6NzvCfuzRqLg5g%3D%3D" name="team1" id="team1" class="details">Team 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2" class="centered-text">US/Pacific</td><td headers="WhoIsOnDutyTableLevel1:header:3" class="centered-text"><img src="/static/images/icon_boolean_false.png" alt="No" border="0"></td><td headers="WhoIsOnDutyTableLevel1:header:4">
<values>
</values><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z7AnuRhH67H6AixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="mgr1" id="mgr1" class="details">Mgr 1</a>
<br>
</td>
</tr>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="4">
<table class="duty-report-level3" id="WhoIsOnDutyTableLevel3">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1" class="th-left">a</th><th id="WhoIsOnDutyTableLevel1:header:2" class="">b</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr>
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="2">
<table class="duty-report-level4" id="WhoIsOnDutyTableLevel4">
<caption></caption>
<thead>
<tr>
<th id="WhoIsOnDutyTableLevel1:header:1">Recipient</th><th id="WhoIsOnDutyTableLevel1:header:2">Category</th><th id="WhoIsOnDutyTableLevel1:header:3">Escalation</th>
</tr>
</thead>
<tfoot></tfoot>
<tbody>
<tr id="208894">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">0</td>
</tr>
<tr id="207792">
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">5</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
You need to make a small change to your XPath:
doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
  # puts "row= " + row.to_s
  row.xpath('./td').each do |td|
    puts "td= " + td.text
  end
end
which outputs:
td= User 1
td= PERSON
td= 0
td= User 2
td= PERSON
td= 5
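Using ./td as the XPath for the td elements basically means "look down from this point": the leading . is the current node (the row), so the path only matches that row's child td elements.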
Personally, unless you absolutely need XPath, I'd recommend using CSS selectors instead. They're more readable and usually simpler:
doc.search('#WhoIsOnDutyTableLevel4 tbody tr').each do |row|
  row.search('td').each do |td|
    puts "td= " + td.text
  end
end
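I'd recommend using search instead of css or xpath, and at instead of at_css or at_xpath. Nokogiri works out whether the string you pass is CSS or XPath, so there's no real magic in choosing the specific methods; with search and at you only have two methods to remember.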
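The XPath expression in the inner loop is evaluated relative to each tr, so you want td (which selects the td children of the context tr) rather than */td (which selects td grandchildren).
Full code: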
doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row|
  puts "row= " + row.to_s
  row.xpath('td').each do |td|
    puts "td= " + td
  end
end