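I think the best way to explain this is with some code. Basically, the only way to identify the TR I need in the table (I have already reached the table itself and named it annual_income_statement) is by the text of the first TD in that TR, as shown below:
It may also help to know the following:
Actual HTML: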
doc = Nokogiri::HTML(URI.open('https://www.google.com/finance?q=NYSE%3AAA&fstype=iii')) # needs require 'nokogiri' and require 'open-uri'
HTML fragment:
<div id="incannualdiv">
<table id="fs-table">
<tbody>
<tr>..</tr>
...
<tr>
<td>Net Income</td>
<td>100</td>
</tr>
<tr>..</tr>
</tbody>
</table>
</div>
Original XPath:
irb(main):161:0> annual_income_statement = doc.xpath("//div[@id='incannualdiv']/table[@id='fs-table']/tbody")
irb(main):121:0> a = nil
=> nil
irb(main):122:0> annual_income_statement.children.each { |e| if e.text.include? "Net Income" and e.text.exclude? "Ex"
irb(main):123:2> a = e.text
irb(main):124:2> end }
=> 0
irb(main):125:0> a
=> "Net Income\n\n191.00\n611.00\n254.00\n-1,151.00\n"
irb(main):127:0> a.split "\n"
=> ["Net Income", "", "191.00", "611.00", "254.00", "-1,151.00"]
But is there a better way?
More details:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open('https://www.google.com/finance?q=NYSE%3AAA&fstype=iii'))
div = doc.at "div[@id='incannualdiv']" # div containing the table I want
table = div.at 'table' # table containing the tbody I want
tbody = table.at 'tbody' # tbody containing the TRs I want
trs = tbody.at 'tr' # SHOULD be all TRs of that table/tbody, but it's only the first TR?
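I expected that last line to give me all the TRs (including the one with the TD I'm looking for), but it actually gives me only the first TR.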
The best option is probably:
table.at 'tr:has(td[1][text()="Net Income"])'
Edit
More info:
doc = Nokogiri::HTML <<EOF
<div id="incannualdiv">
<table id="fs-table">
<tbody>
<tr>..</tr>
...
<tr>
<td>Net Income</td>
<td>100</td>
</tr>
<tr>..</tr>
</tbody>
</table>
</div>
EOF
table = doc.at 'table'
table.at('tr:has(td[1][text()="Net Income"])').to_s
#=> "<tr>\n<td>Net Income</td>\n <td>100</td>\n </tr>\n"