Given this HTML:
<li><strong><a href="http://www.ukasta.org.uk/">United Kingdom Agricultural Supply Trade Association</a> (UKASTA)</strong></li>
I want to get the strings "United Kingdom Agricultural Supply Trade Association" and "(UKASTA)".
I wrote this with Nokogiri:
linklist=link.parent.parent.css('li strong a')
linklist.each do |f|
puts f.text
end
f.text is "United Kingdom Agricultural Supply Trade Association", but how do I get "(UKASTA)"?
You're digging too deep. I'd use:
require 'nokogiri'
html = '<li><strong><a href="http://www.ukasta.org.uk/">United Kingdom Agricultural Supply Trade Association</a> (UKASTA)</strong></li>'
doc = Nokogiri::HTML(html)
doc.at('strong').text
which returns:
"United Kingdom Agricultural Supply Trade Association (UKASTA)"
If you have to start from the <a> node, you can use:
a_node = doc.at('a')
a_node.text
=> "United Kingdom Agricultural Supply Trade Association"
a_node.next_sibling.text
=> " (UKASTA)"
You can use the children method and then pick out the data by position:
require 'nokogiri'
html_doc = Nokogiri::HTML('<html><li><strong><a href="">United Kingdom Agricultural Supply Trade Association</a>(UKASTA)</strong></li></html>')
html_doc.css('li strong').children[0].text
=> "United Kingdom Agricultural Supply Trade Association"
html_doc.css('li strong').children[1].text
=> "(UKASTA)"