Reading a large XML file with Nokogiri



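I'm having trouble reading a (somewhat) large XML file with Nokogiri and can't work out what is going wrong. The file looks like this (trimmed to a single entry for readability):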

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0"><title>End | Globally Sourced Menswear</title><link type="self" link="http://www.endclothing.com/eu/"/><updated>2016-04-02T01:25:30+00:00</updated><entry><g:id>391</g:id><g:mpn>WRY3924TY</g:mpn><g:color>N/A</g:color><g:title>Comme des Garcons x Artek Standard Eau De Toilette</g:title><g:link>http://www.endclothing.com/eu/comme-des-garcons-x-artek-standard-eau-de-toilette-wry3924ty.html</g:link><g:price>89.00 EUR</g:price><g:availability>in stock</g:availability><g:brand>CDG Parfum</g:brand><g:custom_label_0>Perfume &amp; Fragrance</g:custom_label_0><g:condition>new</g:condition><g:description><![CDATA[<p>Founded by 4 young idealists in 1935, Finnish design company Artek produce modern furniture to promote the modern culture of habitation. Here they collaborate with <a href="/brands/comme-des-garcons-parfum">Comme des Garçons Parfum</a> to produce a fragrance dubbed 'Standard', ironic for a scent that is anything but.</p>
<span style="font-style:italic;">Notes Include:</span>
<ul>
<li>Thyme</li>
<li>Black Pepper</li>
<li>Patchouli</li>
<li>Cedar Wood</li>
<li>Citrus</li>
</ul>
<p>Due to recent changes in regulations, we are unable to ship aftershaves and perfumes to certain destinations outside of the EU. For full details, please email <a href="mailto:info@endclothing.co.uk?subject=Aftershave and Perfume Shipment">info@endclothing.co.uk</a> or call +44 191 231 3983.</p>]]></g:description><g:image_link>http://media.endclothing.com/media/catalog/product/1/8/18-03-2016_commedesgarcons_xartekstandardeaudetoilette_100ml_sh_1.jpg</g:image_link><g:additional_image_link>http://media.endclothing.com/media/catalog/product/1/8/18-03-2016_commedesgarcons_xartekstandardeaudetoilette_100ml_sh_2.jpg</g:additional_image_link><g:shipping><g:country>FR</g:country><g:service>DPD Priority Service</g:service><g:price>9.00 EUR</g:price></g:shipping><g:shipping><g:country>DE</g:country><g:service>DPD Priority Service</g:service><g:price>9.00 EUR</g:price></g:shipping><g:shipping><g:country>DK</g:country><g:service>DPD Priority Service</g:service><g:price>9.00 EUR</g:price></g:shipping><g:shipping><g:country>NL</g:country><g:service>DPD Priority Service</g:service><g:price>9.00 EUR</g:price></g:shipping><g:shipping><g:country>IT</g:country><g:service>DPD Priority Service</g:service><g:price>9.00 EUR</g:price></g:shipping><g:shipping><g:country>SE</g:country><g:service>DPD Priority Service</g:service><g:price>9.00 EUR</g:price></g:shipping><g:shipping><g:country>BE</g:country><g:service>DPD Priority Service</g:service><g:price>9.00 EUR</g:price></g:shipping><g:shipping><g:country>AT</g:country><g:service>DPD Priority Service</g:service><g:price>15.00 EUR</g:price></g:shipping><g:shipping><g:country>IE</g:country><g:service>Parcel Force Priority Service</g:service><g:price>15.00 EUR</g:price></g:shipping><g:shipping><g:country>ES</g:country><g:service>DPD Priority Service</g:service><g:price>15.00 EUR</g:price></g:shipping><g:shipping><g:country>LV</g:country><g:service>DPD Priority Service</g:service><g:price>19.00 
EUR</g:price></g:shipping><g:shipping><g:country>HR</g:country><g:service>DPD Priority Service</g:service><g:price>35.00 EUR</g:price></g:shipping><g:shipping><g:country>CY</g:country><g:service>FEDEX Priority Service</g:service><g:price>45.00 EUR</g:price></g:shipping><g:shipping><g:country>HU</g:country><g:service>DPD Priority Service</g:service><g:price>15.00 EUR</g:price></g:shipping><g:shipping><g:country>PT</g:country><g:service>DPD Priority Service</g:service><g:price>19.00 EUR</g:price></g:shipping><g:shipping><g:country>EE</g:country><g:service>DPD Priority Service</g:service><g:price>25.00 EUR</g:price></g:shipping><g:shipping><g:country>LU</g:country><g:service>DPD Priority Service</g:service><g:price>9.00 EUR</g:price></g:shipping><g:shipping><g:country>SK</g:country><g:service>DPD Priority Service</g:service><g:price>15.00 EUR</g:price></g:shipping><g:shipping><g:country>BG</g:country><g:service>DPD Priority Service</g:service><g:price>25.00 EUR</g:price></g:shipping><g:shipping><g:country>GR</g:country><g:service>FEDEX Priority Service</g:service><g:price>25.00 EUR</g:price></g:shipping><g:shipping><g:country>PL</g:country><g:service>DPD Priority Service</g:service><g:price>15.00 EUR</g:price></g:shipping><g:shipping><g:country>LT</g:country><g:service>DPD Priority Service</g:service><g:price>19.00 EUR</g:price></g:shipping><g:shipping><g:country>SI</g:country><g:service>DPD Priority Service</g:service><g:price>15.00 EUR</g:price></g:shipping><g:shipping><g:country>FI</g:country><g:service>Parcel Force Priority Service</g:service><g:price>19.00 EUR</g:price></g:shipping><g:shipping><g:country>CZ</g:country><g:service>DPD Priority Service</g:service><g:price>15.00 EUR</g:price></g:shipping><g:shipping><g:country>LI</g:country><g:service>FEDEX Priority Service</g:service><g:price>35.00 EUR</g:price></g:shipping><g:shipping><g:country>MC</g:country><g:service>DPD Priority Service</g:service><g:price>15.00 
EUR</g:price></g:shipping><g:shipping><g:country>CH</g:country><g:service>Parcel Force Priority Service</g:service><g:price>15.00 EUR</g:price></g:shipping></entry></feed>

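I tried the following code to read the stream. The individual steps seem to work (the string in data looks like valid XML to me), but Nokogiri doesn't seem able to read the string: it either crashes or returns nothing for my XPath query.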

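require 'open-uri'
require 'zlib'
require 'nokogiri'

url = "http://www.endclothing.com/media/end_feeds/eu.xml.gz"
# URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+
stream = URI.open(url, 'Accept-Encoding' => 'gzip')
data = Zlib::GzipReader.new(stream).read
page = Nokogiri::XML(data)
page.xpath("//entry")
# => []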

The XML declares a default namespace on the root element:

xmlns="http://www.w3.org/2005/Atom"

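In XML, unprefixed descendant elements implicitly inherit the default namespace of their ancestors. In other words, the entry elements you are trying to select are in the root element's default namespace.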

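In XPath, on the other hand, an unprefixed element name always refers to an element in no namespace. To reference an element that is in the XML default namespace, map a prefix to the default namespace URI and use that prefix in the XPath expression, for example: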

page.xpath("//d:entry", 'd' => 'http://www.w3.org/2005/Atom')
