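I have some data that was converted from a KML file to XML, and I'm curious how I can use PHP or Ruby to get back things like the neighborhood names and coordinates. I know how to do it when the values have their own tags around them, like this: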
<cities>
<neighborhood>Gotham</neighborhood>
</cities>
but unfortunately the data is formatted as:
<SimpleData name="neighborhd">Colgate Center</SimpleData>
rather than <neighborhd>Colgate Center</neighborhd>
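Here is the KML source:
How can I use PHP or Ruby to pull data out of something like this? I've installed a few Ruby gems for parsing XML, but XML is something I just haven't worked with much.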
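Your XML is malformed (it is cut off partway through, so several tags are never closed), but Nokogiri will try to repair it.
Here's how to check for invalid XML/XHTML/HTML, and how to rewrite the parts you want.
Here's the setup: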
require 'nokogiri'
doc = Nokogiri.XML(<<EOT)
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://earth.google.com/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<Schema name="Sample_Neighborhoods_Samples" id="Sample_Neighborhoods_Samples">
<SimpleField type="int" name="nid"/>
<SimpleField type="string" name="neighborhd"/>
<SimpleField type="string" name="place"/>
<SimpleField type="string" name="placecode"/>
<SimpleField type="string" name="nbr_type"/>
<SimpleField type="string" name="po_name"/>
<SimpleField type="string" name="metro"/>
<SimpleField type="string" name="country"/>
<SimpleField type="string" name="state"/>
<SimpleField type="string" name="statefips"/>
<SimpleField type="string" name="county"/>
<SimpleField type="string" name="countyfips"/>
<SimpleField type="string" name="mcd"/>
<SimpleField type="string" name="mcdfips"/>
<SimpleField type="string" name="cbsa"/>
<SimpleField type="string" name="cbsacode"/>
<SimpleField type="string" name="cbsatype"/>
<SimpleField type="double" name="cenlat"/>
<SimpleField type="double" name="cenlon"/>
<SimpleField type="int" name="color"/>
<SimpleField type="string" name="ncs_code"/>
<SimpleField type="string" name="release"/>
</Schema>
<Style id="KMLSTYLER_6">
<LabelStyle>
<scale>1.0</scale>
</LabelStyle>
<LineStyle>
<colorMode>normal</colorMode>
</LineStyle>
<PolyStyle>
<color>7f4080ff</color>
<colorMode>random</colorMode>
</PolyStyle>
</Style>
<name>Sample_Neighborhoods_NYC</name>
<visibility>1</visibility>
<Folder id="kml_ft_Sample_Neighborhoods_Samples">
<name>Sample_Neighborhoods_Samples</name>
<Folder id="kml_ft_Sample_Neighborhoods_Samples_Sample_Neighborhoods_NYC">
<name>Sample_Neighborhoods_NYC</name>
<Placemark id="kml_1">
<name>Colgate Center</name>
<Snippet> </Snippet>
<styleUrl>#KMLSTYLER_6</styleUrl>
<ExtendedData>
<SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
<SimpleData name="nid">7086</SimpleData>
<SimpleData name="neighborhd">Colgate Center</SimpleData>
<SimpleData name="place">Jersey City</SimpleData>
<SimpleData name="placecode">36000</SimpleData>
<SimpleData name="nbr_type">S</SimpleData>
<SimpleData name="po_name">JERSEY CITY</SimpleData>
<SimpleData name="metro">New York City, NY</SimpleData>
<SimpleData name="country">USA</SimpleData>
<SimpleData name="state">NJ</SimpleData>
<SimpleData name="statefips">34</SimpleData>
<SimpleData name="county">Hudson</SimpleData>
<SimpleData name="countyfips">34017</SimpleData>
<SimpleData name="mcd">Jersey City</SimpleData>
<SimpleData name="mcdfips">36000</SimpleData>
<SimpleData name="cbsa">New York-Northern New Jersey-Long Island, NY-NJ-PA</SimpleData>
<SimpleData name="cbsacode">35620</SimpleData>
<SimpleData name="cbsatype">Metro</SimpleData>
<SimpleData name="cenlat">40.7145135000001</SimpleData>
<SimpleData name="cenlon">-74.0343385</SimpleData>
<SimpleData name="color">1</SimpleData>
<SimpleData name="ncs_code">40910000</SimpleData>
<SimpleData name="release">1.12.2</SimpleData>
</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>-74.036628,40.712211,0 -74.0357779999999,40.7120810000001,0 -74.035535,40.7122010000001,0 -74.0348299999999,40.71209,0 -74.034903,40.711804,0 -74.033761,40.7116560000001,0 -74.0334089999999,40.7121090000001,0 -74.032996,40.7141330000001,0 -74.0331899999999,40.7141790000001,0 -74.032656,40.7162500000001,0 -74.032231,40.716194,0 -74.032049,40.716908,0 -74.033871,40.7170370000001,0 -74.035629,40.7173710000001,0 -74.035669,40.7171650000001,0 -74.036009,40.715335,0 -74.036325,40.713625,0 -74.036482,40.7123580000001,0 -74.036628,40.712211,0 </coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
<Placemark id="kml_2">
<name>Colgate Center</name>
<Snippet> </Snippet>
<ExtendedData>
EOT
Here's how to see whether there were errors. Any time doc.errors is not empty, you have a problem.
puts doc.errors
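Here's one way to find the SimpleData nodes throughout the document. I prefer CSS accessors over XPath for readability. Sometimes XPath is the better choice, because it gives you finer granularity when searching. You need to learn both.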
doc.search('ExtendedData SimpleData').each do |simple_data|
  # The "name" attribute becomes the tag name in the printed output.
  node_name = simple_data['name']
  puts "<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name]
end
Here's the output after running it:
Premature end of data in tag ExtendedData line 87
Premature end of data in tag Placemark line 84
Premature end of data in tag Folder line 44
Premature end of data in tag Folder line 42
Premature end of data in tag Document line 3
Premature end of data in tag kml line 2
<nid>7086</nid>
<neighborhd>Colgate Center</neighborhd>
<place>Jersey City</place>
<placecode>36000</placecode>
<nbr_type>S</nbr_type>
<po_name>JERSEY CITY</po_name>
<metro>New York City, NY</metro>
<country>USA</country>
<state>NJ</state>
<statefips>34</statefips>
<county>Hudson</county>
<countyfips>34017</countyfips>
<mcd>Jersey City</mcd>
<mcdfips>36000</mcdfips>
<cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
<cbsacode>35620</cbsacode>
<cbsatype>Metro</cbsatype>
<cenlat>40.7145135000001</cenlat>
<cenlon>-74.0343385</cenlon>
<color>1</color>
<ncs_code>40910000</ncs_code>
<release>1.12.2</release>
I wasn't setting out to modify the DOM, but it's easy to do:
doc.search('ExtendedData SimpleData').each do |simple_data|
  node_name = simple_data['name']
  # Swap each SimpleData node for an element named after its "name" attribute.
  simple_data.replace("<%s>%s</%s>" % [node_name, simple_data.text.strip, node_name])
end
puts doc.to_xml
After running that, here's the affected section:
<ExtendedData>
<SchemaData schemaUrl="#Sample_Neighborhoods_Samples">
<nid>7086</nid>
<neighborhd>Colgate Center</neighborhd>
<place>Jersey City</place>
<placecode>36000</placecode>
<nbr_type>S</nbr_type>
<po_name>JERSEY CITY</po_name>
<metro>New York City, NY</metro>
<country>USA</country>
<state>NJ</state>
<statefips>34</statefips>
<county>Hudson</county>
<countyfips>34017</countyfips>
<mcd>Jersey City</mcd>
<mcdfips>36000</mcdfips>
<cbsa>New York-Northern New Jersey-Long Island, NY-NJ-PA</cbsa>
<cbsacode>35620</cbsacode>
<cbsatype>Metro</cbsatype>
<cenlat>40.7145135000001</cenlat>
<cenlon>-74.0343385</cenlon>
<color>1</color>
<ncs_code>40910000</ncs_code>
<release>1.12.2</release>
</SchemaData>
</ExtendedData>