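I know versions of this question have been asked before, but I've run into a specific problem with this one.
I'm trying to extract some text from an RSS feed where the text is embedded in CDATA but is not inside an XML tag. Here is the RSS file: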
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="/rss/ndbcrss.xsl"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>NDBC - Station 46042 - MONTEREY - 27NM WNW of Monterey, CA Observations</title>
<description><![CDATA[This feed shows recent marine weather observations from Station 46042.]]></description>
<link>http://www.ndbc.noaa.gov/</link>
<pubDate>Wed, 07 Aug 2013 21:06:45 UT</pubDate>
<lastBuildDate>Wed, 07 Aug 2013 21:06:45 UT</lastBuildDate>
<ttl>30</ttl>
<language>en-us</language>
<managingEditor>webmaster.ndbc@noaa.gov</managingEditor>
<webMaster>webmaster.ndbc@noaa.gov</webMaster>
<image>
<url>http://weather.gov/images/xml_logo.gif</url>
<title>NOAA - National Weather Service</title>
<link>http://www.ndbc.noaa.gov/</link>
</image>
<item>
<pubDate>Wed, 07 Aug 2013 21:06:45 UT</pubDate>
<title>Station 46042 - MONTEREY - 27NM WNW of Monterey, CA</title>
<description><![CDATA[
<strong>August 7, 2013 1:50 pm PDT</strong><br />
<strong>Location:</strong> 36.785N 122.469W<br />
<strong>Wind Direction:</strong> SW (220°)<br />
<strong>Wind Speed:</strong> 1.9 knots<br />
<strong>Wind Gust:</strong> 1.9 knots<br />
<strong>Significant Wave Height:</strong> 2.3 ft<br />
<strong>Dominant Wave Period:</strong> 14 sec<br />
<strong>Average Period:</strong> 6.9 sec<br />
<strong>Mean Wave Direction:</strong> SSE (160°) <br />
<strong>Atmospheric Pressure:</strong> 30.11 in (1019.5 mb)<br />
<strong>Pressure Tendency:</strong> -0.01 in (-0.3 mb)<br />
<strong>Air Temperature:</strong> 60.8°F (16.0°C)<br />
<strong>Water Temperature:</strong> 59.9°F (15.5°C)<br />
]]></description>
<link>http://www.ndbc.noaa.gov/station_page.php?station=46042</link>
<guid>http://www.ndbc.noaa.gov/station_page.php?station=46042&ts=1375908600</guid>
<georss:point>36.785 -122.469</georss:point>
</item>
</channel>
</rss>
I'm trying to get "2.3 ft", "14 sec", and "SSE (160°)" from the lines below:
<strong>Significant Wave Height:</strong> 2.3 ft<br />
<strong>Dominant Wave Period:</strong> 14 sec<br />
<strong>Mean Wave Direction:</strong> SSE (160°) <br />
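I can strip the CDATA and then access the strong[x] elements, but I can't figure out how to get the text shown above, which sits outside the tags.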
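EDIT
Thank you, Carl! Using explode/regex worked great. Another tool added to my small (but growing) bag of tricks.
Here is the working code I used to store the three items: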
<?php
$url = "http://www.ndbc.noaa.gov/data/latest_obs/46042.rss";
$xml = simplexml_load_file($url);
$data = $xml->channel->item->description;
foreach (explode("\n", $data) as $key=>$line) {
preg_match('/(<strong>.+?<\/strong>)(.*)?<br/', $line, $matches);
if ( ! empty($matches)) {
$dataDescr[$key] = $matches[1];
$dataVal[$key] = $matches[2];
}
}
$sigWavHt = $dataVal[5];
$domWavPer = $dataVal[6];
$meanWavDir = $dataVal[8];
echo "$sigWavHt, $domWavPer, $meanWavDir"; //to test results
?>
If you are sure the data will stay consistent with your example, you can use a regular expression to extract it.
For example:
$data = "<strong>Significant Wave Height:</strong> 2.3 ft<br />
<strong>Dominant Wave Period:</strong> 14 sec<br />
<strong>Mean Wave Direction:</strong> SSE (160°) <br />";
foreach (explode("\n", $data) as $line) {
preg_match('/(<strong>.+?<\/strong>)(.*)?<br/', $line, $matches);
if ( ! empty($matches)) {
// The part with the <strong> tags is now in $matches[1], and
// the part after is in $matches[2]
echo "Key: {$matches[1]}\tValue: {$matches[2]}\n";
}
}
Looking at the full feed you posted above, keep in mind that the first line, the date, has no "data" part after the <strong>
content...
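One way to avoid depending on line positions (and on the date line, which has a label but no value) is to key each value by its label instead of by a numeric index. The following is only a rough sketch under a few assumptions: `parseObservations()` is a hypothetical helper name that is not part of the code above, the sample string is a shortened copy of the description from the question, and every observation is assumed to sit on its own line in the form `<strong>Label:</strong> value<br />`.

```php
<?php
// Shortened sample of the description body from the feed in the question.
$data = <<<'HTML'
<strong>August 7, 2013 1:50 pm PDT</strong><br />
<strong>Location:</strong> 36.785N 122.469W<br />
<strong>Significant Wave Height:</strong> 2.3 ft<br />
<strong>Dominant Wave Period:</strong> 14 sec<br />
<strong>Mean Wave Direction:</strong> SSE (160°) <br />
HTML;

// Hypothetical helper: returns an array of label => value pairs.
function parseObservations(string $html): array
{
    $result = [];
    // \R matches any line ending, so "\n" and "\r\n" are both handled.
    foreach (preg_split('/\R/', $html) as $line) {
        // Capture the label inside <strong> and the text up to <br.
        if (!preg_match('/<strong>(.+?)<\/strong>(.*?)<br/', $line, $m)) {
            continue; // not an observation line
        }
        $label = rtrim(trim($m[1]), ':'); // drop the trailing colon
        $value = trim($m[2]);
        if ($value === '') {
            continue; // label only, e.g. the date line
        }
        $result[$label] = $value;
    }
    return $result;
}

$obs = parseObservations($data);
echo $obs['Significant Wave Height'], ', ',
     $obs['Dominant Wave Period'], ', ',
     $obs['Mean Wave Direction'], "\n";
```

With the live feed, `$data` would be `(string) $xml->channel->item->description` instead of the sample string. Looking values up by label means the result should not shift if the feed adds or drops a line, although it still breaks if the labels themselves change.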