Parsing RSS CDATA text outside of tags with PHP



I know versions of this question have been asked before, but I'm running into a specific problem with this one.

I'm trying to extract some text from an RSS feed where the text is embedded in CDATA but is not inside any XML tag. Here is the RSS file:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="/rss/ndbcrss.xsl"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>NDBC - Station 46042 - MONTEREY - 27NM WNW of Monterey, CA Observations</title>
    <description><![CDATA[This feed shows recent marine weather observations from Station 46042.]]></description>
    <link>http://www.ndbc.noaa.gov/</link>
    <pubDate>Wed, 07 Aug 2013 21:06:45 UT</pubDate>
    <lastBuildDate>Wed, 07 Aug 2013 21:06:45 UT</lastBuildDate>
    <ttl>30</ttl>
    <language>en-us</language>
    <managingEditor>webmaster.ndbc@noaa.gov</managingEditor>
    <webMaster>webmaster.ndbc@noaa.gov</webMaster>
    <image>
      <url>http://weather.gov/images/xml_logo.gif</url>
      <title>NOAA - National Weather Service</title>
      <link>http://www.ndbc.noaa.gov/</link>
    </image>
    <item>
      <pubDate>Wed, 07 Aug 2013 21:06:45 UT</pubDate>
      <title>Station 46042 - MONTEREY - 27NM WNW of Monterey, CA</title>
      <description><![CDATA[
        <strong>August 7, 2013 1:50 pm PDT</strong><br />
        <strong>Location:</strong> 36.785N 122.469W<br />
        <strong>Wind Direction:</strong> SW (220&#176;)<br />
        <strong>Wind Speed:</strong> 1.9 knots<br />
        <strong>Wind Gust:</strong> 1.9 knots<br />
        <strong>Significant Wave Height:</strong> 2.3 ft<br />
        <strong>Dominant Wave Period:</strong> 14 sec<br />
        <strong>Average Period:</strong> 6.9 sec<br />
        <strong>Mean Wave Direction:</strong> SSE (160&#176;) <br />
        <strong>Atmospheric Pressure:</strong> 30.11 in (1019.5 mb)<br />
        <strong>Pressure Tendency:</strong> -0.01 in (-0.3 mb)<br />
        <strong>Air Temperature:</strong> 60.8&#176;F (16.0&#176;C)<br />
        <strong>Water Temperature:</strong> 59.9&#176;F (15.5&#176;C)<br />
      ]]></description>
      <link>http://www.ndbc.noaa.gov/station_page.php?station=46042</link>
      <guid>http://www.ndbc.noaa.gov/station_page.php?station=46042&amp;ts=1375908600</guid>
      <georss:point>36.785 -122.469</georss:point>
    </item>
  </channel>
</rss>

I'm trying to get "2.3 ft", "14 sec" and "SSE (160&#176;)" from the following lines:

<strong>Significant Wave Height:</strong> 2.3 ft<br />
<strong>Dominant Wave Period:</strong> 14 sec<br />
<strong>Mean Wave Direction:</strong> SSE (160&#176;) <br />

I can strip out the CDATA and access the strong[x] elements, but I can't figure out how to get the text above, which sits outside the tags.
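One way to reach the text that sits outside the <strong> tags is to treat the CDATA contents as an HTML fragment: in the DOM, each value is simply the text node immediately following its <strong> element. The sketch below assumes the dom extension is enabled; parseObservations() is a hypothetical helper name, and the sample string is trimmed from the feed above.

```php
<?php
// Sketch: parse the CDATA description as HTML and read the text node that
// follows each <strong> label. parseObservations() is a hypothetical helper.
function parseObservations(string $html): array
{
    $doc = new DOMDocument();
    // The description is a fragment, not a full page, so silence libxml warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('strong') as $strong) {
        $next = $strong->nextSibling;
        // The date line has <br /> right after </strong>, so there is no text node to read.
        if ($next === null || $next->nodeType !== XML_TEXT_NODE) {
            continue;
        }
        $label = rtrim(trim($strong->textContent), ':');
        // nodeValue is already entity-decoded, so &#176; comes back as a UTF-8 degree sign.
        $result[$label] = trim($next->nodeValue);
    }
    return $result;
}

$sample = <<<HTML
<strong>August 7, 2013 1:50 pm PDT</strong><br />
<strong>Significant Wave Height:</strong> 2.3 ft<br />
<strong>Dominant Wave Period:</strong> 14 sec<br />
<strong>Mean Wave Direction:</strong> SSE (160&#176;) <br />
HTML;

$obs = parseObservations($sample);
echo $obs['Significant Wave Height'], ', ', $obs['Dominant Wave Period'], ', ', $obs['Mean Wave Direction'], "\n";
```

With the live feed you would pass (string) $xml->channel->item->description instead of $sample. Because this walks the DOM, it does not depend on how the feed breaks its lines.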

EDIT

Thank you, Carl! Using explode/regex worked perfectly. Another tool added to my small (but growing) kit.

Here is the working code I'm using to store the three items:

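<?php
$url = "http://www.ndbc.noaa.gov/data/latest_obs/46042.rss";
$xml = simplexml_load_file($url);
// Cast to string to get the raw CDATA contents of <description>
$data = (string) $xml->channel->item->description;

// Each line of the description holds one "<strong>label</strong> value<br />" pair
foreach (explode("\n", $data) as $key => $line) {
    // The "/" in </strong> must be escaped because "/" is also the pattern delimiter
    preg_match('/(<strong>.+?<\/strong>)(.*)?<br/', $line, $matches);
    if ( ! empty($matches)) {
        $dataDescr[$key] = $matches[1];
        $dataVal[$key] = $matches[2];
    }
}
$sigWavHt = $dataVal[5];
$domWavPer = $dataVal[6];
$meanWavDir = $dataVal[8];
echo "$sigWavHt, $domWavPer, $meanWavDir"; // to test results
?>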
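The hard-coded indices 5, 6 and 8 depend on exactly which lines the feed contains and on whether the CDATA begins with a newline, so they can silently point at the wrong field if NOAA adds or drops a line. A minimal sketch of an alternative, assuming the same "<strong>Label:</strong> value<br />" layout, is to key each value by its label; extractByLabel() is a hypothetical helper, not part of any library.

```php
<?php
// Sketch: map each "<strong>Label:</strong> value<br />" pair to label => value,
// so lookups use the label text instead of a line index.
// extractByLabel() is a hypothetical helper name.
function extractByLabel(string $description): array
{
    // Group 1: label text before the colon; group 2: text up to the next "<br".
    // The date line has no colon right before </strong>, so it does not match.
    preg_match_all('#<strong>([^<]+?):</strong>([^<]*)<br#', $description, $m, PREG_SET_ORDER);

    $values = [];
    foreach ($m as $match) {
        $values[$match[1]] = trim($match[2]);
    }
    return $values;
}

$description = <<<HTML
<strong>August 7, 2013 1:50 pm PDT</strong><br />
<strong>Significant Wave Height:</strong> 2.3 ft<br />
<strong>Dominant Wave Period:</strong> 14 sec<br />
<strong>Average Period:</strong> 6.9 sec<br />
<strong>Mean Wave Direction:</strong> SSE (160&#176;) <br />
HTML;

$values = extractByLabel($description);
echo $values['Significant Wave Height'], ', ',
     $values['Dominant Wave Period'], ', ',
     $values['Mean Wave Direction'], "\n";
```

Against the live feed, call extractByLabel((string) $xml->channel->item->description). The values are returned as raw text, so &#176; is left as an entity here.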

If you're sure the data will always be formatted like your example, you can use a regular expression to extract it.

For example:

$data = "<strong>Significant Wave Height:</strong> 2.3 ft<br />
<strong>Dominant Wave Period:</strong> 14 sec<br />
<strong>Mean Wave Direction:</strong> SSE (160&#176;) <br />";
foreach (explode("\n", $data) as $line) {
    preg_match('/(<strong>.+?<\/strong>)(.*)?<br/', $line, $matches);
    if ( ! empty($matches)) {
        // The part with the <strong> tags is now in $matches[1], and
        // the part after is in $matches[2]
        echo "Key: {$matches[1]}\tValue: {$matches[2]}\n";
    }
}

Looking at the full feed you posted above, keep in mind that the first line (the date) has no "data" part after the <strong> content...
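To handle that date line explicitly, one option is to treat a match with an empty value part as the observation time rather than as a data field, and to decode entities such as &#176; while you are at it. This is a sketch built on the same explode/regex loop; splitDescription() is a hypothetical helper, and it assumes one label/value pair per line as in the feed above.

```php
<?php
// Sketch: separate the observation time (the line with nothing after </strong>)
// from the label/value lines, and decode HTML entities in the values.
// splitDescription() is a hypothetical helper name.
function splitDescription(string $description): array
{
    $time = null;
    $values = [];
    foreach (explode("\n", $description) as $line) {
        if ( ! preg_match('/<strong>(.+?)<\/strong>(.*?)<br/', $line, $matches)) {
            continue; // blank or non-matching line
        }
        $value = trim($matches[2]);
        if ($value === '') {
            // Nothing after </strong>: this is the date line, keep it as the timestamp.
            $time = $matches[1];
            continue;
        }
        $label = rtrim($matches[1], ':');
        // Turn &#176; and similar entities into UTF-8 characters.
        $values[$label] = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    return ['time' => $time, 'values' => $values];
}

$description = <<<HTML
<strong>August 7, 2013 1:50 pm PDT</strong><br />
<strong>Significant Wave Height:</strong> 2.3 ft<br />
<strong>Mean Wave Direction:</strong> SSE (160&#176;) <br />
HTML;

$parsed = splitDescription($description);
echo $parsed['time'], ': ', $parsed['values']['Mean Wave Direction'], "\n";
```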
