PHP SimpleXML: how to check whether a nested child node exists



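I have to process about 750 XML files to generate a report. I probably should have used XSLT or XPath, but it is probably too late for that now. My problem: for the first few records everything works fine, but a couple of the XML files don't seem to have the nodes I'm calling. I've tried using isset and !== null, which doesn't work and just gives me the same errors, namely: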

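Notice: Trying to get property of non-object in /var/www/overzicht/script.php on line 38
Notice: Trying to get property of non-object in /var/www/overzicht/script.php on line 38
Fatal error: Call to a member function children() on a non-object in /var/www/overzicht/script.php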

Using the following statement is probably wrong, right?

 if($xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->citation->CI_Citation->title->children('http://www.isotc211.org/2005/gco'))

A small sample of the XML file I am trying to parse:

 <gmd:contact>
    <gmd:CI_ResponsibleParty>
      <gmd:individualName>
        <gco:CharacterString>B. Boers</gco:CharacterString>
      </gmd:individualName>
      <gmd:organisationName>
        <gco:CharacterString>Staatsbosbeheer</gco:CharacterString>
      </gmd:organisationName>
      <gmd:positionName>
        <gco:CharacterString>Contactpersoon</gco:CharacterString>
      </gmd:positionName>
    </gmd:CI_ResponsibleParty>
</gmd:contact>

And my PHP:

<?php
        $xml_url = "http://www.nationaalgeoregister.nl/geonetwork/srv/dut/q?fast=index&from=1&to=10000&geometry=POLYGON((5.5963%2053.3162%2C5.5963%2053.5766%2C6.9612%2053.5766%2C6.9612%2053.3162%2C5.5963%2053.3162))";
        $xml_single_url = "http://www.nationaalgeoregister.nl/geonetwork/srv/dut/xml.metadata.get?uuid=";
        //Load the XML
        $xml = simplexml_load_file($xml_url);
        $xml_array = array();
        //Loop through all the nodes with 'metadata' and put uuid in the array
        foreach($xml->metadata as $metadata) {
                $xml_array[] = $metadata->children('http://www.fao.org/geonetwork')->children()->uuid;
        }       
        echo "<table>"
        ."<tr>"
        ."<td>Title</td>"
        ."<td>Owner</td>"
        ."<td>Purpose</td>"
        ."<td>Tags</td>"
        ."<td>Url</td>"
        ."<td>Url</td>"     
        ."</tr>";
        $i = 0;
        //For every id in the $xml_array 
        foreach($xml_array as $ar)
        {
            //Just a limit for testing purposes
            $i++;
            if($i == 100)
            {
                break;
            }
            //Loads the xml file
            $xml_entry = simplexml_load_file($xml_single_url .$ar);
            echo "<tr>";
            //Title
            echo "<td>"
            .$xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->citation->CI_Citation->title->children('http://www.isotc211.org/2005/gco')->CharacterString
            ."</td>";
            //Owner
            echo "<td>" 
            .$xml_entry->children('http://www.isotc211.org/2005/gmd')->contact->CI_ResponsibleParty->organisationName->children('http://www.isotc211.org/2005/gco')->CharacterString
            ."</td>";
            //Purpose
            echo "<td>" 
            .$xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->purpose->children('http://www.isotc211.org/2005/gco')->CharacterString
            ."</td>";
            //Tags      
            //Transfer          
            echo "</tr>";
        }       
        echo "</table>";
?>

I tried to find a solution myself, but I can't seem to find one.

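Your problem is that you have a long chain of -> operators, and the missing element is somewhere in that chain. As soon as you ask for an element that doesn't exist, you get a NULL, and every subsequent -> operator will fail to one degree or another.

In theory, if you don't know which element in the chain is missing (and perhaps you do, based on the known/allowed structure of the XML?), you would have to break the chain into a series of intermediate assignments and isset() checks.

Fortunately, PHP lets you get away with null->Property with only a Notice, so only the ->children() method call causes a fatal error. That means you can check just before each such call: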

 if ( ! isset($xml_entry) ) { return; }
 $temp = $xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->citation->CI_Citation->title;
 if ( ! isset($temp) ) { return; }
 echo $temp->children('http://www.isotc211.org/2005/gco')->CharacterString;

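However, the error messages are telling you more than you perhaps realise:

  1. Notice: Trying to get property of non-object in /var/www/overzicht/script.php on line 38
  2. Notice: Trying to get property of non-object in /var/www/overzicht/script.php on line 38
  3. Fatal error: Call to a member function children() on a non-object in /var/www/overzicht/script.php

That is two Notices about accessing properties and one Fatal error about calling a method. So the line must break down like this: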

$xml_entry
    ->children('http://www.isotc211.org/2005/gmd')
    ->identificationInfo
    ->MD_DataIdentification
    // OK to here
    ->citation
    // This part didn't complain, but subsequent ones did; <citation> is the missing element
    ->CI_Citation
    // First Notice
    ->title
    // Second Notice
    ->children('http://www.isotc211.org/2005/gco')
    // Fatal error - processing aborts here
    ->CharacterString

So what you need to check is whether <citation> exists:

$citation = $xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->citation;
if ( isset($citation) )
{
    echo $citation->CI_Citation->title->children('http://www.isotc211.org/2005/gco')->CharacterString;
}
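
A variation on the check above, as a minimal sketch: isset() can be applied to the whole property chain at once, because it stops quietly at the first missing link. One caveat worth knowing is that SimpleXML hands back an empty SimpleXMLElement (not NULL) for the first missing child, so isset() on a variable holding that result can still be true, and the missing element may sit one level higher than the notices suggest. Testing the property chain itself sidesteps that. The helper name getTitle and the inline sample documents are made up for illustration.

```php
<?php
const GMD = 'http://www.isotc211.org/2005/gmd';
const GCO = 'http://www.isotc211.org/2005/gco';

// Hypothetical helper: returns the title text, or null when any element on the path is missing.
function getTitle(SimpleXMLElement $entry): ?string
{
    $gmd = $entry->children(GMD);
    // isset() walks the chain and yields false at the first missing element, without notices.
    if (!isset($gmd->identificationInfo->MD_DataIdentification->citation->CI_Citation->title)) {
        return null;
    }
    $title = $gmd->identificationInfo->MD_DataIdentification->citation->CI_Citation->title;
    $gco = $title->children(GCO);
    return isset($gco->CharacterString) ? (string) $gco->CharacterString : null;
}

// Made-up sample: a record with a complete title path.
$withTitle = simplexml_load_string(
    '<gmd:MD_Metadata xmlns:gmd="' . GMD . '" xmlns:gco="' . GCO . '">'
    . '<gmd:identificationInfo><gmd:MD_DataIdentification><gmd:citation><gmd:CI_Citation>'
    . '<gmd:title><gco:CharacterString>Example title</gco:CharacterString></gmd:title>'
    . '</gmd:CI_Citation></gmd:citation></gmd:MD_DataIdentification></gmd:identificationInfo>'
    . '</gmd:MD_Metadata>'
);
// Made-up sample: a record where <gmd:citation> is absent.
$withoutCitation = simplexml_load_string(
    '<gmd:MD_Metadata xmlns:gmd="' . GMD . '">'
    . '<gmd:identificationInfo><gmd:MD_DataIdentification/></gmd:identificationInfo>'
    . '</gmd:MD_Metadata>'
);

var_dump(getTitle($withTitle));
var_dump(getTitle($withoutCitation));
```

In the table loop, a null result can simply be rendered as an empty cell.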

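Your parsing code works fine with your sample XML. You can see it at codepad.viper-7.com/6oLCEZ and 3v4l.org/pW7Wu.

If it is the first call to children() that is complaining, then simplexml_load_file appears to have failed. It returns FALSE on failure, so you need to check for that.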

if (FALSE === $xml_entry) {
    echo 'could not load file';
}

More information is in the documentation. The URL may be wrong, the server may be down, or the response may not be valid XML.
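
To see why a load failed instead of just that it failed, libxml's error buffer can be consulted. The following is a rough sketch, not part of the original script: loadXmlOrNull is a hypothetical helper, and it parses a string so the example is self-contained (simplexml_load_file could be swapped in for URLs).

```php
<?php
// Hypothetical helper: returns the parsed document, or null when parsing fails.
function loadXmlOrNull(string $source): ?SimpleXMLElement
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($source);
    if ($xml === false) {
        foreach (libxml_get_errors() as $error) {
            error_log('XML error: ' . trim($error->message));
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the previous setting
    return $xml === false ? null : $xml;
}

$ok = loadXmlOrNull('<root><a>1</a></root>');
$broken = loadXmlOrNull('<root><a>1</root>'); // mismatched tag, so parsing fails
var_dump($ok !== null, $broken === null);
```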

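Otherwise, it seems that elements missing from the actual XML are causing the errors. You can check for missing elements with property_exists(), like this: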

$gmd = $xml_entry->children('http://www.isotc211.org/2005/gmd');
if (property_exists($gmd, 'identificationInfo')) {
    $id_info = $gmd->identificationInfo;
}
if (isset($id_info) && property_exists($id_info, 'MD_DataIdentification')) {
    $md_data_id = $id_info->MD_DataIdentification;
}
if (isset($md_data_id) && property_exists($md_data_id, 'citation')) {
    $citation = $md_data_id->citation;
}
if (isset($citation) && property_exists($citation, 'CI_Citation')) {
    $ci_citation = $citation->CI_Citation;
}
if (isset($ci_citation) && property_exists($ci_citation, 'title')) {
    $title = $ci_citation->title;
}
if (isset($title)) {
    $gco = $title->children('http://www.isotc211.org/2005/gco');
}
//Title
echo "<td>";
if (isset($gco) && property_exists($gco, 'CharacterString')) {
    echo $gco->CharacterString;
}
echo "</td>";

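See 3v4l.org/0DTjI. And that is without even handling the possibility of multiple elements with the same name. So, all things considered, it may not be too late to use XPath after all ;-)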

$title = $xml_entry->xpath('/gmd:MD_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString');
echo "<td>";
if (isset($title[0])) {
    echo $title[0];
}
echo "</td>";
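
On the point about multiple elements with the same name: xpath() returns every match as an array, so repeated elements can be collected in one pass. The sketch below assumes keywords live in gmd:keyword elements (a guess at what the question's empty "Tags" column was meant to hold); allXpathTexts and the sample document are hypothetical.

```php
<?php
// Hypothetical helper: returns the text of every node matching $query (possibly an empty array).
function allXpathTexts(SimpleXMLElement $xml, string $query): array
{
    // Register the prefixes explicitly so the query does not depend on the document's declarations.
    $xml->registerXPathNamespace('gmd', 'http://www.isotc211.org/2005/gmd');
    $xml->registerXPathNamespace('gco', 'http://www.isotc211.org/2005/gco');
    $texts = [];
    // xpath() may return false or null on error; fall back to an empty array in that case.
    foreach ($xml->xpath($query) ?: [] as $node) {
        $texts[] = (string) $node;
    }
    return $texts;
}

// Made-up sample with two elements of the same name.
$xml = simplexml_load_string(
    '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"'
    . ' xmlns:gco="http://www.isotc211.org/2005/gco">'
    . '<gmd:keyword><gco:CharacterString>natuur</gco:CharacterString></gmd:keyword>'
    . '<gmd:keyword><gco:CharacterString>ecologie</gco:CharacterString></gmd:keyword>'
    . '</gmd:MD_Metadata>'
);

$tags = allXpathTexts($xml, '//gmd:keyword/gco:CharacterString');
echo "<td>", htmlspecialchars(implode(', ', $tags)), "</td>\n";
```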

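Lines like the following:

if($xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->citation->CI_Citation->title->children('http://www.isotc211.org/2005/gco'))

are too long and too error-prone. Even though SimpleXML allows this kind of "simple" access, it returns NULL when it does not find the element there, and then you get warnings or even fatal errors.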

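For your use case you are much better off using XPath queries to do the job. Since you need to access several properties that represent the metadata, I suggest first wrapping this into a class of its own, for example a SimpleXMLElementXpathObject (which relies on a PropertyIterator).

This type lets you define the metadata you are looking for from a SimpleXMLElement plus an array that describes the properties by mapping each one to an XPath query: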

$metaDef = array(
    'title'   => 'gmd:identificationInfo//gmd:CI_Citation/gmd:title/gco:CharacterString',
    'owner'   => 'gmd:contact/gmd:CI_ResponsibleParty/gmd:organisationName/gco:CharacterString',
    'purpose' => 'gmd:identificationInfo/gmd:MD_DataIdentification/gmd:purpose/gco:CharacterString',
);

As you can see, there is one XPath expression per key. The keys are turned into properties. This lets you do the mapping on the fly, for example:

$meta = new SimpleXMLElementXpathObject($xml, $metaDef);
echo $meta->title, "\n";
echo json_encode($meta, JSON_PRETTY_PRINT), "\n";

Output:

Natuur - Ecologische verbindingszones
{
    "title": "Natuur - Ecologische verbindingszones",
    "owner": "provincie Frysl\u00e2n",
    "purpose": "Beleidsnota \"ecologische verbindingszones in Frysl\u00e2n\" vastgesteld door Provinciale Staten op 4 oktober 2006. Opgenomen in het Streekplan 2007"
}

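If the XPath query returns no result, NULL is given. This means the properties are optional: you won't see any warnings or fatal errors. To be clear, this basically uses the xpath method of SimpleXMLElement, so you can also run these queries on your own.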
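
For readers who prefer to run the queries directly, the mapping idea can be approximated in a few lines. This is only a sketch of the principle, not the SimpleXMLElementXpathObject class itself: extractMeta is a hypothetical function, and the sample document is made up from the question's XML fragment.

```php
<?php
// Hypothetical stand-in for the mapping: resolves each named XPath query to a string or null.
function extractMeta(SimpleXMLElement $xml, array $definition): array
{
    $xml->registerXPathNamespace('gmd', 'http://www.isotc211.org/2005/gmd');
    $xml->registerXPathNamespace('gco', 'http://www.isotc211.org/2005/gco');
    $result = [];
    foreach ($definition as $name => $query) {
        $nodes = $xml->xpath($query);
        // No match (or an error) becomes null, so every property is optional.
        $result[$name] = empty($nodes) ? null : (string) $nodes[0];
    }
    return $result;
}

$metaDef = array(
    'owner'   => 'gmd:contact/gmd:CI_ResponsibleParty/gmd:organisationName/gco:CharacterString',
    'purpose' => 'gmd:identificationInfo/gmd:MD_DataIdentification/gmd:purpose/gco:CharacterString',
);

// Made-up sample: has an owner but no purpose.
$xml = simplexml_load_string(
    '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"'
    . ' xmlns:gco="http://www.isotc211.org/2005/gco">'
    . '<gmd:contact><gmd:CI_ResponsibleParty><gmd:organisationName>'
    . '<gco:CharacterString>Staatsbosbeheer</gco:CharacterString>'
    . '</gmd:organisationName></gmd:CI_ResponsibleParty></gmd:contact>'
    . '</gmd:MD_Metadata>'
);

$meta = extractMeta($xml, $metaDef);
var_dump($meta);
```

The queries are relative, so they are evaluated against the element that xpath() is called on, here the document's root element.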

A more complete example:

$query = new GeoNetwork_Query();
$query
    ->setGeometry('POLYGON((5.5963 53.3162,5.5963 53.5766,6.9612 53.5766,6.9612 53.3162,5.5963 53.3162))')
    ->setLimit(10);
$metaObj = function (GeoNetwork_Resource $resource) {
    $metaDef = array(
        'title'   => 'gmd:identificationInfo//gmd:CI_Citation/gmd:title/gco:CharacterString',
        'owner'   => 'gmd:contact/gmd:CI_ResponsibleParty/gmd:organisationName/gco:CharacterString',
        'purpose' => 'gmd:identificationInfo/gmd:MD_DataIdentification/gmd:purpose/gco:CharacterString',
    );
    return new SimpleXMLElementXpathObject($resource->getIterator(), $metaDef);
};
$resources = new GeoNetwork_UuidIterator($query);
$objects   = new DecoratingIterator($resources, $metaObj);
$table     = new HtmlTableIterator($objects, ['Title', 'Owner', 'Purpose']);
echo "<table>\n";
foreach ($table as $row) {
    echo $row, "\n";
}
echo "</table>\n";

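I have limited the output to 10 so that it does not create too long a list (for the query result). You can also limit the objects by wrapping $objects in a LimitIterator. Example output of the code above: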

<table>
<tr><td>Title</td><td>Owner</td><td>Purpose</td></tr>
<tr><td>Natuur - Ecologische verbindingszones</td><td>provincie Fryslân</td><td>Beleidsnota "ecologische verbindingszones in Fryslân" vastgesteld door Provinciale Staten op 4 oktober 2006. Opgenomen in het Streekplan 2007</td></tr>
<tr><td>CORINE: Veranderingen in landgebruik in Nederland tussen 1986 en 2000.</td><td>Alterra, Wageningen UR</td><td>Het monitoren van landgebruiksveranderingen op Europese schaal volgens een standaard methode.</td></tr>
<tr><td>Viswaterkaart Sportvisserij</td><td>Sportvisserij Nederland</td><td>Elke sportvisser moet exact weten waar die onder welke (bijz.) voorwaarden mag hengelen.</td></tr>
<tr><td>Veiligheidsafstand vuurwerk</td><td>Interprovinciaal Overleg</td><td>Risicokaart</td></tr>
<tr><td>Weggeg convergenties</td><td>Rijkswaterstaat Data en ICT Dienst (RWS DID)</td><td>Ruimtelijke analyses waarbij ligging van infrastructuur van belang is en bereikbaarheidsberekeningen</td></tr>
<tr><td>Beheerkaart Nat Versie januari 2008</td><td>Rijkswaterstaat Data en ICT Dienst (RWS DID)</td><td>De Beheerkaart Nat wordt door de natte districten van Rijkswaterstaat gebruikt ten behoeve van beheer en onderhoud van zijn beheerobjecten van de watersystemenen. Het NIS gebruikt de gegevens om ondermeer de benodigde budgetten te bepalen voor beheer en onderhoud.</td></tr>
<tr><td>Orthofotomozaieken_project</td><td>Rijkswaterstaat Data en ICT Dienst (RWS DID)</td><td>Gebruik als ondergrond</td></tr>
<tr><td>Knelpunten in LAW-routes</td><td>Stichting Wandelnet</td><td>Inventarisatie van knelpunten in LAW-routes voor provincies</td></tr>
<tr><td>Electronische zeekaarten Ned. Cont. Plat usage Harbour</td><td>Dienst der Hydrografie</td><td>Veilige navigatie</td></tr>
<tr><td>Maatregelzone kernenergie</td><td>Interprovinciaal Overleg</td><td>Risicokaart</td></tr>
</table>
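
The LimitIterator wrapping mentioned above works on any iterator. A minimal sketch with an ArrayIterator standing in for $objects (the values are placeholders, not real UUIDs):

```php
<?php
// Stand-in data: any Iterator can be wrapped the same way.
$uuids = new ArrayIterator(['a', 'b', 'c', 'd', 'e']);

// LimitIterator(iterator, offset, limit): start at item 0 and yield at most 3 items.
$firstThree = new LimitIterator($uuids, 0, 3);

$seen = [];
foreach ($firstThree as $uuid) {
    $seen[] = $uuid;
}
echo implode(', ', $seen), "\n";
```

This could replace the manual $i counter and break in the question's loop.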

In the code above I used the classes from: https://gist.github.com/hakre/94a36e4587214a6e9bc9

It looks like you should use XPath, as described at this link.
