How to use Symfony DomCrawler to filter or extract link elements, excluding spans, and save them in a comma-separated array



<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/html/">html</a>, <span>dom crawler</span>, <a href="/en/form/">form</a><span>guzzle</span>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/elequent/">elequent</a>, <span>dom crawler</span>, <span>guzzle</span>, <a href="/en/orm/">orm</a>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <a href="/en/goutte">goutte</a>, <a href="/en/php/">php</a>, <span>dom crawler</span>, <a href="/en/guzzle">guzzle</a>, <a href="/en/web-scrapper">web scrapper</a>
</span>

I want to extract the information into an array like this:

array (size=3)
  0 => string 'laravel, html, form' (length=19)
  1 => string 'laravel, elequent, orm' (length=22)
  2 => string 'laravel, goutte, php, guzzle, web scrapper' (length=42)

Try this code snippet, which uses PHP's built-in DOMDocument and DOMXPath:

<?php
ini_set('display_errors', 1);
$string=<<<HTML
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/html/">html</a>, <span>dom crawler</span>, <a href="/en/form/">form</a><span>guzzle</span>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/elequent/">elequent</a>, <span>dom crawler</span>, <span>guzzle</span>, <a href="/en/orm/">orm</a>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <a href="/en/goutte">goutte</a>, <a href="/en/php/">php</a>, <span>dom crawler</span>, <a href="/en/guzzle">guzzle</a>, <a href="/en/web-scrapper">web scrapper</a>
</span>
HTML;
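// Parse the markup and prepare an XPath helper for querying it.
$domDocument = new DOMDocument();
$domDocument->loadHTML($string);
$domXPath = new DOMXPath($domDocument);
// Select every <span class="tl"> container.
$results = $domXPath->query('//span[@class="tl"]');
$data = array();
foreach ($results as $result)
{
    $tempArray = array();
    // Only <a> descendants are collected; nested <span> elements are skipped.
    $aNodes = $domXPath->query(".//a", $result);
    foreach ($aNodes as $aNode)
    {
        if ($aNode instanceof DOMElement)
        {
            $tempArray[] = $aNode->nodeValue;
        }
    }
    // Join the link texts of this container into one comma-separated string.
    $data[] = implode(", ", $tempArray);
}
print_r($data);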
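
Since the question asks about Symfony DomCrawler specifically, the same idea can probably be expressed with the Crawler class as well. The following is only a minimal sketch: it assumes symfony/dom-crawler has been installed with Composer and that vendor/autoload.php sits next to the script, and the variable names $html and $data are purely illustrative. It uses filterXPath() so that the CssSelector component is not required.

```php
<?php
// Assumption: `composer require symfony/dom-crawler` was run in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Same markup as in the question.
$html = <<<HTML
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/html/">html</a>, <span>dom crawler</span>, <a href="/en/form/">form</a><span>guzzle</span>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <span>goutte</span>, <a href="/en/elequent/">elequent</a>, <span>dom crawler</span>, <span>guzzle</span>, <a href="/en/orm/">orm</a>, <span>web scrapper</span>
</span>
<span class="tl">
<a href="/en/laravel/" class="c">laravel</a>, <a href="/en/goutte">goutte</a>, <a href="/en/php/">php</a>, <span>dom crawler</span>, <a href="/en/guzzle">guzzle</a>, <a href="/en/web-scrapper">web scrapper</a>
</span>
HTML;

$crawler = new Crawler($html);

// each() calls the closure once per matched <span class="tl">
// and returns the closure results as an array.
$data = $crawler->filterXPath('//span[@class="tl"]')->each(
    function (Crawler $span) {
        // Collect the text of every <a> inside this container;
        // the nested <span> elements are never selected.
        $links = $span->filterXPath('.//a')->each(
            function (Crawler $a) {
                return trim($a->text());
            }
        );

        // One comma-separated string per container.
        return implode(', ', $links);
    }
);

print_r($data);
```

If the page is fetched with Goutte, the object returned by the client's request() call is also a Crawler, so the filterXPath() chain should apply to it unchanged.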
