PHP array needs near-identical values removed, but trim() freezes the script



I have a script that pulls links from a web page, saves them to an array, and then removes duplicate links.

$xpath = new DOMXpath($this->doc); // Create an instance of the DOMXpath class (PHP core).
$elements = $xpath->query("//a[not(@rel='nofollow')]/@href"); // Use $xpath to pull links from the page; nofollow links are ignored.
$this_page = array(); // Ensure $this_page is set even if no links are found.
if (!is_null($elements)) foreach ($elements as $element) $this_page[] = $element->nodeValue; // Build an array of the located links from the query result.
$this_page = array_unique($this_page); // Remove duplicate links.
$url_path = parse_url($path, PHP_URL_PATH); // Get the path of the link to locate file information.
if (is_file(glob($_SERVER['DOCUMENT_ROOT'] . "/" . ltrim($url_path, '/') . "{,.php,.html,.htm}", GLOB_BRACE)[0])) // Determine the file extension and remove the root path.
{
    $directories = explode('/', rtrim($path, '/'));
    array_pop($directories);
    $path = implode('/', $directories);
}
else
{
    array_push($this->dead_links, array("page" => rtrim($path, '/'), "link" => 'test'));
}
clearstatcache(); // Clear cached file status values.

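It works almost perfectly, but I have run into a problem with similar links, specifically when the same target appears as both a root-relative link and a page-relative link.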

Example:

<a href="www.mysite.com/page_one"></a>
<a href="www.mysite.com/page_two"></a>
<a href="page_one"></a>
<a href="page_two"></a>
<a href="/page_two"></a>

After the whole script runs, the resulting array is:

$this_page[0]= 'page_one';
$this_page[1]= 'page_two';
$this_page[2]= '/page_two';

Of course $this_page[2] = '/page_two' is the same as $this_page[1] = 'page_two', but array_unique() doesn't know that because of the '/'.

I tried adding trim() to $this_page, but that either freezes the script or slows it down too much; I can't tell which. Is there another solution?

When you build the list of anchors, try...

foreach ($elements as $element) {
    $this_page[]= ltrim($element->nodeValue,"/");
}

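This trims the leading '/' from each link as the list is built, so array_unique() then sees 'page_two' and '/page_two' as the same value.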

Alternatively, you could make sure they all start with a '/' and keep them consistent that way.
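A minimal sketch of that alternative, assuming every link can be treated as relative to the site root (which may not hold for pages in subdirectories). The helper name normalize_link and the sample $links array are hypothetical, introduced only for illustration:

```php
<?php
// Hypothetical helper (not from the original post): strip any leading
// slashes, then add exactly one back so all links share the same form.
function normalize_link(string $href): string
{
    return '/' . ltrim($href, '/');
}

// Sample data mirroring the array from the question.
$links = ['page_one', 'page_two', '/page_two'];

// Normalize first, then de-duplicate; array_values() reindexes the result.
$unique = array_values(array_unique(array_map('normalize_link', $links)));
```

Normalizing before calling array_unique() is what lets 'page_two' and '/page_two' collapse into a single entry.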
