I need to process the links in an HTML string in a few different ways.
$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';
$links = extractLinks($str);
foreach ($links as $link) {
    $pattern = "#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#i";
    if (preg_match($pattern, $link['href'])) {
        // Process remote links
        // For example, replace the URL with a short URL,
        // or replace long anchor text with a truncated version
    } else {
        // Process local links and anchors
    }
}
function extractLinks($str) {
    // First, I tried DomDocument
    $dom = new DomDocument();
    $dom->loadHTML($str);
    return $dom->getElementsByTagName('a');
    // But this just returns:
    // DOMNodeList Object
    // (
    //     [length] => 3
    // )

    // Then I tried Regex
    if (preg_match_all("|<a.*(?=href=\"([^\"]*)\")[^>]*>([^<]*)</a>|i", $str, $matches)) {
        print_r($matches);
    }
    // But this didn't work either.
}
Expected result of extractLinks($str):
[0] => Array(
    'str' => '<a href="http://example.com/abc" rel="link">string</a>',
    'href' => 'http://example.com/abc',
    'anchorText' => 'string'
),
[1] => Array(
    'str' => '<a href="/local/path" title="with attributes">number</a>',
    'href' => '/local/path',
    'anchorText' => 'number'
),
[2] => Array(
    'str' => '<a href="#anchor" data-attr="lots">links</a>',
    'href' => '#anchor',
    'anchorText' => 'links'
)
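I need all of these so that I can do things like edit the href (add tracking, shorten it, etc.) or replace the whole tag with something else (for example, <a href="/u/username">username</a> could become just username).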
Here is a demo of what I am trying to do.
You just need to change it to:
$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';
$dom = new DomDocument();
$dom->loadHTML($str);
$output = array();
foreach ($dom->getElementsByTagName('a') as $item) {
    $output[] = array(
        'str' => $dom->saveHTML($item),
        'href' => $item->getAttribute('href'),
        'anchorText' => $item->nodeValue
    );
}
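By iterating over the node list and calling getAttribute, nodeValue, and saveHTML(THE_NODE) on each item, you get the output you want. Note that passing a node to saveHTML requires PHP 5.3.6 or later.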
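As a minimal sketch, the loop above can be wrapped in the extractLinks() function from the question and combined with a scheme check to tell remote links from local ones. The isRemoteLink() helper, the list of schemes, and the libxml error handling are assumptions added for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: the loop above wrapped in the question's extractLinks().
function extractLinks(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $item) {
        $links[] = [
            'str'        => $dom->saveHTML($item),       // serialized <a> element
            'href'       => $item->getAttribute('href'),
            'anchorText' => $item->nodeValue,
        ];
    }
    return $links;
}

// Hypothetical check: treat a link as remote when its href carries one of
// these schemes (assumed list, taken from the pattern in the question).
function isRemoteLink(string $href): bool
{
    $scheme = parse_url($href, PHP_URL_SCHEME);
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https', 'ftp'], true);
}

$str = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>.';

foreach (extractLinks($str) as $link) {
    $kind = isRemoteLink($link['href']) ? 'remote' : 'local';
    echo $kind, ': ', $link['href'], ' (', $link['anchorText'], ')', PHP_EOL;
}
```

Using parse_url for the remote/local decision avoids running a URL regex against the whole string, which would match on every iteration because the string always contains at least one absolute URL.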
With a regex, like this:
<a\s*href="([^"]+)"[^>]+>([^<]+)</a>
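- The overall match is the array element 0 you want (the whole tag)
- The group #1 capture is the array element 1 you want (the href)
- The group #2 capture is the array element 2 you want (the anchor text)
With preg_match($pattern, $string, $m), the array elements will be in $m[0], $m[1], and $m[2].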
A working PHP demo:
$string = 'My long <a href="http://example.com/abc" rel="link">string</a> has any
<a href="/local/path" title="with attributes">number</a> of
<a href="#anchor" data-attr="lots">links</a>. ';
$regex = '|<a\s*href="([^"]+)"[^>]+>([^<]+)</a>|';
$howmany = preg_match_all($regex,$string,$res,PREG_SET_ORDER);
print_r($res);
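To go from extracting links to actually rewriting them, the same kind of pattern can be passed to preg_replace_callback. The following is only a sketch under stated assumptions: the pattern uses [^>]* instead of [^>]+ so that a tag with no attributes after href also matches, the "/u/" prefix rule comes from the example in the question, and the ref=tracking parameter is a made-up placeholder.

```php
<?php
// Variant of the pattern above: [^>]* instead of [^>]+ (an assumption), so
// <a href="/u/username">username</a> matches even without extra attributes.
$regex = '|<a\s*href="([^"]+)"[^>]*>([^<]+)</a>|';

$string = 'See <a href="/u/username">username</a> and '
        . '<a href="http://example.com/abc" rel="link">string</a>.';

$result = preg_replace_callback($regex, function (array $m): string {
    [$tag, $href, $text] = $m;

    // User links: replace the whole tag with its anchor text.
    if (strpos($href, '/u/') === 0) {
        return $text;
    }

    // Remote http(s) links: append a hypothetical tracking parameter.
    if (preg_match('#^https?://#i', $href)) {
        $sep = strpos($href, '?') === false ? '?' : '&amp;';
        $newHref = $href . $sep . 'ref=tracking';
        return str_replace('href="' . $href . '"', 'href="' . $newHref . '"', $tag);
    }

    // Everything else (local paths, anchors) is left untouched.
    return $tag;
}, $string);

echo $result, PHP_EOL;
// See username and <a href="http://example.com/abc?ref=tracking" rel="link">string</a>.
```

A regex like this only handles tags where href is the first attribute, is double-quoted, and the anchor text contains no nested markup; for less regular HTML the DOMDocument approach is the safer choice.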