Add attributes to HTML tags based on an inner pattern with PHP

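I actually already found a working solution, and it's called regex. Yes, I know, it has been said countless times not to parse HTML with regular expressions. But the thing is, as the title says, it depends on the inner HTML text, which has to follow a certain pattern. So I need to use regex anyway! I tried using the DOM library, but I failed.

So my actual question is: is there a best practice for this? Anyway, here's what I've got: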

HTML:

<section> 
    {foo:bar}
</section>

PHP:

// I'm not a regex ninja, but this seems to do the job
// (single quotes so PHP leaves the backslashes for PCRE; "/" is escaped because it is the delimiter)
$regexTag = '/<(?!body|head|html|link|script|!|\/)(\w*)[^>]*>[^{]*\{\s*[^>]*:\s*[^>]*\s*[^}]\}/';

preg_match_all($regexTag, $html, $match);
// $match[0][0] "<section> {foo:bar}"
// $match[1][0] "section"

for ($i = 0; $i < count($match[0]); $i++) {
    // Insert the attribute right after "<tagname"
    $pos = strlen($match[1][$i]) + 1;
    $str = substr_replace($match[0][$i], " class='foo'", $pos, 0);
    $html = str_replace($match[0][$i], $str, $html);
}
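For comparison, the same regex idea can be done in a single pass with `preg_replace_callback()`, which avoids the `preg_match_all()` / `substr_replace()` / `str_replace()` round trip (and the chance of `str_replace()` hitting an identical string elsewhere in the document). This is a minimal sketch, not the code from the question: the function name `addClassByPattern` is hypothetical, and it assumes the `{key:value}` token comes right after the opening tag (only whitespace in between) and that the tag has no `class` attribute yet.

```php
<?php
// Hypothetical helper: adds class="..." to the opening tag of any element
// whose text starts (after optional whitespace) with a {key:value} token.
// Still a regex approach, so the usual caveats about parsing HTML apply.
function addClassByPattern(string $html, string $class): string
{
    // <               literal "<"
    // (?!...\b)       skip body/head/html/link/script
    // (\w+)           group 1: tag name (so "</..." and "<!..." never match)
    // ([^>]*>)        group 2: rest of the opening tag, including ">"
    // (?=\s*\{...\})  followed by whitespace and a {key:value} token
    $pattern = '/<(?!(?:body|head|html|link|script)\b)(\w+)([^>]*>)(?=\s*\{\s*[^{}:]+:\s*[^{}]+\})/';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($class): string {
            // Assumption: the tag has no class attribute yet.
            return '<' . $m[1] . ' class="' . htmlspecialchars($class, ENT_QUOTES) . '"' . $m[2];
        },
        $html
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $html;
}

$html = "<section> \n    {foo:bar}\n</section>";
echo addClassByPattern($html, 'foo');
```

Existing attributes are kept, because group 2 carries everything between the tag name and `>`: `<section id="x">{a:b}` becomes `<section class="foo" id="x">{a:b}`.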

HTML after:

<section class='foo'> 
    {foo:bar}
</section>

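Regex is not the right tool for this job. Stick with the DOM parser approach. Here's a quick solution using the DOMDocument class.

Use getElementsByTagName('*') to get all tags, then use in_array() to check whether the tag name is in the list of disallowed tags.

Then use a regex with preg_match() to check whether the text content follows the {foo:bar} pattern. If it does, add the new attributes one by one with the setAttribute() method: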

// An array containing all attributes
$attrs = [
    'class' => 'foo'
    /* more attributes & values */
];
$ignored_tags = ['body', 'head', 'html', 'link', 'script'];
$dom = new DOMDocument;
$dom->loadXML($html); // loadXML() needs well-formed markup with a single root element
foreach ($dom->getElementsByTagName('*') as $tag) 
{
    // If not a disallowed tag
    if (!in_array($tag->tagName, $ignored_tags)) 
    {
        $textContent = trim($tag->textContent);
        // If $textContent matches the format '{foo:bar}'
        if (preg_match('#\{\s*[^>]*:\s*[^>]*\s*[^}]\}#', $textContent)) 
        {
            foreach ($attrs as $attr => $val)
                $tag->setAttribute($attr, $val);
        }
    }
}
echo $dom->saveHTML();

Output:

<section class="foo"> 
    {foo:bar}
</section>
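One caveat with the answer above: `textContent` includes the text of all descendants, so with input like `<div><section>{foo:bar}</section></div>` the `<div>` also matches the unanchored pattern and gets `class="foo"` too. A minimal sketch of one way around that, assuming the same `loadXML()` input (a single well-formed root element): look only at each element's direct text-node children and anchor the pattern. The function name `addAttributesByPattern` is hypothetical.

```php
<?php
// Hypothetical helper: like the answer above, but an element is only tagged when
// its OWN text (its direct text-node children) is exactly a {key:value} token.
// Assumes $html is well-formed XML with a single root element (loadXML() requirement).
function addAttributesByPattern(string $html, array $attrs, array $ignoredTags): string
{
    $dom = new DOMDocument();
    $dom->loadXML($html);

    // Anchored: the whole trimmed own-text must be the token, not just contain it.
    $pattern = '/^\{\s*[^{}:]+:\s*[^{}]+\}$/';

    foreach ($dom->getElementsByTagName('*') as $el) {
        if (in_array($el->tagName, $ignoredTags, true)) {
            continue;
        }
        // Concatenate direct text children only; descendants' text is ignored.
        $ownText = '';
        foreach ($el->childNodes as $child) {
            if ($child->nodeType === XML_TEXT_NODE) {
                $ownText .= $child->nodeValue;
            }
        }
        if (preg_match($pattern, trim($ownText)) === 1) {
            foreach ($attrs as $name => $value) {
                $el->setAttribute($name, $value);
            }
        }
    }
    // Serialize just the root element (no XML declaration).
    return $dom->saveXML($dom->documentElement);
}

$html = "<div><section> \n    {foo:bar}\n</section></div>";
echo addAttributesByPattern($html, ['class' => 'foo'], ['body', 'head', 'html', 'link', 'script']);
```

Here only `<section>` is tagged; the `<div>` has no text of its own, so it is left alone.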

This works:

$elements = $dom->getElementsByTagName('body')->item(0)->childNodes;
for ($i = $elements->length-1; $i >= 0; $i--) { 
   $element = $elements->item($i); 
   $tag =  $element->nodeName;
   foreach ($dom->getElementsByTagName($tag) as $tag) {
       ...
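The snippet above relies on a `<body>` element existing. That is the case with `loadHTML()`, which wraps a fragment in `<html><body>`, but not with `loadXML()` on a bare fragment. Note also that `childNodes` contains text and comment nodes, whose `nodeName` is `#text` or `#comment`, not only elements. A minimal sketch of walking the body's children with both points in mind; the function name `tagBodyChildren` is hypothetical, and it assumes libxml's warnings about HTML5 tags such as `<section>` can safely be discarded.

```php
<?php
// Hypothetical helper: parses an HTML fragment with loadHTML() and tags the
// direct element children of <body> whose trimmed text is a {key:value} token.
function tagBodyChildren(string $fragment, string $class): string
{
    $dom = new DOMDocument();
    // libxml's HTML parser warns about HTML5 tags such as <section>;
    // collect those warnings instead of printing them, then discard them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($fragment); // implies <html><body> around the fragment
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $fragment; // no body element was produced
    }

    foreach ($body->childNodes as $node) {
        // childNodes also holds text and comment nodes; only elements are tagged.
        if (!$node instanceof DOMElement) {
            continue;
        }
        if (preg_match('/^\{\s*[^{}:]+:\s*[^{}]+\}$/', trim($node->textContent)) === 1) {
            $node->setAttribute('class', $class);
        }
    }

    // Serialize the body's children only, without the implied wrapper elements.
    $out = '';
    foreach ($body->childNodes as $node) {
        $out .= $dom->saveHTML($node);
    }
    return $out;
}

echo tagBodyChildren("<section> {foo:bar}</section><p>plain</p>", 'foo');
```

Since `setAttribute()` does not change the list of child nodes, a plain forward loop is enough here; iterating backwards only matters when nodes are inserted or removed during the loop.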

I don't know, I still feel more comfortable with regex, haha.
