How to update a regex to not consider order in an or ( | ) statement, or how to emulate that

This regex

(<link\s+)((rel="[Ii]con"\s+)|(rel="[Ss]hortcut [Ii]con"\s+))(href="(.+)")(.+)/>

works on

<link rel="icon" href="http://passets-cdn.pinterest.com/images/favicon.png" type="image/x-icon" />
<link rel="shortcut icon" href="http://css.nyt.com/images/icons/nyt.ico" />
<link rel="shortcut icon" href="http://cdn.sstatic.net/careers/Img/favicon.ico?36da6b" />
<link rel="Shortcut Icon" href="/favicon.ico" type="image/x-icon" />

but it does not work when the href and rel attributes swap positions:

  <link href="/phoenix/favicon.ico" rel="shortcut icon" type="image/x-icon" />

How can I update it so that the or statement is not order-dependent,

so that

aa || bb

works just as well as

bb || aa

Tested here:

http://regexpal.com/

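I only want to extract the path from the icon tag... I have chosen not to use a library.

Stema's answer, expanded into a different form:

<link\s+
    (
        ?=[^>]*rel="
        (
            ?:[Ss]hortcut\s
        )
        ?[Ii]con"\s+
    )
    (
        ?:[^>]*href="
        (
            .+?
        )"
    ).*
/>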

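You can't, not with a single regex. Well, you actually can, but it really isn't worth it: you end up with an unreadable mess of a regex.

Match against /<link(\s[^>]*rel="(shortcut\s+)?icon"[^>]*)>/i, then match the captured part against /\shref="([^"]+)"/i. The capture includes the leading whitespace, so the \s before href still matches when href is the first attribute.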
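A minimal Python sketch of this two-step approach might look as follows. The function name find_icon_paths and the sample markup are illustrative assumptions, not part of the answer. The first pattern keeps the leading whitespace inside the capture group so that the \s in the second pattern has something to match.

```python
import re

# Step 1: find an icon <link> tag. Group 1 holds the attribute text,
# including its leading whitespace so that step 2 can anchor on \s.
ICON_TAG = re.compile(r'<link(\s[^>]*rel="(?:shortcut\s+)?icon"[^>]*)>', re.IGNORECASE)

# Step 2: pull the href value out of the captured attribute text.
HREF = re.compile(r'\shref="([^"]+)"', re.IGNORECASE)


def find_icon_paths(html):
    """Return the href of every icon <link> tag found in html (hypothetical helper)."""
    paths = []
    for tag in ICON_TAG.finditer(html):
        href = HREF.search(tag.group(1))
        if href:
            paths.append(href.group(1))
    return paths


sample = (
    '<link rel="icon" href="http://passets-cdn.pinterest.com/images/favicon.png" type="image/x-icon" />\n'
    '<link href="/phoenix/favicon.ico" rel="shortcut icon" type="image/x-icon" />\n'
    '<link rel="stylesheet" href="/site.css" />\n'
)
print(find_icon_paths(sample))
```

Because the second pattern only ever sees the text of one tag, the order of rel and href inside that tag no longer matters.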

You can do this with a positive lookahead:

<link\s+(?=[^>]*rel="(?:[Ss]hortcut\s)?[Ii]con"\s+)(?:[^>]*href="(.+?)").*/>

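You will find the path in the first capture group.

The point here is that the lookahead does not consume anything. It only checks that rel="(?:[Ss]hortcut\s)?[Ii]con" occurs somewhere in the tag. If that pattern is found, the rest of the regex matches the href part and puts the link into capture group 1.

(?=[^>]*rel="(?:[Ss]hortcut\s)?[Ii]con"\s+) is the positive lookahead assertion, indicated by the ?= at the start of the group.

[^>] is a negated character class that matches any character except >. I use it to make sure the match does not run past the closing > of the tag.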
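To see the lookahead at work on both attribute orders, here is a small Python sketch. The helper name icon_path is an assumption made for illustration. The pattern is the one from this answer, split across lines only for readability.

```python
import re

ICON_HREF = re.compile(
    r'<link\s+'
    r'(?=[^>]*rel="(?:[Ss]hortcut\s)?[Ii]con"\s+)'  # lookahead: rel must appear in the tag, consumes nothing
    r'(?:[^>]*href="(.+?)")'                        # then match up to href and capture its value
    r'.*/>'
)


def icon_path(tag):
    """Return the href of an icon <link> tag, or None if it does not match (hypothetical helper)."""
    match = ICON_HREF.search(tag)
    return match.group(1) if match else None


# rel before href, and href before rel: both yield the path in group 1.
print(icon_path('<link rel="shortcut icon" href="http://css.nyt.com/images/icons/nyt.ico" />'))
print(icon_path('<link href="/phoenix/favicon.ico" rel="shortcut icon" type="image/x-icon" />'))
```

Note that the pattern as written expects whitespace after the rel attribute and a self-closing />, so a tag ending in rel="icon"/> or a plain > would not match without adjusting it.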

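You can use one regex to locate the icon tag and a second regex to pull out the path.

If the only text your second regex parses is a single tag, it can be as simple as /href="([^"]+)"/, and the order of the attributes in the tag does not matter. Using [^"]+ rather than .+ stops the capture at the closing quote of the href value instead of running on to the last quote in the tag.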

I would suggest using PHP's SimpleXML.

$html = '<link href="/phoenix/favicon.ico" rel="shortcut icon" type="image/x-icon" />';
$xml = new SimpleXMLElement($html);
echo $xml->attributes()->href;
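
For comparison, the same parser-based idea can be sketched in Python with the standard-library html.parser module. This is only an illustration under stated assumptions: the class name IconLinkParser and the helper extract_icon_paths are hypothetical, and the rel values accepted are the two that appear in the question.

```python
from html.parser import HTMLParser


class IconLinkParser(HTMLParser):
    """Collect href values of <link> tags whose rel is 'icon' or 'shortcut icon'."""

    def __init__(self):
        super().__init__()
        self.icon_paths = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <link ... />,
        # because the default handle_startendtag delegates here.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if rel in ("icon", "shortcut icon") and attributes.get("href"):
            self.icon_paths.append(attributes["href"])


def extract_icon_paths(html):
    """Parse html and return the icon hrefs in document order (hypothetical helper)."""
    parser = IconLinkParser()
    parser.feed(html)
    return parser.icon_paths


print(extract_icon_paths('<link href="/phoenix/favicon.ico" rel="shortcut icon" type="image/x-icon" />'))
```

Since the parser hands over the attributes as name/value pairs, attribute order is never an issue.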
