Perl regular expression explained

I have this regular expression:

 s/<(?:[^>'"]|(['"]).?\1)*>//gs

I don't know what this actually means.

The regex appears to be meant to strip HTML tags from its input.

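It matches text that starts with < and ends with >, where everything in between is either a character that is neither > nor a quote, or a quoted string (which may itself contain >). But it seems to have a bug: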

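.? says the quoted string may contain 0 or 1 characters; it was probably meant to be .*? (0 or more characters). And to keep backtracking from making . match a quote in some odd cases, the (?: ... ) group needs to become an atomic (possessive) group, i.e. (?> ... ) with > instead of :. With both changes the substitution reads s/<(?>[^>'"]|(['"]).*?\1)*>//gs.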

This tool can explain the details: http://rick.measham.id.au/paste/explain.pl?regex=%3C%28%3F%3A[^%3E%27%22]|%28[%27%22]%29.%3F\1%29*%3E

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^>'"]                   any character except: '>', ''', '"'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    (                        group and capture to 1:
--------------------------------------------------------------------------------
      ['"]                     any character of: ''', '"'
--------------------------------------------------------------------------------
    )                        end of 1
--------------------------------------------------------------------------------
    .?                       any character except \n (optional
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  >                        '>'

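So, as ysth also mentioned, it tries to remove HTML tags. (The tool was given only the pattern, without the /s flag, which is why it describes . as "any character except \n"; under /s, . also matches newlines.)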
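The node-by-node explanation can also live inside the pattern itself. This is a sketch assuming Python's re.VERBOSE flag (the analogue of Perl's /x); it keeps the question's original .? so its behaviour matches the table, and TAG and the sample text are hypothetical. A newline inside a tag is consumed by the negated class [^>'"], which matches \n with or without re.S.

```python
import re

# The question's pattern, one node per line, mirroring the explanation table.
# re.VERBOSE ignores unescaped whitespace and # comments outside character classes.
TAG = re.compile(
    r"""
    <              # literal '<'
    (?:            # non-capturing group, repeated 0 or more times (greedy)
        [^>'"]     #   any character except > or a quote
      |            #   OR
        (['"])     #   group 1: an opening quote
        .?         #   at most one character (re.S lets it match a newline too)
        \1         #   the same quote that group 1 captured
    )*             # end of group
    >              # literal '>'
    """,
    re.S | re.VERBOSE,
)

text = "<p class='x'>Hi<br>\n<b\n>there</b>"
print(TAG.sub("", text))  # prints Hi and there on two separate lines
```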
