Remove URLs from text with a regex, except those on the current domain

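I am trying to run preg_replace on a string and remove every URL that does not contain the current domain.

So far I have this regex, but it does not exclude mydomain. What am I doing wrong?
http[s]?://[w]{0,3}\.{0,1}((?<!mydomain)[^.].*)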

Expected input and output:
http://regex101. => should match
http://www.regex101. => should match
https://regex101. => should match
https://www.regex101. => should match
https://www.mydomain. => should not match, but it does

https://regex101.com/r/kGil9O/1

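I have read several SO questions and answers, but they either do not apply to my case or differ in some way. When answering, please explain where I went wrong so I can do better next time. Thanks.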

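The negative lookbehind (?<!mydomain) asserts that mydomain is not directly to the left of the current position. At that point the text to the left ends in https:// or https://www., so the assertion is always true and the pattern you tried matches every URL.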
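To make this concrete, here is a small sketch that runs the attempted pattern (written with the dot after www escaped, which is an assumption about the original) against the URL that should have been rejected. The variable names are only illustrative. It prints what group 1 captured and the eight characters the lookbehind actually inspected.

```php
<?php
// The attempted pattern; the escaped dot after "www" is an assumption.
$attempt = '~http[s]?://[w]{0,3}\.{0,1}((?<!mydomain)[^.].*)~';
$url = 'https://www.mydomain.';

// PREG_OFFSET_CAPTURE returns [text, byte offset] for each group.
preg_match($attempt, $url, $m, PREG_OFFSET_CAPTURE);
[$captured, $offset] = $m[1];

// The lookbehind looks at the strlen('mydomain') characters
// immediately to the left of where group 1 starts.
$width = strlen('mydomain');
$left = substr($url, $offset - $width, $width);

echo "group 1: $captured", PHP_EOL;          // group 1: mydomain.
echo "text to the left: $left", PHP_EOL;     // text to the left: s://www.
```

The text to the left is s://www., not mydomain, so the lookbehind passes and mydomain. is happily captured.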

Instead, you could match an optional www. with a possessive quantifier, followed by a negative lookahead:

^https?://(?:www\.)?+(?!mydomain\.)\S+$

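The pattern matches:

^ Start of string
https?:// Match the protocol with an optional s, followed by ://
(?:www\.)?+ Optionally match www. using a possessive quantifier, so the regex does not backtrack into it once it has matched
(?!mydomain\.) Negative lookahead, assert that mydomain. is not directly to the right of the current position
\S+ Match 1 or more non-whitespace characters
$ End of string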
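The possessive quantifier is what makes the lookahead reliable. As a rough comparison, the sketch below runs the same pattern twice, once with ?+ and once with a plain ? (the variable names are illustrative). With the plain quantifier, the engine can backtrack, skip www., and then test the lookahead in front of www.mydomain., where it succeeds.

```php
<?php
$possessive = '~^https?://(?:www\.)?+(?!mydomain\.)\S+$~';
$greedy     = '~^https?://(?:www\.)?(?!mydomain\.)\S+$~';
$url = 'https://www.mydomain.';

// Possessive: "www." stays matched, the lookahead sees "mydomain." and fails.
$withPossessive = preg_match($possessive, $url);

// Greedy: after the lookahead fails, "www." is given back, the lookahead
// is retried in front of "www.mydomain." and passes.
$withGreedy = preg_match($greedy, $url);

echo "possessive: $withPossessive", PHP_EOL; // possessive: 0
echo "greedy: $withGreedy", PHP_EOL;         // greedy: 1
```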


Example

$strings = [
    "http://regex101.",
    "http://www.regex101.",
    "https://regex101.",
    "https://www.regex101.",
    "https://www.mydomain.",
    "https://mydomain."
];
// Single quotes keep the backslashes in \. and \S exactly as written.
$pattern = '~^https?://(?:www\.)?+(?!mydomain\.)\S+$~';
foreach ($strings as $s) {
    if (preg_match($pattern, $s)) {
        echo "Match: $s" . PHP_EOL;
    } else {
        echo "No match: $s" . PHP_EOL;
    }
}

Output

Match: http://regex101.
Match: http://www.regex101.
Match: https://regex101.
Match: https://www.regex101.
No match: https://www.mydomain.
No match: https://mydomain.
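Since the goal in the question is a preg_replace over a whole text, the same idea can be sketched without the ^ and $ anchors. The function name removeForeignUrls and the sample text are made up for illustration, and the sketch assumes URLs are delimited by whitespace.

```php
<?php
// Hypothetical helper: removes every http(s) URL unless its host
// starts with "mydomain." (optionally preceded by "www.").
function removeForeignUrls(string $text): string
{
    $pattern = '~https?://(?:www\.)?+(?!mydomain\.)\S+~';
    // preg_replace returns null on error; fall back to the input then.
    return preg_replace($pattern, '', $text) ?? $text;
}

$text = 'Docs: https://www.mydomain.com/help Demo: https://regex101.com/r/kGil9O/1 Home: http://mydomain.com';
echo removeForeignUrls($text), PHP_EOL;
// Docs: https://www.mydomain.com/help Demo:  Home: http://mydomain.com
```

Note that the lookahead only checks the position directly after the optional www., so a URL on another subdomain such as shop.mydomain.com would still be removed, and \S+ also swallows punctuation that directly follows a URL.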

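You used the wrong kind of lookaround. A lookbehind checks the text to the left of the current position, and by the time it runs, the pattern has already tried to match www. The text it inspects is therefore whatever precedes that position, which is never mydomain.

Use a lookahead instead:

https?:\/\/(?!(?:www\.)?mydomain)(?:www\.)?([^.].*)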


Explanation

--------------------------------------------------------------------------------
http                     'http'
--------------------------------------------------------------------------------
s?                       's' (optional (matching the most amount
                         possible))
--------------------------------------------------------------------------------
:                        ':'
--------------------------------------------------------------------------------
\/                       '/'
--------------------------------------------------------------------------------
\/                       '/'
--------------------------------------------------------------------------------
(?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
(?:                      group, but do not capture (optional
                         (matching the most amount possible)):
--------------------------------------------------------------------------------
www                      'www'
--------------------------------------------------------------------------------
\.                       '.'
--------------------------------------------------------------------------------
)?                       end of grouping
--------------------------------------------------------------------------------
mydomain                 'mydomain'
--------------------------------------------------------------------------------
)                        end of look-ahead
--------------------------------------------------------------------------------
(?:                      group, but do not capture (optional
                         (matching the most amount possible)):
--------------------------------------------------------------------------------
www                      'www'
--------------------------------------------------------------------------------
\.                       '.'
--------------------------------------------------------------------------------
)?                       end of grouping
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
[^.]                     any character except: '.'
--------------------------------------------------------------------------------
.*                       any character except \n (0 or more times
                         (matching the most amount possible))
--------------------------------------------------------------------------------
)                        end of \1
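As a quick check of the lookahead version, the sketch below collects what group 1 captures for three of the sample URLs (the $results array is only an illustrative name). Because the lookahead is tested before www. is consumed and covers the optional www. itself, no possessive quantifier is needed here.

```php
<?php
$pattern = '~https?:\/\/(?!(?:www\.)?mydomain)(?:www\.)?([^.].*)~';

$results = [];
foreach (['https://www.regex101.', 'https://www.mydomain.', 'https://mydomain.'] as $url) {
    // Store group 1 on a match, null when the URL is rejected.
    $results[$url] = preg_match($pattern, $url, $m) ? $m[1] : null;
}

var_export($results);
// 'https://www.regex101.' maps to 'regex101.', both mydomain URLs map to NULL
```

One caveat: .* runs to the end of the line, so when this pattern is used with preg_replace on running text it would also remove whatever follows the URL. Swapping .* for \S* is one possible adjustment in that case.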
