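I am trying to preg_replace a string and remove from it every URL that does not contain the current domain.
So far I have this regex, but it does not exclude mydomain. What am I doing wrong?
http[s]?://[w]{0,3}\.{0,1}((?<!mydomain)[^.].*)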
Expected input and output:
http://regex101. => should match
http://www.regex101. => should match
https://regex101. => should match
https://www.regex101. => should match
https://www.mydomain. => should not match, but it does
https://regex101.com/r/kGil9O/1
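I have read several SO questions and answers, but they either do not apply to my case or differ from it in some way. When answering, please explain where I went wrong so that I can do better next time. Thanks.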
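The negative lookbehind (?<!mydomain) asserts that mydomain is not directly to the left of the current position. At that point the pattern has just matched, for example, https:// or https://www. so the assertion is always true, and the attempted pattern matches every URL.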
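To see this concretely, here is a small sketch that runs the pattern from the question against the URL that should be rejected. The escaped dot in \.{0,1} is an assumption about what the question originally contained, and the variable names are only illustrative.

```php
<?php
// Pattern from the question; the escaped dot is an assumed restoration.
$questionPattern = '~http[s]?://[w]{0,3}\.{0,1}((?<!mydomain)[^.].*)~';
$url = 'https://www.mydomain.';

// When the lookbehind runs, the eight characters to its left are "s://www.",
// not "mydomain", so the negative assertion succeeds and matching continues.
$matched = preg_match($questionPattern, $url, $m);

echo $matched ? "Match: {$m[0]}" : "No match", PHP_EOL;
// prints: Match: https://www.mydomain.
```

Group 1 ends up holding mydomain. here, which shows that the lookbehind never inspected the text it was meant to exclude.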
What you might do instead is match an optional www. with a possessive quantifier, followed by a negative lookahead:
^https?://(?:www\.)?+(?!mydomain\.)\S+$
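The pattern matches:
^                 Start of string
https?://         Match the protocol with an optional s, followed by ://
(?:www\.)?+       Optionally match www. using a possessive quantifier, so the engine does not backtrack into the group once it has matched
(?!mydomain\.)    Negative lookahead, assert that mydomain. is not directly to the right of the current position
\S+               Match 1 or more non-whitespace characters
$                 End of string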
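The possessive quantifier is what keeps the lookahead honest. As a rough sketch (the variable names are illustrative), compare the same pattern with an ordinary greedy ? and with the possessive ?+ on the URL that must be rejected:

```php
<?php
$url = 'https://www.mydomain.';

// Greedy ?: after the lookahead fails behind "www.", the engine backtracks,
// skips the optional group, and the lookahead then sees "www.mydomain.",
// which does not start with "mydomain.", so the URL is accepted after all.
$greedy = '~^https?://(?:www\.)?(?!mydomain\.)\S+$~';

// Possessive ?+: once "www." is consumed it is never given back,
// so the failed lookahead makes the whole match fail.
$possessive = '~^https?://(?:www\.)?+(?!mydomain\.)\S+$~';

$greedyResult = preg_match($greedy, $url);
$possessiveResult = preg_match($possessive, $url);

printf("greedy: %d, possessive: %d\n", $greedyResult, $possessiveResult);
// prints: greedy: 1, possessive: 0
```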
Example
$strings = [
    "http://regex101.",
    "http://www.regex101.",
    "https://regex101.",
    "https://www.regex101.",
    "https://www.mydomain.",
    "https://mydomain."
];
// Single-quoted so that PHP passes the backslashes and the $ to PCRE unchanged.
$pattern = '~^https?://(?:www\.)?+(?!mydomain\.)\S+$~';
foreach ($strings as $s) {
    // preg_match returns 1 when the whole string is a URL outside mydomain.
    if (preg_match($pattern, $s)) {
        echo "Match: $s" . PHP_EOL;
    } else {
        echo "No match: $s" . PHP_EOL;
    }
}
Output
Match: http://regex101.
Match: http://www.regex101.
Match: https://regex101.
Match: https://www.regex101.
No match: https://www.mydomain.
No match: https://mydomain.
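Since the question is about preg_replace on a longer string, the anchored pattern needs a small adjustment. The following is only a sketch under some assumptions: the sample text and its domains are made up, the ^ and $ anchors are replaced by a leading \b, and a URL is taken to end at the next whitespace character.

```php
<?php
// Hypothetical input with one URL on the current domain and two foreign ones.
$text = 'Visit https://www.mydomain.com/page or http://www.regex101.com/r/x or https://example.org today';

// Same idea as above, without ^ and $ so it can match inside a larger string.
$pattern = '~\bhttps?://(?:www\.)?+(?!mydomain\.)\S+~';

// Remove every URL that is not on mydomain.
$cleaned = preg_replace($pattern, '', $text);

echo $cleaned, PHP_EOL;
```

Note that \S+ also swallows punctuation that directly follows a URL, so a stricter character class may be needed for real text.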
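You used the wrong kind of lookaround. A lookbehind checks the text to the left of the current position, and your pattern tries to match www. before it reaches the lookbehind, so the text on the left is never mydomain.
Use a lookahead instead:
https?://(?!(?:www\.)?mydomain)(?:www\.)?([^.].*)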
Explanation
--------------------------------------------------------------------------------
  http                     'http'
--------------------------------------------------------------------------------
  s?                       's' (optional (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
  /                        '/'
--------------------------------------------------------------------------------
  /                        '/'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      www                      'www'
--------------------------------------------------------------------------------
      \.                       '.'
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
    mydomain                 'mydomain'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    www                      'www'
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^.]                     any character except: '.'
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
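The lookahead pattern explained above can be tried against the sample URLs from the question. This is a minimal sketch: the dots in www\. are written escaped, which is assumed to be what the answer intended, and the variable names are illustrative.

```php
<?php
$pattern = '~https?://(?!(?:www\.)?mydomain)(?:www\.)?([^.].*)~';

$urls = [
    'http://regex101.',
    'http://www.regex101.',
    'https://regex101.',
    'https://www.regex101.',
    'https://www.mydomain.',
    'https://mydomain.',
];

// Map each URL to the text captured by group 1, or null when it is rejected.
$results = [];
foreach ($urls as $url) {
    $results[$url] = preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

foreach ($results as $url => $captured) {
    echo $url, ' => ', $captured === null ? 'no match' : "group 1: $captured", PHP_EOL;
}
```

Because the lookahead runs before the optional www. is consumed, it covers both the www. and the bare form of mydomain without needing a possessive quantifier.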