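I want to match Telegram usernames in message text and delete the whole line. I tried this pattern, but the problem is that it also matches email addresses: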
.*(@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*
The pattern should match all of these lines:
Hello @username how are you?
Hello @username. how are you?
😉@username.
And it should not match email addresses like this:
Hi, send an email to something@domain.com
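To reproduce the problem, here is a minimal Python sketch. Python is an assumption (the question does not name a language), and so is reading the pattern's `(?:s|$)` as `(?:\s|$)`, since the backslashes appear to have been lost. It only checks which sample lines the question's pattern matches:

```python
import re

# The pattern from the question, assuming "(?:s|$)" was meant to be "(?:\s|$)".
QUESTION_PATTERN = re.compile(
    r".*(@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*"
)

should_match = [
    "Hello @username how are you?",
    "Hello @username. how are you?",
    "😉@username.",
]
should_not_match = ["Hi, send an email to something@domain.com"]

for line in should_match + should_not_match:
    # Prints True for all four lines, including the email address.
    print(bool(QUESTION_PATTERN.search(line)), line)
```

Nothing in the pattern looks at the character before the `@`, so `something@domain.com` qualifies just like ` @username`.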
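Use
.*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*
See proof.
The \B before the @ means that the @ must be preceded by a non-word character or by the start of the string. Because @ is itself a non-word character, \B (a non-boundary) can only hold there if the position before it is also non-word, so something@domain.com is rejected.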
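A minimal Python sketch of applying this pattern with `re.sub`, with the backslashes restored. Python and the helper name `drop_username_lines` are assumptions, not part of the answer. Since `.` does not match `\n`, every match stays inside one line, so a matching line is emptied while its line break remains:

```python
import re

# Answer pattern: \B requires a non-word character (or start of string)
# right before "@", and the look-ahead requires 5-32 word characters.
USERNAME_LINE = re.compile(r".*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*")


def drop_username_lines(text: str) -> str:
    """Hypothetical helper: empty every line that contains an @username."""
    return USERNAME_LINE.sub("", text)


text = (
    "Hello @username how are you?\n"
    "Hi, send an email to something@domain.com\n"
    "😉@username."
)
print(repr(drop_username_lines(text)))
# → '\nHi, send an email to something@domain.com\n'
```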
Explanation
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \B                       the boundary between two word chars (\w)
                           or two non-word chars (\W)
--------------------------------------------------------------------------------
  @                        '@'
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    \w{5,32}                 word characters (a-z, A-Z, 0-9, _)
                             (between 5 and 32 times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to 'Z',
                           '0' to '9' (1 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    _                        '_'
--------------------------------------------------------------------------------
    [a-zA-Z0-9]+             any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
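The look-ahead `(?=\w{5,32}\b)` consumes nothing; it only enforces the length, since the name has to end (`\b`) within 5 to 32 word characters. A quick Python check of that bound (`mentions_username` is a hypothetical helper, not from the answer):

```python
import re

USERNAME_LINE = re.compile(r".*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*")


def mentions_username(line: str) -> bool:
    """Hypothetical helper: True if the line contains a 5-32 character @username."""
    return USERNAME_LINE.search(line) is not None


print(mentions_username("hi @abcd"))            # 4 characters   → False
print(mentions_username("hi @abcde"))           # 5 characters   → True
print(mentions_username("hi @" + "a" * 32))     # 32 characters  → True
print(mentions_username("hi @" + "a" * 33))     # 33 characters  → False
```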
.*[\W](@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*
I added [\W], a non-word character, before the @ symbol. You can check the result here: https://regex101.com/r/yFGegO/1
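One caveat, shown in a small Python comparison: `[\W]` has to consume an actual character, so this version cannot match a line that starts with the `@username`, whereas `\B` also accepts the start of the string. The names `W_PATTERN`, `B_PATTERN` and `compare` are illustrative only, and the backslashes are assumed to be `\W` and `\s`:

```python
import re

# The [\W] variant from this answer.
W_PATTERN = re.compile(
    r".*[\W](@(?=.{5,64}(?:\s|$))(?![_])(?!.*[_]{2})[a-zA-Z0-9_]+(?<![_.])).*"
)
# The \B variant from the first answer, for comparison.
B_PATTERN = re.compile(r".*\B@(?=\w{5,32}\b)[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+)*.*")


def compare(line: str) -> tuple[bool, bool]:
    """Return (matches [\\W] version, matches \\B version)."""
    return bool(W_PATTERN.search(line)), bool(B_PATTERN.search(line))


for line in [
    "Hi, send an email to something@domain.com",  # (False, False)
    "Hello @username how are you?",               # (True, True)
    "@username how are you?",                     # (False, True)
]:
    print(compare(line), line)
```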
Nothing new under the sun, but basically the other patterns can be simplified to:
.*?\B@\w{5}.*
Demo
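In Python, the shorter pattern behaves like this on the sample lines (the list `LINES` is only test data). Note that `\w{5}` just requires at least five word characters after the `@` and sets no upper limit:

```python
import re

# \B keeps "@" from being glued to a preceding word character;
# \w{5} only demands five word characters after it, with no maximum.
SIMPLE = re.compile(r".*?\B@\w{5}.*")

LINES = [
    "Hello @username how are you?",
    "Hello @username. how are you?",
    "😉@username.",
    "Hi, send an email to something@domain.com",
    "hi @abcd",         # only four characters: no match
    "hi @" + "a" * 40,  # no upper bound: still matches
]

print([bool(SIMPLE.search(line)) for line in LINES])
# → [True, True, True, False, False, True]
```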
Or possibly:
.*?\B@\w{5,64}\b.*
if you want to be more precise, but is it really needed?
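A quick Python comparison of the two simplified patterns, assuming the more precise one is `.*?\B@\w{5,64}\b.*`. The `{5,64}` bound plus `\b` rejects a 65-character name that plain `\w{5}` still accepts (`check` is a hypothetical helper):

```python
import re

SIMPLE = re.compile(r".*?\B@\w{5}.*")
# Between 5 and 64 word characters, and the name must end right there (\b).
PRECISE = re.compile(r".*?\B@\w{5,64}\b.*")


def check(n: int) -> tuple[bool, bool]:
    """Match a line with an n-character username against both patterns."""
    line = "hi @" + "a" * n
    return bool(SIMPLE.search(line)), bool(PRECISE.search(line))


for n in (4, 5, 64, 65):
    print(n, check(n))
# 4 (False, False)
# 5 (True, True)
# 64 (True, True)
# 65 (True, False)
```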
Note: if you also want to remove the newline sequence, add \R? at the end of the pattern.
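Python's built-in `re` module does not support `\R` (compiling it raises a "bad escape" error), so a Python sketch has to spell out the line break. This assumes `\n` or `\r\n` line endings only. PCRE's `\R` also accepts other newline characters, which this does not cover:

```python
import re

# "." already matches "\r", so ".*" swallows the "\r" of a "\r\n" ending
# and "\n?" removes the rest; lone "\r" line endings are NOT handled.
DROP_LINE = re.compile(r".*?\B@\w{5,64}\b.*\n?")

text = (
    "Hello @username how are you?\n"
    "Hi, send an email to something@domain.com\n"
    "😉@username.\n"
    "Bye"
)
print(repr(DROP_LINE.sub("", text)))
# → 'Hi, send an email to something@domain.com\nBye'
```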