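Here is a PCRE regex, https://regex101.com/r/gJ7pU0/1, that validates email addresses.
Does Ruby have an RFC 5322-compliant regex? Ruby has URI::MailTo::EMAIL_REGEXP,
but I don't think it is RFC 5322 compliant.
Another post mentioned the "mail" gem, but I can't see a way to validate an email address with it:
https://github.com/mikel/mail/tree/6b0ebb142c476bf7c00524effe513a4f151f59ab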
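To make the doubt about URI::MailTo::EMAIL_REGEXP concrete, here is a minimal sketch that tries it on two address forms RFC 5322 permits: a quoted-string local part and a domain literal. The helper name `uri_mailto_valid?` and the sample addresses are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper: true if the address passes Ruby's built-in URI::MailTo check.
def uri_mailto_valid?(address)
  URI::MailTo::EMAIL_REGEXP.match?(address)
end

samples = [
  'user@example.com',        # plain dot-atom form
  '"john doe"@example.com',  # quoted-string local part (allowed by RFC 5322)
  'user@[192.168.0.1]'       # domain literal (allowed by RFC 5322)
]

samples.each do |address|
  puts "#{address.ljust(26)} #{uri_mailto_valid?(address)}"
end
```

Only the first sample should pass, since the built-in pattern accepts neither `"` in the local part nor `[` in the domain.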
PCRE, RFC 5322 compliant:
(?(DEFINE)
(?<addr_spec> (?&local_part) @ (?&domain) )
(?<local_part> (?&dot_atom) | (?&quoted_string) | (?&obs_local_part) )
(?<domain> (?&dot_atom) | (?&domain_literal) | (?&obs_domain) )
(?<domain_literal> (?&CFWS)? \[ (?: (?&FWS)? (?&dtext) )* (?&FWS)? \] (?&CFWS)? )
(?<dtext> [\x21-\x5a] | [\x5e-\x7e] | (?&obs_dtext) )
(?<quoted_pair> \\ (?: (?&VCHAR) | (?&WSP) ) | (?&obs_qp) )
(?<dot_atom> (?&CFWS)? (?&dot_atom_text) (?&CFWS)? )
(?<dot_atom_text> (?&atext) (?: \. (?&atext) )* )
(?<atext> [a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+ )
(?<atom> (?&CFWS)? (?&atext) (?&CFWS)? )
(?<word> (?&atom) | (?&quoted_string) )
(?<quoted_string> (?&CFWS)? " (?: (?&FWS)? (?&qcontent) )* (?&FWS)? " (?&CFWS)? )
(?<qcontent> (?&qtext) | (?&quoted_pair) )
(?<qtext> \x21 | [\x23-\x5b] | [\x5d-\x7e] | (?&obs_qtext) )
# comments and whitespace
(?<FWS> (?: (?&WSP)* \r\n )? (?&WSP)+ | (?&obs_FWS) )
(?<CFWS> (?: (?&FWS)? (?&comment) )+ (?&FWS)? | (?&FWS) )
(?<comment> \( (?: (?&FWS)? (?&ccontent) )* (?&FWS)? \) )
(?<ccontent> (?&ctext) | (?&quoted_pair) | (?&comment) )
(?<ctext> [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | (?&obs_ctext) )
# obsolete tokens
(?<obs_domain> (?&atom) (?: \. (?&atom) )* )
(?<obs_local_part> (?&word) (?: \. (?&word) )* )
(?<obs_dtext> (?&obs_NO_WS_CTL) | (?&quoted_pair) )
(?<obs_qp> \\ (?: \x00 | (?&obs_NO_WS_CTL) | \n | \r ) )
(?<obs_FWS> (?&WSP)+ (?: \r\n (?&WSP)+ )* )
(?<obs_ctext> (?&obs_NO_WS_CTL) )
(?<obs_qtext> (?&obs_NO_WS_CTL) )
(?<obs_NO_WS_CTL> [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f )
# character class definitions
(?<VCHAR> [\x21-\x7E] )
(?<WSP> [ \t] )
)
^(?&addr_spec)$
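Converting a PCRE recursion/subroutine regex to Onigmo is straightforward:
- Remove the unsupported (?(DEFINE)...) construct
- Put all the named groups that define the consuming pattern at the start of the regex and apply a {0} quantifier to each of them, so that the definitions themselves match nothing
- Replace (?&...) with the \g<...> syntax (I just did this in Notepad++, replacing \(\?&(\w+)\) with \\g<$1>)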
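The steps above can be sketched in Ruby itself. This is only an illustration under assumptions: the helper `pcre_calls_to_onigmo` and the toy `DOTTED_WORDS` grammar are hypothetical names, not part of the email regex, and the helper handles only the `(?&name)` rewrite, not the removal of the `(?(DEFINE)...)` wrapper.

```ruby
# Hypothetical helper: rewrite PCRE subroutine calls (?&name) as Onigmo \g<name>.
def pcre_calls_to_onigmo(source)
  source.gsub(/\(\?&(\w+)\)/) { "\\g<#{Regexp.last_match(1)}>" }
end

puts pcre_calls_to_onigmo('(?<addr_spec> (?&local_part) @ (?&domain) )')
# prints: (?<addr_spec> \g<local_part> @ \g<domain> )

# The {0} idiom on a toy grammar: both groups are defined but consume nothing
# where they are written; only the \g<dotted> call at the end matches input.
DOTTED_WORDS = /
  (?<word>   [a-z]+ ){0}
  (?<dotted> \g<word> (?: \. \g<word> )* ){0}
  \A \g<dotted> \z
/x

p DOTTED_WORDS.match?('foo.bar')   # => true
p DOTTED_WORDS.match?('foo..bar')  # => false
```

The block form of `gsub` is used so the backslash in the replacement does not have to be escaped a second time.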
The final expression in Ruby looks like this:
re = /(?<addr_spec> \g<local_part> @ \g<domain> ){0}
(?<local_part> \g<dot_atom> | \g<quoted_string> | \g<obs_local_part> ){0}
(?<domain> \g<dot_atom> | \g<domain_literal> | \g<obs_domain> ){0}
(?<domain_literal> \g<CFWS>? \[ (?: \g<FWS>? \g<dtext> )* \g<FWS>? \] \g<CFWS>? ){0}
(?<dtext> [\x21-\x5a] | [\x5e-\x7e] | \g<obs_dtext> ){0}
(?<quoted_pair> \\ (?: \g<VCHAR> | \g<WSP> ) | \g<obs_qp> ){0}
(?<dot_atom> \g<CFWS>? \g<dot_atom_text> \g<CFWS>? ){0}
(?<dot_atom_text> \g<atext> (?: \. \g<atext> )* ){0}
(?<atext> [a-zA-Z0-9!#$%&'*+\/=?^_`{|}~-]+ ){0}
(?<atom> \g<CFWS>? \g<atext> \g<CFWS>? ){0}
(?<word> \g<atom> | \g<quoted_string> ){0}
(?<quoted_string> \g<CFWS>? " (?: \g<FWS>? \g<qcontent> )* \g<FWS>? " \g<CFWS>? ){0}
(?<qcontent> \g<qtext> | \g<quoted_pair> ){0}
(?<qtext> \x21 | [\x23-\x5b] | [\x5d-\x7e] | \g<obs_qtext> ){0}
# comments and whitespace
(?<FWS> (?: \g<WSP>* \r\n )? \g<WSP>+ | \g<obs_FWS> ){0}
(?<CFWS> (?: \g<FWS>? \g<comment> )+ \g<FWS>? | \g<FWS> ){0}
(?<comment> \( (?: \g<FWS>? \g<ccontent> )* \g<FWS>? \) ){0}
(?<ccontent> \g<ctext> | \g<quoted_pair> | \g<comment> ){0}
(?<ctext> [\x21-\x27] | [\x2a-\x5b] | [\x5d-\x7e] | \g<obs_ctext> ){0}
# obsolete tokens
(?<obs_domain> \g<atom> (?: \. \g<atom> )* ){0}
(?<obs_local_part> \g<word> (?: \. \g<word> )* ){0}
(?<obs_dtext> \g<obs_NO_WS_CTL> | \g<quoted_pair> ){0}
(?<obs_qp> \\ (?: \x00 | \g<obs_NO_WS_CTL> | \n | \r ) ){0}
(?<obs_FWS> \g<WSP>+ (?: \r\n \g<WSP>+ )* ){0}
(?<obs_ctext> \g<obs_NO_WS_CTL> ){0}
(?<obs_qtext> \g<obs_NO_WS_CTL> ){0}
(?<obs_NO_WS_CTL> [\x01-\x08] | \x0b | \x0c | [\x0e-\x1f] | \x7f ){0}
# character class definitions
(?<VCHAR> [\x21-\x7E] ){0}
(?<WSP> [ \t] ){0}
^\g<addr_spec>$/x
Ruby test:
p re.match?('+1~1+@iana.org') # => true
p re.match?('test@[123.123.123.123') # => false
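One caveat if the pattern is used to validate untrusted input: in Ruby, `^` and `$` anchor at line boundaries, so a pattern anchored that way can accept a multi-line string whose first line looks valid. Swapping them for `\A` and `\z` forces the whole string to match. The sketch below shows the difference on a deliberately tiny stand-in pattern (the constant names are made up); whether this matters depends on where the input comes from.

```ruby
# Toy pattern standing in for the full address regex.
LINE_ANCHORED   = /^[a-z]+@[a-z]+$/
STRING_ANCHORED = /\A[a-z]+@[a-z]+\z/

input = "a@b\nanything else"

p LINE_ANCHORED.match?(input)    # => true  (the first line alone matches)
p STRING_ANCHORED.match?(input)  # => false (the whole string must match)
```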