Match hyphenated words only when all letters are uppercase



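I'm trying to match words of more than one letter that are either all uppercase, or have a lowercase first letter followed by uppercase letters, or contain a hyphen in the middle, but only when all the letters are uppercase. Here is my code: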

import re

s = "ASCII, aSCII, AS-CII, AS-cii"
myset = set(re.findall(r"\b[a-z]?[A-Z]+-?[A-Z]{1,}", s))
Out[555]: {'AS', 'AS-CII', 'ASCII', 'aSCII'}

As you can see, "AS" should not be returned, because the word it comes from has lowercase letters after the hyphen. How can I fix this?

I tried this, but it resulted in an error:

myset = set(re.findall(r"\b[a-z]?[A-Z]+-?[A-Z]+{1,}",s))
File "<ipython-input-545-7bdc0c902553>"
myset = set(re.findall(r"\b[a-z]?[A-Z]+-?[A-Z]+{1,}",s))
File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/re.py", line 222, in findall
return _compile(pattern, flags).findall(string)
File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/re.py", line 301, in _compile
p = sre_compile.compile(pattern, flags)
File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/sre_compile.py", line 562, in compile
p = sre_parse.parse(p, flags)
File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/sre_parse.py", line 855, in parse
p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/sre_parse.py", line 416, in _parse_sub
not nested and not items))
File "/home/c1962135/.local/share/virtualenvs/c1962135-9R_1M4TP/lib/python3.6/sre_parse.py", line 619, in _parse
source.tell() - here + len(this))
error: multiple repeat
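
The `multiple repeat` error in the traceback above comes from stacking two quantifiers on the same item: `[A-Z]+` is already repeated, and `{1,}` then tries to repeat it again. A minimal sketch that reproduces this, where `try_compile` is a hypothetical helper written only for this illustration:

```python
import re

def try_compile(pattern):
    """Return None if the pattern compiles, else the error message."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)

# '+' immediately followed by '{1,}' stacks two quantifiers on [A-Z]
bad = try_compile(r"\b[a-z]?[A-Z]+-?[A-Z]+{1,}")
# A single quantifier on each item compiles without complaint
good = try_compile(r"\b[a-z]?[A-Z]+-?[A-Z]+")

print("stacked quantifiers:", bad)
print("single quantifier:", good)
```

Since `+` and `{1,}` mean the same thing, dropping either one removes the error, although it does not fix the unwanted "AS" match.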

You can use a conditional expression:

(...)?(?(1)if true then this|else this)

In your case this could be

\b([a-z])?(?(1)[A-Z]+|[-A-Z]+[A-Z])(?!-)\b

See a demo on regex101.com.


Broken down, this reads

\b               # a word boundary
([a-z])?         # match a lower case letter if it is there
(?(1)            # if the lower case letter is there, match this branch
    [A-Z]+
    |
    [-A-Z]+[A-Z] # else this one
)
(?!-)\b          # do not stop right before a -, then require a word boundary
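
To try the conditional pattern in Python, here is a small sketch using the sample string from the question. One thing to watch: the pattern contains a capturing group, so `re.findall` would return only what group 1 captured (the optional lowercase letter). The sketch uses `finditer` and `group(0)` to get the whole match instead.

```python
import re

s = "ASCII, aSCII, AS-CII, AS-cii"

# Group 1 captures the optional lowercase letter; (?(1)...|...) branches on it
pattern = re.compile(r"\b([a-z])?(?(1)[A-Z]+|[-A-Z]+[A-Z])(?!-)\b")

# group(0) is the full match, regardless of what group 1 captured
matches = [m.group(0) for m in pattern.finditer(s)]
print(matches)  # ['ASCII', 'aSCII', 'AS-CII']
```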

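Here it is:

# x[0] is the outer group, i.e. the whole match
res = [x[0] for x in re.findall(r"(([a-z]{1}[A-Z]+)|([A-Z]+-[A-Z]+))", s)]
print(res)
print(set(res))

which gives

['aSCII', 'AS-CII']

Let me know. I split it up to add OR logic, with | in the middle.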

The following regex matches all the criteria mentioned:

\b[a-z]*[A-Z]+[-A-Z]+[A-Z]+\b

Please check it here: https://regex101.com/r/JNC4kN/1/

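However, this will fail if you give it an example such as aTHTHTH (a lowercase letter followed by a hyphen and uppercase letters). If you only want UPPER-UPPER, then go with this regex:

\b[a-z]{0,1}(?<!-)[A-Z]+\b(?!-)|\b[A-Z]+-[A-Z]+\b

Check it here.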
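
For reference, a quick sketch of the second (alternation) pattern run against the question's sample string. It has no capturing groups, so `re.findall` returns the full matches directly.

```python
import re

s = "ASCII, aSCII, AS-CII, AS-cii"

# First alternative: optional lowercase letter + uppercase run, not touching a hyphen.
# Second alternative: UPPER-UPPER with a single hyphen in the middle.
pattern = re.compile(r"\b[a-z]{0,1}(?<!-)[A-Z]+\b(?!-)|\b[A-Z]+-[A-Z]+\b")

found = pattern.findall(s)
print(found)  # ['ASCII', 'aSCII', 'AS-CII']
```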

You can use the following regex, which covers edge cases involving words preceded or followed by a hyphen (as shown at the link below):

(?<!\w|(?<=\w)-)(?:[a-zA-Z][A-Z]+|[A-Z]{2,}|[A-Z]+-[A-Z]+)(?!\w|-(?=\w))

Demo

Python's regex engine performs the following operations.

(?<!              # begin a negative lookbehind
  \w              # match word char
  |               # or
  (?<=\w)         # match a word char in a positive lookbehind
  -               # match '-'
)                 # end negative lookbehind
(?:               # begin non-cap grp
  [a-zA-Z][A-Z]+  # match a letter (either case) then 1+ uc letters
  |               # or
  [A-Z]{2,}       # match 2+ uc letters
  |               # or
  [A-Z]+-[A-Z]+   # match 1+ uc letters, '-', then 1+ uc letters
)                 # end non-cap grp
(?!               # begin negative lookahead
  \w              # match word char
  |               # or
  -               # match '-'
  (?=\w)          # match a word char in a positive lookahead
)                 # end negative lookahead
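
The breakdown above can be fed to Python almost verbatim with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. A sketch using the question's sample string (the comments are paraphrased, not part of the original answer):

```python
import re

s = "ASCII, aSCII, AS-CII, AS-cii"

pattern = re.compile(r"""
    (?<!                # not preceded by...
      \w                #   a word char
      |                 #   or
      (?<=\w)-          #   a '-' that itself follows a word char
    )
    (?:
      [a-zA-Z][A-Z]+    # a letter then 1+ uppercase letters
      |
      [A-Z]{2,}         # 2+ uppercase letters
      |
      [A-Z]+-[A-Z]+     # UPPER-UPPER
    )
    (?!                 # not followed by...
      \w                #   a word char
      |                 #   or
      -(?=\w)           #   a '-' that is followed by a word char
    )
""", re.VERBOSE)

result = pattern.findall(s)
print(result)  # ['ASCII', 'aSCII', 'AS-CII']
```

Both branches of the negative lookbehind are exactly one character wide, which is why Python's `re` module accepts it despite its fixed-width lookbehind restriction.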
