Regex in Python to select all permutations without repetition

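I have three classes of characters, e.g. letters [A-Za-z], digits [0-9] and symbols [!@#$]. For the sake of argument, the specific symbols don't matter. I want to use a regular expression in Python that selects all permutations of length 3 over these three classes, without repetition.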

For example, the following should match:

a1!
4B_
*x7

The following should fail:

ab!
BBB
*x_
a1!B

How would I do this without explicitly writing out every possible permutation of the classes in my regex?

I previously tried the following:

import re
regex = r"""
([A-Za-z]|[0-9]|[!@#$])
(?!\1)    ([A-Za-z]|[0-9]|[!@#$])
(?![\1\2])([A-Za-z]|[0-9]|[!@#$])
"""
s = "ab1"
re.fullmatch(regex, s, re.VERBOSE)

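However, the string ab1 matches, which is wrong. This is because the group references \1 and \2 refer to the actual text the groups matched, not to the regex contained in the groups.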

So how can I refer to the regex contained in previously matched groups rather than to their actual content?

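Your main problem is that you cannot use backreferences to negate part of a pattern; you can only use them to match (or negate) the same value that was captured by the corresponding capturing group.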

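Note that [^\1] matches any character except \x01, not any character except the one group 1 captured, because inside a character class a backreference is no longer a backreference. ab1 matches because b is not equal to a, and 1 is equal to neither \x01 nor \x02.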
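Both points can be checked directly. The sketch below is illustrative only: the names same_char and attempt are made up here, and the attempt assumes the backslashes lost from the question's pattern were \1 and \2. It shows that a backreference matches the captured text, that \1 inside a class is just chr(1), and that the question's pattern therefore accepts ab1.

```python
import re

# Outside a character class, \1 matches the same *text* that group 1 captured
same_char = re.compile(r"(\w)\1")
print(bool(same_char.fullmatch("aa")))  # True
print(bool(same_char.fullmatch("ab")))  # False: \1 means "a" here, not "any \w"

# Inside a character class, \1 is an octal escape for chr(1), not a backreference
print(bool(re.fullmatch(r"(a)[^\1]", "ab")))  # True: "b" is not chr(1)
print(bool(re.fullmatch(r"(a)[^\1]", "aa")))  # True: "a" is not chr(1) either

# Consequently the attempt from the question accepts "ab1"
attempt = re.compile(r"""
([A-Za-z]|[0-9]|[!@#$])
(?!\1)    ([A-Za-z]|[0-9]|[!@#$])
(?![\1\2])([A-Za-z]|[0-9]|[!@#$])
""", re.VERBOSE)
print(bool(attempt.fullmatch("ab1")))  # True
```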

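What you can use instead is a series of negative lookaheads that "exclude" a match under certain conditions, e.g. the string must not contain two digits, two letters or two special characters.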

rx = re.compile(r"""
(?!(?:[\W\d_]*[^\W\d_]){2})    # no two letters allowed
(?!(?:\D*\d){2})               # no two digits allowed
(?!(?:[^_!@#$*]*[_!@#$*]){2})  # no two special chars allowed
[\w!@#$*]{3}                   # three allowed chars
""", re.ASCII | re.VERBOSE)

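See the regex demo. In the demo, the negated character classes are replaced with .*, because the test runs against a single multiline text rather than against separate strings.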
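The exact demo pattern is not reproduced here, but a single-text test of that kind might look like the following sketch. It assumes one candidate per line, anchors each line with ^ and $ under re.MULTILINE, and uses .* in the lookaheads because "." never crosses a newline, so each lookahead stays within its own line.

```python
import re

# Assumption: candidates are newline-separated; ^/$ with re.MULTILINE delimit each one
rx_multi = re.compile(r"""
^
(?!(?:.*[^\W\d_]){2})   # no two letters on this line
(?!(?:.*\d){2})         # no two digits on this line
(?!(?:.*[_!@#$*]){2})   # no two special chars on this line
[\w!@#$*]{3}            # three allowed chars
$
""", re.ASCII | re.VERBOSE | re.MULTILINE)

text = "a1!\n4B_\n*x7\nab!\nBBB\n*x_\na1!B"
print(rx_multi.findall(text))  # ['a1!', '4B_', '*x7']
```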

See the Python demo:

import re
passes = ['a1!', '4B_', '*x7']
fails = ['ab!', 'BBB', '*x_', 'a1!B']
rx = re.compile(r"""
(?!(?:[\W\d_]*[^\W\d_]){2})    # no two letters allowed
(?!(?:\D*\d){2})               # no two digits allowed
(?!(?:[^_!@#$*]*[_!@#$*]){2})  # no two special chars allowed
[\w!@#$*]{3}                   # three allowed chars
""", re.ASCII | re.VERBOSE)
for s in passes:
    print(s, ' should pass, result:', bool(rx.fullmatch(s)))
for s in fails:
    print(s, ' should fail, result:', bool(rx.fullmatch(s)))

Output:

a1!  should pass, result: True
4B_  should pass, result: True
*x7  should pass, result: True
ab!  should fail, result: False
BBB  should fail, result: False
*x_  should fail, result: False
a1!B  should fail, result: False

A simple solution is not to write out the permutations yourself, but to let Python generate them with the help of itertools.

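from itertools import permutations
# One pattern per character class
patterns = [
    '[a-zA-Z]',
    '[0-9]',
    '[!@#$]',
]
# Concatenate every ordering of the classes and join them into one alternation
regex = '|'.join(
    ''.join(p)
    for p in permutations(patterns)
)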
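A possible way to use the generated pattern is sketched below; compiling it as rx and the extra sample 9$z are choices made here, not part of the answer. With three classes there are 3! = 6 orderings, so the alternation has six branches. re.fullmatch applies to the whole alternation, so no ^...$ or grouping is needed; with re.match or re.search you would have to wrap it as ^(?:...)$ yourself.

```python
import re
from itertools import permutations

patterns = ['[a-zA-Z]', '[0-9]', '[!@#$]']
# Six branches such as "[a-zA-Z][0-9][!@#$]|[a-zA-Z][!@#$][0-9]|..."
regex = '|'.join(''.join(p) for p in permutations(patterns))
rx = re.compile(regex)

for s in ['a1!', '9$z', 'ab!', 'BBB', 'a1!B', '4B_', '*x7']:
    # fullmatch requires one whole branch to cover the entire string
    print(s, bool(rx.fullmatch(s)))
```

Here a1! and 9$z match, while ab!, BBB and a1!B do not. Note that 4B_ and *x7 also fail, because _ and * are not in [!@#$]; adding them to the third class (e.g. '[!@#$_*]') would accept the question's examples.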
