Testing a regex built with re.compile "on the fly" gives an empty match instead of None

I am trying to test a batch of regex cases (specifically, whether something is a unit of measurement: 5px, 300mm, and so on).

Every question I can find about re.match tells me to test as follows:

import re
test_phrase_good = "5kw"
test_phrase_bad = "5XXX"
registered_units_regex = re.compile('[0-9]+kw|[0-9]+mm', re.IGNORECASE)
# test in the following way:
#     registered_units_regex.match(token) is not None
# so:
print(registered_units_regex.match(test_phrase_good)) # <re.Match object; span=(0, 3), match='5kw'>
print(registered_units_regex.match(test_phrase_bad)) #  None

This works as expected.

However, if I build the regex string in a loop before compiling it, I get the following:

regex_units = [r"mm", r"kw"]
regex_string = r""
for unit in regex_units:
    regex_string += r'[0-9]+{}|'.format(unit)
regex_string = regex_string.rstrip('|')
registered_units_regex = re.compile(regex_string, re.IGNORECASE)
registered_units_regex.match(test_phrase_good) is not None
# so:
print(registered_units_regex.match(test_phrase_good)) # <re.Match object; span=(0, 3), match='5kw'>
print(registered_units_regex.match(test_phrase_bad)) #  <re.Match object; span=(0, 0), match=''>

This means I can no longer use is not None as the test.

Why does this happen, and how should I correctly test for the absence of a match?

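A | at the end of the regex adds an empty alternative, so the pattern also matches the empty string. Put | only between the alternatives, never before or after them. You can use join to do this.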
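A minimal check of this behaviour, using only the standard re module and the test phrase from the question. The variable names are illustrative and not part of the original code.

```python
import re

# Same alternatives, with and without a trailing "|".
with_trailing_bar = re.compile(r'[0-9]+mm|[0-9]+kw|', re.IGNORECASE)
without_trailing_bar = re.compile(r'[0-9]+mm|[0-9]+kw', re.IGNORECASE)

# The trailing "|" leaves an empty last alternative, which matches
# zero characters at position 0 of any string.
empty_match = with_trailing_bar.match("5XXX")
no_match = without_trailing_bar.match("5XXX")

print(empty_match)  # <re.Match object; span=(0, 0), match=''>
print(no_match)     # None
```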

There is also no need for [0-9]+ to appear both in regex_units and in the format string; keep it in one place only.

regex_units = ["mm", "kw"]
# join puts | only between the alternatives, so no empty alternative is left over
regex_string = "|".join(r'[0-9]+{}'.format(unit) for unit in regex_units)
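Putting the pieces together, one possible end-to-end sketch looks like the following. The helper names build_units_regex and is_measurement are hypothetical. The use of re.escape and of fullmatch instead of match are assumptions that go beyond the question: re.escape guards against units containing regex metacharacters, and fullmatch rejects tokens with trailing characters such as "5kwh".

```python
import re

def build_units_regex(units):
    """Hypothetical helper: compile one pattern for all registered units."""
    # Escape each unit and join with "|" so the separator appears only
    # between alternatives (no leading or trailing empty alternative).
    alternatives = "|".join(re.escape(unit) for unit in units)
    # Factor the shared [0-9]+ prefix out of the alternation.
    return re.compile(r'[0-9]+(?:{})'.format(alternatives), re.IGNORECASE)

registered_units_regex = build_units_regex(["mm", "kw"])

def is_measurement(token):
    # Assumption: the whole token must be a number followed by a unit,
    # hence fullmatch rather than match.
    return registered_units_regex.fullmatch(token) is not None

print(is_measurement("5kw"))   # True
print(is_measurement("5XXX"))  # False
print(is_measurement("5kwh"))  # False
```

With match in place of fullmatch, "5kwh" would be accepted, because match only anchors at the start of the string.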
