Unexpected unterminated subpattern in regex using Python 3



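I have a verbose regex in Python 3 to capture Windows file paths. It captures an optional drive volume, then characters, a backslash, one or more characters, and then an optional file extension.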

(
(
([A-Za-z]:)
(\\){1,2}
)?                      # group to catch optional drive volume
(
([A-Za-z0-9_%~-])* # catch some letters/symbols
(\\)               # catch one backslash
([A-Za-z0-9_%~-])* # catch more letters/symbols
)+                      # at least one of this group
(
.[a-zA-Z]{3,4}
)?                      # catch optional file extension
)

As far as I can tell, all the parentheses are closed, but I still get an unterminated parenthesis error at line 3, column 17.

  File "C:\Users\mrea\Documents\Result Fingerprinting\lineidentifier.py", line 282, in identify_line
    for match_obj in re.finditer(reg, line, re.VERBOSE):
  File "C:\Users\mrea\AppData\Local\Programs\Python\Python37-32\lib\re.py", line 230, in finditer
    return _compile(pattern, flags).finditer(string)
  File "C:\Users\mrea\AppData\Local\Programs\Python\Python37-32\lib\re.py", line 286, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\mrea\AppData\Local\Programs\Python\Python37-32\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\mrea\AppData\Local\Programs\Python\Python37-32\lib\sre_parse.py", line 930, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\mrea\AppData\Local\Programs\Python\Python37-32\lib\sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "C:\Users\mrea\AppData\Local\Programs\Python\Python37-32\lib\sre_parse.py", line 816, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "C:\Users\mrea\AppData\Local\Programs\Python\Python37-32\lib\sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "C:\Users\mrea\AppData\Local\Programs\Python\Python37-32\lib\sre_parse.py", line 819, in _parse
    source.tell() - start)
re.error: missing ), unterminated subpattern at position 31 (line 3, column 17)

I first tried it all on one line and got this error, so I made it verbose to check it, but I can't see what's wrong.

I assume this is some Python-specific syntax I don't know about yet. Can anyone help?
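The error can be reproduced with a cut-down pattern. This is a minimal sketch, assuming the pattern was typed in an ordinary (non-raw) string literal, which the (\\) groups suggest: Python turns each \\ into a single backslash before re sees the text, so the regex engine receives (\), where \) is a literal parenthesis rather than a group closer. The reported position is the opening parenthesis of the group that was left open, not the line with the bad escape. In the question, most likely the ) on the )? line is used up closing the backslash group, so the drive-volume group opened on line 3 is the one reported. The names broken, fixed and error_info are illustrative only.

```python
import re

# Hypothetical cut-down version of the question's pattern, typed in a normal
# (non-raw) string literal, so each "\\" reaches the regex engine as "\".
broken = (
    "\n"
    "(\n"                  # group A
    "    (\n"              # group B
    "        (\\){1,2}\n"  # group C: the engine sees (\){1,2}; \) is a literal ')'
    "    )?\n"             # so this ')' closes C instead of B
    ")\n"                  # and this one closes B, leaving A open
)

error_info = None
try:
    re.compile(broken, re.VERBOSE)
except re.error as exc:
    # pos/lineno/colno point at the '(' of the group that was never closed (A).
    error_info = (exc.msg, exc.pos, exc.lineno, exc.colno)
    print(error_info)  # ('missing ), unterminated subpattern', 1, 2, 1)

# The same structure as a raw string: the engine now sees (\\){1,2},
# i.e. one or two literal backslashes, and every group is closed.
fixed = re.compile(r"""
(
    (
        (\\){1,2}
    )?
)
""", re.VERBOSE)
print(fixed.fullmatch("\\\\") is not None)  # True: matches two backslashes
```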

These are some of the string options for writing an extended (verbose) regex.

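The easiest to read in source is Type 3, the triple-quoted """ string, but its backslashes need escaping exactly as in a single-quoted string.
That means an even run of backslashes has to become an odd one: the last backslash of a run can stay unpaired because Python leaves an unrecognized escape such as \) in the string unchanged.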

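You can work out how many to add with this formula, where actual_num_escapes is the number of backslashes in a row that the regex needs: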
num_esc_to_add = (actual_num_escapes - 1)

Example:

raw         :     \  :      \\  :      \\\  :      \\\\  :      \\\\\
quote   '   :     \  :     \\\  :    \\\\\  :   \\\\\\\  :  \\\\\\\\\
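The formula and the table can be checked mechanically. A minimal sketch: literal_backslashes and as_python_sees are hypothetical helper names, and the check assumes the run is followed by a character such as ) that is not a recognized Python escape. Leaving an unrecognized escape in a normal literal has been deprecated since Python 3.6 (a SyntaxWarning since 3.12), which is why the evaluation step suppresses warnings; doubling every backslash or using a raw string avoids relying on it.

```python
import ast
import warnings


def literal_backslashes(actual_num_escapes: int) -> int:
    """Backslashes to type in a normal (non-raw) literal so the regex sees
    actual_num_escapes of them, per num_esc_to_add = actual_num_escapes - 1."""
    num_esc_to_add = actual_num_escapes - 1
    return actual_num_escapes + num_esc_to_add


def as_python_sees(typed: str) -> str:
    """Evaluate `typed` as the body of a single-quoted Python literal."""
    with warnings.catch_warnings():
        # The unpaired backslash before ')' is an unrecognized escape; Python
        # keeps it but warns (DeprecationWarning, or SyntaxWarning on 3.12+).
        warnings.simplefilter("ignore")
        return ast.literal_eval("'" + typed + "'")


for n in range(1, 6):
    typed = "\\" * literal_backslashes(n) + ")"
    seen = as_python_sees(typed)
    assert seen == "\\" * n + ")", (n, typed, seen)
    print(f"regex needs {n}, type {len(typed) - 1}: {typed}  ->  {seen}")
```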

=========================

Type 1:

>>> import re
>>> expression1 = '     \n\
...   (                          # (1 start)     \n\
...        (                          # (2 start)     \n\
...             ([A-Za-z]:)                # (3)     \n\
...             (\\\){1,2}                  # (4)     \n\
...        )?                         # (2 end), group to catch optional drive volume     \n\
...        (                          # (5 start)     \n\
...             ([A-Za-z0-9_%~\-])*        # (6), catch some letters/symbols     \n\
...             (\\\)                       # (7), catch one backslash     \n\
...             ([A-Za-z0-9_%~\-])*        # (8), catch more letters/symbols     \n\
...        )+                         # (5 end), at least one of this group     \n\
...        (                          # (9 start)     \n\
...             \.[a-zA-Z]{3,4}     \n\
...        )?                         # (9 end), catch optional file extension     \n\
...   )                          # (1 end)     \n\
... '
>>> Rx= re.compile(expression1, re.X)
>>> print(expression1)
(                          # (1 start)
(                          # (2 start)
([A-Za-z]:)                # (3)
(\\){1,2}                  # (4)
)?                         # (2 end), group to catch optional drive volume
(                          # (5 start)
([A-Za-z0-9_%~\-])*        # (6), catch some letters/symbols
(\\)                       # (7), catch one backslash
([A-Za-z0-9_%~\-])*        # (8), catch more letters/symbols
)+                         # (5 end), at least one of this group
(                          # (9 start)
\.[a-zA-Z]{3,4}
)?                         # (9 end), catch optional file extension
)                          # (1 end)

Type 2:

>>> import re
>>> expression2 = "     \n\
...     (                          # (1 start)     \n\
...           (                          # (2 start)     \n\
...                ([A-Za-z]:)                # (3)     \n\
...                (\\\\){1,2}                  # (4)     \n\
...           )?                         # (2 end), group to catch optional drive volume     \n\
...           (                          # (5 start)     \n\
...                ([A-Za-z0-9_%~\\-])*        # (6), catch some letters/symbols     \n\
...                (\\\\)                       # (7), catch one backslash     \n\
...                ([A-Za-z0-9_%~\\-])*        # (8), catch more letters/symbols     \n\
...           )+                         # (5 end), at least one of this group     \n\
...           (                          # (9 start)     \n\
...                \\.[a-zA-Z]{3,4}     \n\
...           )?                         # (9 end), catch optional file extension     \n\
...      )                          # (1 end)     \n\
... "
>>> Rx= re.compile(expression2, re.X)
>>> print(expression2)
(                          # (1 start)
(                          # (2 start)
([A-Za-z]:)                # (3)
(\\){1,2}                  # (4)
)?                         # (2 end), group to catch optional drive volume
(                          # (5 start)
([A-Za-z0-9_%~\-])*        # (6), catch some letters/symbols
(\\)                       # (7), catch one backslash
([A-Za-z0-9_%~\-])*        # (8), catch more letters/symbols
)+                         # (5 end), at least one of this group
(                          # (9 start)
\.[a-zA-Z]{3,4}
)?                         # (9 end), catch optional file extension
)                          # (1 end)
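Type 2 takes the stricter route of doubling every backslash, so it does not depend on Python keeping unrecognized escapes. A small check, using fragments copied from the pattern (the pairs list is just for illustration), that the doubled spelling and a raw-string spelling produce the same text:

```python
import re

# Fragments of the pattern above: Type 2 spelling (every backslash doubled in a
# normal literal) next to the raw-string spelling used in Type 4.
pairs = [
    ("(\\\\){1,2}", r"(\\){1,2}"),                # escaped backslash inside a group
    ("[A-Za-z0-9_%~\\-]", r"[A-Za-z0-9_%~\-]"),   # escaped hyphen in a class
    ("\\.[a-zA-Z]{3,4}", r"\.[a-zA-Z]{3,4}"),     # escaped dot before the extension
]

for doubled, raw in pairs:
    assert doubled == raw   # identical text reaches the regex engine
    re.compile(doubled)     # and each fragment is a valid pattern on its own
    print(doubled)
```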

Type 3:

>>> import re
>>> expression3 = """
...      (                          # (1 start)
...           (                          # (2 start)
...                ([A-Za-z]:)                # (3)
...                (\\\){1,2}                  # (4)
...           )?                         # (2 end), group to catch optional drive volume
...           (                          # (5 start)
...                ([A-Za-z0-9_%~\-])*        # (6), catch some letters/symbols
...                (\\\)                       # (7), catch one backslash
...                ([A-Za-z0-9_%~\-])*        # (8), catch more letters/symbols
...           )+                         # (5 end), at least one of this group
...           (                          # (9 start)
...                \.[a-zA-Z]{3,4}
...           )?                         # (9 end), catch optional file extension
...      )                          # (1 end)
... """
>>> Rx= re.compile(expression3, re.X)
>>> print(expression3)
(                          # (1 start)
(                          # (2 start)
([A-Za-z]:)                # (3)
(\\){1,2}                  # (4)
)?                         # (2 end), group to catch optional drive volume
(                          # (5 start)
([A-Za-z0-9_%~\-])*        # (6), catch some letters/symbols
(\\)                       # (7), catch one backslash
([A-Za-z0-9_%~\-])*        # (8), catch more letters/symbols
)+                         # (5 end), at least one of this group
(                          # (9 start)
\.[a-zA-Z]{3,4}
)?                         # (9 end), catch optional file extension
)                          # (1 end)

Type 4:

>>> import re
>>> expression4 = (
... r"          " + "\n"
... r"     (                          # (1 start)     " + "\n"
... r"          (                          # (2 start)     " + "\n"
... r"               ([A-Za-z]:)                # (3)     " + "\n"
... r"               (\\){1,2}                  # (4)     " + "\n"
... r"          )?                         # (2 end), group to catch optional drive volume     " + "\n"
... r"          (                          # (5 start)     " + "\n"
... r"               ([A-Za-z0-9_%~\-])*        # (6), catch some letters/symbols     " + "\n"
... r"               (\\)                       # (7), catch one backslash     " + "\n"
... r"               ([A-Za-z0-9_%~\-])*        # (8), catch more letters/symbols     " + "\n"
... r"          )+                         # (5 end), at least one of this group     " + "\n"
... r"          (                          # (9 start)     " + "\n"
... r"               \.[a-zA-Z]{3,4}     " + "\n"
... r"          )?                         # (9 end), catch optional file extension     " + "\n"
... r"     )                          # (1 end)     " + "\n"
... )
>>> Rx= re.compile(expression4, re.X)
>>> print(expression4)
(                          # (1 start)
(                          # (2 start)
([A-Za-z]:)                # (3)
(\\){1,2}                  # (4)
)?                         # (2 end), group to catch optional drive volume
(                          # (5 start)
([A-Za-z0-9_%~\-])*        # (6), catch some letters/symbols
(\\)                       # (7), catch one backslash
([A-Za-z0-9_%~\-])*        # (8), catch more letters/symbols
)+                         # (5 end), at least one of this group
(                          # (9 start)
\.[a-zA-Z]{3,4}
)?                         # (9 end), catch optional file extension
)                          # (1 end)
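Not one of the four types above, but worth comparing: a raw triple-quoted string (r"""...""") keeps the readability of Type 3 while passing every backslash through as typed, as Type 4 does, so no counting is needed. A minimal sketch with the same groups; the name WINDOWS_PATH and the sample line are hypothetical, and the loop mirrors the question's re.finditer call.

```python
import re

# Hypothetical name; same groups as the expressions above, written as a raw
# triple-quoted string so every backslash reaches the regex engine as typed.
WINDOWS_PATH = re.compile(r"""
    (                              # (1 start)
        (
            ([A-Za-z]:)            # (3) drive letter
            (\\){1,2}              # (4) one or two backslashes
        )?                         # (2) optional drive volume
        (
            ([A-Za-z0-9_%~\-])*    # (6) some letters/symbols
            (\\)                   # (7) one backslash
            ([A-Za-z0-9_%~\-])*    # (8) more letters/symbols
        )+                         # (5) at least one of this group
        (
            \.[a-zA-Z]{3,4}
        )?                         # (9) optional file extension
    )                              # (1 end)
""", re.VERBOSE)

line = r"saved to C:\Users\mrea\Documents\notes.txt yesterday"
found = []
for match_obj in WINDOWS_PATH.finditer(line):
    found.append(match_obj.group(1))
print(found)  # ['C:\\Users\\mrea\\Documents\\notes.txt']
```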
