Regex explanation for sphinx-gallery



I am debugging sphinx-gallery tooltip generation, which involves the following code:

def extract_intro_and_title(filename, docstring):
    """Extract and clean the first paragraph of module-level docstring."""
    # lstrip is just in case docstring has a '\n\n' at the beginning
    paragraphs = docstring.lstrip().split('\n\n')
    # remove comments and other syntax like `.. _link:`
    paragraphs = [p for p in paragraphs
                  if not p.startswith('.. ') and len(p) > 0]
    if len(paragraphs) == 0:
        raise ExtensionError(
            "Example docstring should have a header for the example title. "
            "Please check the example file:\n {}\n".format(filename))
    # Title is the first paragraph with any ReSTructuredText title chars
    # removed, i.e. lines that consist of (3 or more of the same) 7-bit
    # non-ASCII chars.
    # This conditional is not perfect but should hopefully be good enough.
    title_paragraph = paragraphs[0]
    match = re.search(r'^(?!([\W _])\1{3,})(.+)', title_paragraph,
                      re.MULTILINE)
    if match is None:
        raise ExtensionError(
            'Could not find a title in first paragraph:\n{}'.format(
                title_paragraph))
    title = match.group(0).strip()
    # Use the title if no other paragraphs are provided
    intro_paragraph = title if len(paragraphs) < 2 else paragraphs[1]
    # Concatenate all lines of the first paragraph and truncate at 95 chars
    intro = re.sub('\n', ' ', intro_paragraph)
    intro = _sanitize_rst(intro)
    if len(intro) > 95:
        intro = intro[:95] + '...'
    return intro, title

What I don't understand is:

match = re.search(r'^(?!([\W _])\1{3,})(.+)', title_paragraph,
                  re.MULTILINE)

Could someone explain it to me?

To start:

>>> import re
>>> help(re.search)
Help on function search in module re:

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a Match object, or None if no match was found.
(END)

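This tells us that re.search takes a pattern, a string, and optional flags that default to 0.

That in itself may not be very helpful.

The flag being passed is re.MULTILINE. It tells the regex engine to treat ^ and $ as the start and end of each line. By default, they apply only to the start and end of the whole string, no matter how many lines the string contains.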
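The effect of the flag can be seen in a minimal sketch. The sample string and variable names here are made up for illustration and are not part of sphinx-gallery:

```python
import re

# Hypothetical two-line sample string.
text = "first line\nsecond line"

# Without re.MULTILINE, ^ only matches at the very start of the string.
default_hits = re.findall(r'^\w+', text)

# With re.MULTILINE, ^ also matches immediately after each newline.
multiline_hits = re.findall(r'^\w+', text, re.MULTILINE)

print(default_hits)    # ['first']
print(multiline_hits)  # ['first', 'second']
```

This is why the title search can skip an overline on the first line and still match a title that sits on the second line of the paragraph.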

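The pattern being matched looks for the following:

^ - the match must start at the beginning of a line.

(?!([\W _])\1{3,}) - the line must not start with four or more copies of the same character, where that character is a non-word character (\W), a space ( ) or an underscore (_). This is a negative lookahead ((?!...)) wrapped around a character class in parentheses (([\W _])), which is capture group 1. Whatever group 1 captured must then repeat 3 or more times (\1{3,}): \1 refers to the contents of capture group 1, and {3,} means at least 3 times. The first match plus 3 repeats means a line is rejected only when its first 4 characters are the same repeated non-word character. The lookahead does not consume any characters; it only succeeds at a position where the condition holds.

As a side note, \W is the opposite of \w. For ASCII text, \w is shorthand for [A-Za-z0-9_], so \W is shorthand for [^A-Za-z0-9_]. (With Python 3 str patterns, \w also matches Unicode word characters unless re.ASCII is set.) A space is already covered by \W, but the underscore is not, which is why _ is listed explicitly in the character class.

(.+) - if the lookahead succeeded at this position and the line has 1 or more characters, the whole line is captured in group 2.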
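To make the pieces above concrete, here is a small sketch that applies the same pattern to a few made-up title paragraphs. The helper name find_title and the sample strings are assumptions for illustration only; sphinx-gallery itself calls re.search inline as shown in the question.

```python
import re

# Same pattern and flag as in the question, compiled once for reuse.
TITLE_RE = re.compile(r'^(?!([\W _])\1{3,})(.+)', re.MULTILINE)


def find_title(paragraph):
    """Hypothetical helper: return the first line that is not a run of
    4+ identical non-word characters, or None if there is no such line."""
    match = TITLE_RE.search(paragraph)
    return None if match is None else match.group(0).strip()


# The overline is rejected by the negative lookahead, so the search
# moves on to the next line start and matches the real title.
overlined = "=============\nMy Example\n=============\n"
print(find_title(overlined))                    # My Example

# A title with only an underline matches on its first line.
print(find_title("Plain title\n-----------"))   # Plain title

# Four identical punctuation characters: no line qualifies.
print(find_title("----"))                       # None

# Only three: \1{3,} cannot be satisfied, so the line is accepted.
print(find_title("---"))                        # ---
```

The last case shows that the pattern needs four identical characters before it treats a line as title decoration, although the code comment in the question says "3 or more".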

Use https://regex101.com/r/3p73lf/1 to explore the behavior of the regex.
