I'm debugging tooltip generation in a Sphinx library, which involves the following code:
def extract_intro_and_title(filename, docstring):
    """Extract and clean the first paragraph of module-level docstring."""
    # lstrip is just in case docstring has a '\n\n' at the beginning
    paragraphs = docstring.lstrip().split('\n\n')
    # remove comments and other syntax like `.. _link:`
    paragraphs = [p for p in paragraphs
                  if not p.startswith('.. ') and len(p) > 0]
    if len(paragraphs) == 0:
        raise ExtensionError(
            "Example docstring should have a header for the example title. "
            "Please check the example file:\n {}\n".format(filename))
    # Title is the first paragraph with any ReSTructuredText title chars
    # removed, i.e. lines that consist of (3 or more of the same) 7-bit
    # non-ASCII chars.
    # This conditional is not perfect but should hopefully be good enough.
    title_paragraph = paragraphs[0]
    match = re.search(r'^(?!([\W _])\1{3,})(.+)', title_paragraph,
                      re.MULTILINE)
    if match is None:
        raise ExtensionError(
            'Could not find a title in first paragraph:\n{}'.format(
                title_paragraph))
    title = match.group(0).strip()
    # Use the title if no other paragraphs are provided
    intro_paragraph = title if len(paragraphs) < 2 else paragraphs[1]
    # Concatenate all lines of the first paragraph and truncate at 95 chars
    intro = re.sub('\n', ' ', intro_paragraph)
    intro = _sanitize_rst(intro)
    if len(intro) > 95:
        intro = intro[:95] + '...'
    return intro, title
What I don't understand is:
match = re.search(r'^(?!([\W _])\1{3,})(.+)', title_paragraph,
                  re.MULTILINE)
Could someone explain this to me?
To start:
>>> import re
>>> help(re.search)
Help on function search in module re:

search(pattern, string, flags=0)
    Scan through string looking for a match to the pattern, returning
    a Match object, or None if no match was found.
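This tells us that re.search takes a pattern, a string, and optional flags that default to 0.
That may not be much help on its own.
The flag being passed is re.MULTILINE. It tells the regex engine to treat ^ and $ as the start and end of each line. By default they apply only to the start and end of the whole string, no matter how many lines the string contains.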
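To make the effect of the flag concrete, here is a minimal sketch. The sample string and variable names are made up for illustration and are not taken from the library.

```python
import re

# Made-up two-line input, just to show where ^ is allowed to match.
text = "first line\nsecond line"

# Without re.MULTILINE, ^ only matches at the very start of the string.
default_hits = re.findall(r'^\w+', text)

# With re.MULTILINE, ^ also matches right after each newline.
multiline_hits = re.findall(r'^\w+', text, re.MULTILINE)

print(default_hits)    # ['first']
print(multiline_hits)  # ['first', 'second']
```

In the function from the question, this is what lets the search move past the first line of title_paragraph and try each following line in turn.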
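The pattern being matched looks for the following:
^ - the match must start at the beginning of a line.
(?!([\W _])\1{3,}) - the line must not begin with four or more copies of the same character, where that character is a non-word character (\W), a space, or an underscore (_). This is a negative lookahead ((?!…)) wrapped around a parenthesised character class (([\W _])), which is capture group 1. Whatever that group matched must then repeat 3 or more times (\1{3,}): \1 refers back to the contents of capture group 1, and {3,} means at least 3 times. The first match plus 3 repeats means the first 4 characters of the line cannot be one repeated non-word character. The lookahead consumes no characters; it only allows or rejects a position.
As a side note, \W is the opposite of \w, which for ASCII text is shorthand for [A-Za-z0-9_]. That means \W is shorthand for [^A-Za-z0-9_]. A space is already covered by \W, so listing it in the class is redundant; the underscore has to be listed separately because it counts as a word character.
(.+) - if the lookahead allowed the position and the line consists of 1 or more characters, the whole line is captured in group 2.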
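The pieces above can be checked against a few small inputs. This is only a sketch: first_title_line is a hypothetical helper written for this answer, not part of the library, and the sample paragraphs are invented.

```python
import re

# The pattern from the question, with its backslashes intact.
PATTERN = re.compile(r'^(?!([\W _])\1{3,})(.+)', re.MULTILINE)


def first_title_line(paragraph):
    """Hypothetical helper: return the first line the pattern accepts, or None."""
    match = PATTERN.search(paragraph)
    return None if match is None else match.group(0)


# Overline and underline are skipped: each starts with 4+ identical '='.
print(first_title_line("==========\nMy example\n=========="))  # My example

# Only three dashes: \1{3,} needs three MORE copies after the first one,
# so the lookahead has nothing to reject and the line is accepted.
print(first_title_line("---\nMy example"))  # ---

# Mixed punctuation is not a repeat of a single character, so it is accepted.
print(first_title_line("-=-=-=\nMy example"))  # -=-=-=

# A paragraph made only of rule lines produces no match at all.
print(first_title_line("~~~~~~\n______"))  # None
```

Since the lookahead consumes nothing, match.group(0) and match.group(2) hold the same text here, which is why the original function can read the title from group(0). The last case corresponds to the branch where the function raises ExtensionError.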
You can explore the regex's behaviour at https://regex101.com/r/3p73lf/1.