re.findall "plays favorites" with the | operator when surrounded by greedy wildcards



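I can't understand how Python 3.8 handles regular expressions when I try to match several patterns with the `|` operator. The regex seems to "play favorites": when deciding what to capture, it picks some patterns and not others. I thought this had to do with the order of the patterns, but the order seems to have no effect on what is returned or on the order in which it is returned. It apparently has something to do with the wildcards, but it still doesn't make sense to me.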

The following (highly abstracted) script shows what I don't understand:

```python
import re
sentence = 'This is neither a long nor a difficult string to parse'
# Scenario 1: no patterns at the beginning
long_first = re.compile(r'(long|difficult)')
difficult_first = re.compile(r'(difficult|long)')
print(re.findall(long_first, sentence)) # -> ['long', 'difficult']
print(re.findall(difficult_first, sentence)) # -> ['long', 'difficult']
# Scenario 2: .+ at the beginning
long_first = re.compile(r'.+(long|difficult)')
difficult_first = re.compile(r'.+(difficult|long)')
print(re.findall(long_first, sentence)) # -> ['difficult']
print(re.findall(difficult_first, sentence)) # -> ['difficult']
# Scenario 3: .+ at the end
long_first = re.compile(r'(long|difficult).+')
difficult_first = re.compile(r'(difficult|long).+')
print(re.findall(long_first, sentence)) # -> ['long']
print(re.findall(difficult_first, sentence)) # -> ['long']
# Scenario 4: .+ at the end and the beginning
long_first = re.compile(r'.+(long|difficult).+')
difficult_first = re.compile(r'.+(difficult|long).+')
print(re.findall(long_first, sentence)) # -> ['difficult']
print(re.findall(difficult_first, sentence)) # -> ['difficult']
```

Why aren't the results consistently `['long', 'difficult']`? And why doesn't scenario 3 return `['difficult']`?

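Matching proceeds from left to right, so in the pattern `.+(long|difficult).+` the first `.+` matches as much as possible while still allowing the rest of the expression to match: it first consumes the whole string, then backtracks one character at a time until `(long|difficult)` can match after it.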
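A minimal sketch using `re.search` to inspect what the leading greedy `.+` actually consumes. The lazy `.+?` variant at the end is a swapped-in contrast that is not part of the question's code; it only illustrates the difference between "as much as possible" and "as little as possible".

```python
import re

sentence = 'This is neither a long nor a difficult string to parse'

# Greedy .+ first consumes the entire string, then gives characters back
# one at a time until (long|difficult) can match after it.
greedy = re.search(r'.+(long|difficult)', sentence)
print(greedy.group(0))  # This is neither a long nor a difficult
print(greedy.group(1))  # difficult
# The captured word starts at the last occurrence of either alternative.
print(greedy.start(1) == sentence.rfind('difficult'))  # True

# Contrast (not in the question): lazy .+? grows one character at a time,
# so it stops at the first word instead, and findall can then continue.
lazy = re.search(r'.+?(long|difficult)', sentence)
print(lazy.group(1))  # long
print(re.findall(r'.+?(long|difficult)', sentence))  # ['long', 'difficult']
```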

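This means the pattern `.+(long|difficult).+` finds the last position in the sentence where either of the two words occurs, regardless of the order of the alternatives; that order only matters when more than one alternative could match at the same position. Because that single match runs to the end of the string, `findall` has nothing left to search and returns just one item. In scenario 3 there is no leading `.+`, so the search succeeds at the first position where either word begins, which is `long`, and the trailing `.+` then swallows the rest of the string, including `difficult`.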
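The claim about alternative order can be checked with a short sketch. The `'longer'` input and the `(long|longer)` patterns are made-up examples, chosen only because both alternatives can start at the same index there, which never happens with `long` and `difficult`.

```python
import re

sentence = 'This is neither a long nor a difficult string to parse'

# 'long' and 'difficult' never start at the same index, so swapping the
# alternatives cannot change which word is found at any position.
same = re.findall(r'(long|difficult)', sentence) == re.findall(r'(difficult|long)', sentence)
print(same)  # True

# Overlapping alternatives (hypothetical example): both can match at index 0,
# and the first listed alternative that succeeds wins; re does not prefer the longest.
first_short = re.findall(r'(long|longer)', 'longer')
first_long = re.findall(r'(longer|long)', 'longer')
print(first_short)  # ['long']
print(first_long)   # ['longer']
```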

To understand it better, also try the patterns on this sentence:

'This is neither a difficult nor a long string to parse'
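A small sketch that runs the four scenarios from the question on the swapped sentence. With the words reversed, the results follow the words' positions in the string, not the order of the alternatives in the pattern.

```python
import re

swapped = 'This is neither a difficult nor a long string to parse'
patterns = {
    'scenario 1': r'(long|difficult)',
    'scenario 2': r'.+(long|difficult)',
    'scenario 3': r'(long|difficult).+',
    'scenario 4': r'.+(long|difficult).+',
}

# Collect findall results for each scenario on the swapped sentence.
results = {name: re.findall(p, swapped) for name, p in patterns.items()}
for name, found in results.items():
    print(name, found)
# scenario 1 ['difficult', 'long']
# scenario 2 ['long']
# scenario 3 ['difficult']
# scenario 4 ['long']
```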
