How can I use a regular expression to capture the text between lines that start with a single semicolon?



I want to capture the text between lines that start with a single semicolon:

Sample input:

s = '''
;
the color blue
;
the color green
;
the color red
;
'''

This is the desired output:

['the color blue', 'the color green', 'the color red']

This attempted solution does not work:

import re
pat = r'^;(.*)^;'
r = re.findall(pat, s, re.S|re.M)
print(r)

This is the incorrect output:

['\n\nthe color blue\n\n;\n\nthe color green\n\n;\n\nthe color red\n\n']

Treat the semicolon lines as delimiters.

(?sm)^;\s*\r?\n(.*?)\s*(?=^;\s*\r?\n)

https://regex101.com/r/4tKX0F/1

Explanation

(?sm)                         # Modifiers: dot-all, multi-line
^ ; \s* \r? \n                # Beginning delimiter
( .*? )                       # (1), Text
\s*                           # Whitespace trim
(?= ^ ; \s* \r? \n )          # End delimiter
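
As a rough sketch of how the pattern above might be used from Python, the same expression can be compiled with re.VERBOSE so that its layout mirrors the explanation. The names DELIMITED and between_semicolon_lines are made up for this illustration and are not part of the original answer.

```python
import re

# Hypothetical name. The pattern is the one explained above, written in
# verbose mode so that whitespace and comments inside it are ignored.
DELIMITED = re.compile(
    r"""
    ^ ; \s* \r? \n          # beginning delimiter: a line holding only a semicolon
    ( .*? )                 # group 1: the text, matched lazily
    \s*                     # trim trailing whitespace
    (?= ^ ; \s* \r? \n )    # end delimiter, checked but not consumed
    """,
    re.S | re.M | re.X,
)


def between_semicolon_lines(text):
    """Return the text chunks found between semicolon delimiter lines."""
    return DELIMITED.findall(text)


s = '''
;
the color blue
;
the color green
;
the color red
;
'''
print(between_semicolon_lines(s))
# ['the color blue', 'the color green', 'the color red']
```

Because the end delimiter sits inside a lookahead, it is not consumed and can serve as the beginning delimiter of the next match.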

A non-regex solution: I split on ; and drop the empty strings.

s = '''
;
the color blue

;
the color green
;
the color red
;
'''
f = s.split(';')

x = [a.strip('\n') for a in f]  # trim the newlines around each piece
print(x) #prints ['', 'the color blue', 'the color green', 'the color red', '']
a = [elem for elem in x if len(elem)]
print(a) #prints ['the color blue', 'the color green', 'the color red']
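
One caveat: s.split(';') also splits on semicolons that appear inside the text. A minimal line-based sketch that avoids this is shown here, assuming a delimiter is a line containing nothing but a semicolon (the question only says the line starts with one). The function name split_on_semicolon_lines is hypothetical.

```python
def split_on_semicolon_lines(text):
    """Return stripped chunks found between lines that consist solely of ';'."""
    chunks = []
    current = []
    seen_delimiter = False
    for line in text.splitlines():
        if line.strip() == ';':
            # A delimiter closes the chunk collected since the previous one.
            if seen_delimiter:
                chunk = '\n'.join(current).strip()
                if chunk:
                    chunks.append(chunk)
            seen_delimiter = True
            current = []
        elif seen_delimiter:
            current.append(line)
    # Text after the last delimiter is not "between" delimiters, so it is dropped.
    return chunks


sample = '''
;
the color blue

;
the color green
;
the color red; dark
;
'''
print(split_on_semicolon_lines(sample))
# ['the color blue', 'the color green', 'the color red; dark']
```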

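You could use this as the pattern:

pat = r';\n\n([\w* *]*)'
r = re.findall(pat, s)

This should capture what you need, provided each semicolon is followed by exactly two newlines (a blank line), as the pattern requires.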

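I know you didn't ask for this, but it's worth considering pyparsing as an alternative to re. In fact, pyparsing properly subsumes regular expressions. Note how this simple parser copes with varying numbers of blank lines.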

>>> parsifal = open('temp.txt').read()
>>> print (parsifal)

;
the colour blue
;
the colour green
;
the colour red
;
the colour purple


;
the colour magenta
;

>>> import pyparsing as pp
>>> p = pp.OneOrMore(pp.Suppress(';\n')+pp.ZeroOrMore(pp.Suppress('\n'))+pp.CharsNotIn(';\n')+pp.ZeroOrMore(pp.Suppress('\n')))
>>> p.parseString(parsifal)
(['the colour blue', 'the colour green', 'the colour red', 'the colour purple', 'the colour magenta'], {})

Overall, the parser matches OneOrMore sequences of a semicolon and newlines, followed by anything other than those characters, followed by newlines.

You can use ;\s*(.*?)\s*(?=;) with the dot-all flag. Usage:

print( re.findall(r'(?s);\s*(.*?)\s*(?=;)', s) )
# output: ['the color blue', 'the color green', 'the color red']

Explanation:

(?s)   # dot-all modifier (. matches newlines)
;      # consume a semicolon
\s*    # skip whitespace
(.*?)  # capture the following text, as little as possible, such that...
\s*    # ... it is followed only by (optional) whitespace, and...
(?=;)  # ... a semicolon
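
To see why the closing semicolon is matched with a lookahead rather than consumed, here is a small comparison sketch. The variable names consuming and lookahead are only for illustration.

```python
import re

s = '''
;
the color blue
;
the color green
;
the color red
;
'''

# Consuming the closing semicolon leaves no opening delimiter for the next
# chunk, so every second chunk is skipped.
consuming = re.findall(r'(?s);\s*(.*?)\s*;', s)
print(consuming)  # ['the color blue', 'the color red']

# With a lookahead the closing semicolon is only checked, so it can also
# open the next match.
lookahead = re.findall(r'(?s);\s*(.*?)\s*(?=;)', s)
print(lookahead)  # ['the color blue', 'the color green', 'the color red']
```

Note that this pattern is not anchored to the start of a line, so a semicolon inside the text would also act as a delimiter.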
