Regex pattern for string - python

I want to group a string that is in this format:

Some_text Some_text 1 2 3
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END Some_text
Some_Text Some_text 1 4 5

I want to group it from BEGIN to END, like this:

Some_text Some_text 1 2 3
<!-- START -->
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END <!-- END --> Some_text
Some_Text Some_text 1 4 5

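<!-- START --> and <!-- END --> are only comments marking where the group starts and ends. I only want to get the text between BEGIN and END.

I have something like this, but it does not work in every case. When there is a lot of data, it just fails: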

reg = re.compile(rf"{begin}[-\s]+(.*)\n{end}", re.DOTALL)
core = re.search(reg, text).group(1)
lines = core.split("\n")

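text is my string, and after grouping I turn the result into a list. I don't know how to apply this regex directly to a list, so that I would not have to run it on the string text but on a Python list instead.

Please give me some hints or help me solve it.

Sample code: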

import re
text = "Some_text Some_text 1 2 3\nBEGIN Some_text Some_text\n44 76 1321\nSome_text Some_text\nEND Some_text\nSome_Text Some_text 1 4 5"
begin = "BEGIN"
end = "END"
reg = re.compile(rf"{begin}[-\s]+(.*)\n{end}", re.DOTALL)
core = re.search(reg, text).group(1)
lines = core.split("\n")
print(lines)

It works, but I don't know why it sometimes fails when it gets a lot of text, for example 20k words. I only want to get the text between BEGIN and END.

You can use

^BEGIN\b(.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*)\r?\nEND


If you want to include BEGIN and END in the match, you can omit the capture group:

^BEGIN\b.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*\r?\nEND


Code example

import re
regex = r"^BEGIN\b(.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*)\r?\nEND"
test_str = ("Some_text Some_text 1 2 3\n"
            "BEGIN Some_text Some_text\n"
            "44 76 1321\n"
            "Some_text Some_text\n"
            "END Some_text\n"
            "Some_Text Some_text 1 4 5\n")
print(re.findall(regex, test_str, re.MULTILINE))

Output

[' Some_text Some_text\n44 76 1321\nSome_text Some_text']
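
The question builds its pattern from begin and end variables and then splits the result into a list. A minimal sketch of the same pattern with the markers passed in as parameters might look like the following. The function name between_markers is hypothetical, and the use of re.escape is an assumption added in case a marker ever contains regex metacharacters.

```python
import re

def between_markers(text, begin="BEGIN", end="END"):
    """Return the text of every begin...end block, markers excluded.

    Hypothetical helper: same pattern as above, with the marker words
    escaped and interpolated instead of hard-coded.
    """
    b, e = re.escape(begin), re.escape(end)
    pattern = re.compile(
        rf"^{b}\b(.*(?:\r?\n(?!(?:{b}|{e})\b).*)*)\r?\n{e}",
        re.MULTILINE,
    )
    return pattern.findall(text)

sample = (
    "Some_text Some_text 1 2 3\n"
    "BEGIN Some_text Some_text\n"
    "44 76 1321\n"
    "Some_text Some_text\n"
    "END Some_text\n"
    "Some_Text Some_text 1 4 5\n"
)

blocks = between_markers(sample)
# Drop the leading space left after BEGIN, then split into lines.
lines = blocks[0].strip().split("\n")
print(lines)
```

Because the pattern consumes one line at a time and refuses to cross a line that starts with BEGIN or END, it should stay within a single block even when the text contains several of them.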

This works:

txt='''
Some_text Some_text 1 2 3
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END Some_text
Some_Text Some_text 1 4 5'''
import re
print(re.sub(r'(?=BEGIN )(.*END)', r'<!-- START -->\n\1 <!-- END -->', txt, flags=re.S))

Or

print(re.sub(r'(?=^BEGIN )([\s\S]*END)', r'<!-- START -->\n\1 <!-- END -->', txt, flags=re.M))

Either prints:

Some_text Some_text 1 2 3
<!-- START -->
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END <!-- END --> Some_text
Some_Text Some_text 1 4 5
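
One thing to keep in mind with patterns of this shape: .*END and [\s\S]*END are greedy, so if the text contains more than one END, the match runs to the last one. That may be part of why a pattern that works on a short sample misbehaves on a large input. The sketch below uses a made-up two-block string (not from the question) to compare a greedy and a non-greedy version.

```python
import re

# Hypothetical input with two BEGIN...END blocks.
txt = "BEGIN a\nEND x\nmiddle\nBEGIN b\nEND y"

# Greedy: .* grabs as much as possible, so the match ends at the LAST END.
greedy = re.findall(r"^BEGIN .*END", txt, flags=re.S | re.M)

# Non-greedy: .*? stops at the FIRST END after each BEGIN.
lazy = re.findall(r"^BEGIN .*?END", txt, flags=re.S | re.M)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

With a single block, as in the sample above, both versions give the same result.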

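This uses a non-greedy pattern to match everything between the begin marker and the end marker. The lookbehind and lookahead keep the markers themselves out of the match. The \b in the regex pattern makes sure BEGIN and END are not part of a longer word, so "BEGIN" will not match "BEGINS" or "BEGINNING". Note: for input with mismatched markers, for example "a b c BEGIN d e BEGIN 1 2 END 3" (two BEGINs), it may not work as expected.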

import re
txt='''
Some_text Some_text 1 2 3
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END Some_text
Some_Text Some_text 1 4 5'''
begin = 'BEGIN'
end = 'END'
regex = re.compile(rf"(?<=\b{begin}\b)(.*?)(?=\b{end}\b)", flags=re.DOTALL)
match = regex.search(txt)
if match:
    print(match[1])
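
The question also asks how to do this on a Python list of lines rather than on one string. None of the answers above cover that, so the following is only a sketch under stated assumptions: the markers are the first word of their line, only the first block is wanted, and the function name lines_between is made up for illustration. It uses no regex at all.

```python
def lines_between(lines, begin="BEGIN", end="END"):
    """Return the lines of the first begin...end block in a list of lines.

    Hypothetical helper. Assumes each marker is the first word on its line.
    The rest of the BEGIN line is kept; the END line is dropped.
    """
    collected = []
    inside = False
    for line in lines:
        parts = line.split(None, 1)  # [first word, rest of line]
        first = parts[0] if parts else ""
        if not inside:
            if first == begin:
                inside = True
                if len(parts) > 1:
                    collected.append(parts[1])
        elif first == end:
            return collected
        else:
            collected.append(line)
    return []  # no complete begin...end block was found

text_lines = [
    "Some_text Some_text 1 2 3",
    "BEGIN Some_text Some_text",
    "44 76 1321",
    "Some_text Some_text",
    "END Some_text",
    "Some_Text Some_text 1 4 5",
]
print(lines_between(text_lines))
```

A single pass over the list keeps the cost linear in the number of lines, which may matter for inputs of the size mentioned in the question.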
