Python regex: replace specific lines only inside paragraphs, not in the whole file


s="""Paragraph 1
some text blah blah
blah blah
UNWANTED TEXT
some text
Paragraph END
UNWANTED TEXT
Paragraph 2
some text blah blah
blah blah
UNWANTED TEXT
Paragraph END"""

I want to use re.sub so that UNWANTED TEXT is replaced only inside a paragraph, while occurrences outside a paragraph are kept. This is what I have:

import re

search_unwanted_only_inparagrap = re.findall(r'(?s)(?<=Paragraph)(.*?)(?=END)', s, flags=re.MULTILINE)
if search_unwanted_only_inparagrap:
    replace_only_insidepara = re.sub(r"UNWANTED TEXT+", " ", s)  # replace the matched string
    print(replace_only_insidepara)
else:
    print("not found")

But the output replaces every instance of UNWANTED TEXT in the file:

Paragraph 1
some text blah blah
blah blah
some text
Paragraph END

Paragraph 2
some text blah blah
blah blah
Paragraph END

But I want it to look like this:

Paragraph 1
some text blah blah
blah blah
some text
Paragraph END
UNWANTED TEXT
Paragraph 2
some text blah blah
blah blah
Paragraph END

Please help.
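Why every occurrence gets replaced: in the code above, the list returned by re.findall is only used as a truth test, and re.sub is then applied to the entire string, so the paragraph boundaries never come into play. The check below runs both calls on a shortened input (reduced for brevity, not the original data) and uses re.subn to count the substitutions:

```python
import re

# Shortened input: one unwanted line inside each paragraph, one between them.
s = """Paragraph 1
UNWANTED TEXT
Paragraph END
UNWANTED TEXT
Paragraph 2
UNWANTED TEXT
Paragraph END"""

# findall locates the paragraph bodies, but the result is only tested for truthiness.
bodies = re.findall(r"(?s)(?<=Paragraph)(.*?)(?=END)", s)

# The substitution itself runs over the whole string, so every occurrence is hit.
new_text, count = re.subn(r"UNWANTED TEXT", " ", s)

print(len(bodies), count)
```

findall reports two paragraph bodies, yet subn performs three replacements, including the line that sits between the paragraphs.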

Your demo input could be more "minimal". Still, as far as I understand your requirement, splitting with re.split works:

import re
s = """Paragraph 1
some text blah blah
blah blah
UNWANTED TEXT
some text
Paragraph END
UNWANTED TEXT
Paragraph 2
some text blah blah
blah blah
UNWANTED TEXT
Paragraph END"""
reg_para = re.compile(r'(Paragraph\s+\d+.+?END)', re.DOTALL)
# the capturing group makes split() keep the paragraphs in the result list
paras = reg_para.split(s)
for para in paras:
    if reg_para.match(para):
        # inside a paragraph: replace the unwanted text
        para = re.sub(r"UNWANTED TEXT", " ", para)
        # in case you want to replace more words,
        # you could loop over a list of keywords instead
        para = re.sub(r"Another WORD", " ", para)
        print(para)
    else:
        # outside a paragraph: print unchanged
        print(para)

Output:

Paragraph 1
some text blah blah
blah blah

some text
Paragraph END

UNWANTED TEXT

Paragraph 2
some text blah blah
blah blah

Paragraph END
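The answer relies on a property of re.split: when the pattern contains a capturing group, the matched text (here, each whole paragraph) is kept in the result list, interleaved with the text between matches. That is why reg_para.match(para) is needed to tell the pieces apart. A small illustration on a shortened, made-up input:

```python
import re

# Shortened, made-up input for illustration only.
s = "Paragraph 1\nx\nParagraph END\nbetween\nParagraph 2\ny\nParagraph END"

reg_para = re.compile(r"(Paragraph\s+\d+.+?END)", re.DOTALL)
pieces = reg_para.split(s)

# Paragraphs land at the odd indices, surrounding text at the even ones.
is_para = [bool(reg_para.match(p)) for p in pieces]
print(pieces)
print(is_para)  # -> [False, True, False, True, False]

# Joining the pieces with "" rebuilds the original string exactly.
rebuilt = "".join(pieces)
```

Instead of printing each piece separately, collecting the processed pieces and joining them with "".join(...) avoids the extra newline that print adds after every piece.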
