Find and replace instances enclosed in markers

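I am using Python 3.7.9 and have some HTML code that includes data from a pandas table. I want to color specific data from the pandas table, so I want to reuse the text between string markers and replace the markers with other markup (the markup Confluence uses to show text in a specific color).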

My input text string is:

text = 'some text now important information starts decrease-123456decrease more text not to touch next marker increase7896278689increase and more text another marker decrease-12355decrease with important information'

The replacement strings are:

increase = '<span style="color: Red;">'+val+'</span>'
decrease = '<span style="color: Green;">'+val+'</span>'

where val is the information to be found between the markers.

So my expected output is:

output = some text now important information starts <span style="color: Green;">-123456</span> more text not to touch next marker <span style="color: Red;">7896278689</span> and more text another marker <span style="color: Green;">-12355</span> with important information

This is what I tried:

import re
text = 'some text now important information starts decrease-123456decrease more text not to touch next marker increase7896278689increase and more text another marker decrease-12355decrease with important information'
found_increase = re.findall('increase(.+?)increase', text)
found_decrease = re.findall('decrease(.+?)decrease', text)
output = ''
for i, val in enumerate(found_increase):
    output = text.replace('increase'+val+'increase', '<span style="color: Red;">'+val+'</span>')
for i, val in enumerate(found_decrease):
    output = text.replace('decrease'+val+'decrease', '<span style="color: Green;">'+val+'</span>')
print(output)

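I also tried the styling methods that come with pandas, but Confluence is not real HTML, so that approach did not work for me. With the example above I get the following output: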

some text now important information starts decrease-123456decrease more text not to touch next marker increase7896278689increase and more text another marker <span style="color: Green;">-12355</span> with important information

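The Python regex engine directly supports substitution through capturing groups with re.sub / re.Pattern.sub. By default, all occurrences of the pattern are replaced.

https://docs.python.org/3/library/re.html (see re.sub)

In the replacement string, the first capturing group is referenced as r'\1' or, equivalently, '\\1'.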

import re
text = 'some text now important information starts decrease-123456decrease more text not to touch next marker increase7896278689increase and more text another marker decrease-12355decrease with important information'
# Raw strings keep the backslash, so \1 refers to the first capturing group.
inc_replaced = re.sub(r'increase(.+?)increase', r'<span style="color: Red;">\1</span>', text)
# Run the second substitution on the result of the first one, not on the original text.
output = re.sub(r'decrease(.+?)decrease', r'<span style="color: Green;">\1</span>', inc_replaced)

>>> output
'some text now important information starts <span style="color: Green;">-123456</span> more text not to touch next marker <span style="color: Red;">7896278689</span> and more text another marker <span style="color: Green;">-12355</span> with important information'
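
Both markers can also be handled in a single pass by giving re.sub a function instead of a replacement string. The following is only a minimal sketch and not part of the answer above: the names COLORS, MARKER_RE and colorize are made up for illustration, and it assumes every value sits between two identical markers.

```python
import re

# Hypothetical mapping from marker word to colour (illustrative names).
COLORS = {"increase": "Red", "decrease": "Green"}

# "marker" captures the opening marker, "val" the value in between;
# (?P=marker) requires the same marker word to close the value.
MARKER_RE = re.compile(r"(?P<marker>increase|decrease)(?P<val>.+?)(?P=marker)")


def colorize(text):
    """Wrap every marked value in a coloured span, both markers in one pass."""
    def to_span(match):
        color = COLORS[match.group("marker")]
        return '<span style="color: {};">{}</span>'.format(color, match.group("val"))
    return MARKER_RE.sub(to_span, text)


text = 'some text now important information starts decrease-123456decrease more text not to touch next marker increase7896278689increase and more text another marker decrease-12355decrease with important information'
print(colorize(text))
```

Adding a further marker then only requires a new entry in the dictionary and a new alternative in the pattern.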

I found that the following code works:

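print(re.sub(r"decrease(.*?)decrease", r'<span style="color: Green;">\1</span>', text))

What happens here is that we replace the pattern

"decrease(.*?)decrease"

with

'<span style="color: Green;">\1</span>'

where \1 stands for the content matched by (.*?). Note the leading r in front of the strings: it makes them raw strings, so the backslash in \1 reaches the regex engine unchanged instead of being interpreted by Python as an escape sequence.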
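
The effect of the r prefix can be checked directly. This is a small sketch with a made-up sample string, and a shortened <b> tag stands in for the span:

```python
import re

sample = 'decrease-5decrease'  # made-up sample input
pattern = r'decrease(.*?)decrease'

# Raw string: the backslash survives, so re.sub sees the backreference \1.
raw = re.sub(pattern, r'<b>\1</b>', sample)

# Escaped backslash in a normal string: the same characters, so the same result.
escaped = re.sub(pattern, '<b>\\1</b>', sample)

# Normal string without escaping: Python turns '\1' into the control
# character chr(1), which is inserted literally instead of the captured text.
plain = re.sub(pattern, '<b>\1</b>', sample)

print(raw)                                # <b>-5</b>
print(raw == escaped)                     # True
print(plain == '<b>' + chr(1) + '</b>')   # True
```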

Obviously, you will need to do the same for the increase version.

Note that replace() replaces all occurrences; it looks like your code does not take this into account.
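
If you prefer to keep the findall/replace approach from the question, a small change appears to be enough. In the question's loop every call runs text.replace on the unmodified text and overwrites output, so only the last replacement survives. A minimal sketch that accumulates the result instead (the shortened input string is made up):

```python
import re

# Shortened, made-up input with the same marker layout as in the question.
text = 'a decrease-1decrease b increase2increase c decrease-3decrease d'

output = text  # start from the input and keep modifying the same string
for val in re.findall('increase(.+?)increase', text):
    output = output.replace('increase' + val + 'increase',
                            '<span style="color: Red;">' + val + '</span>')
for val in re.findall('decrease(.+?)decrease', text):
    output = output.replace('decrease' + val + 'decrease',
                            '<span style="color: Green;">' + val + '</span>')
print(output)
```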
