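I am using Python 3.7.9, and I have some HTML code that includes data from a pandas table. I want to color specific values from the pandas table, so I want to take the text between string markers and wrap it in other tags (Confluence uses them to show text in a particular color).
My input text string is: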
text = 'some text now important information starts decrease-123456decrease more text not to touch next marker increase7896278689increase and more text another marker decrease-12355decrease with important information'
The replacement strings are:
increase = '<span style="color: Red;">'+val+'</span>'
decrease = '<span style="color: Green;">'+val+'</span>'
where val is the information to be found between the markers.
So my expected output is:
output = some text now important information starts <span style="color: Green;">-123456</span> more text not to touch next marker <span style="color: Red;">7896278689</span> and more text another marker <span style="color: Green;">-12355</span> with important information
This is what I tried:
import re
text = 'some text now important information starts decrease-123456decrease more text not to touch next marker increase7896278689increase and more text another marker decrease-12355decrease with important information'
found_increase = re.findall('increase(.+?)increase', text)
found_decrease = re.findall('decrease(.+?)decrease', text)
output = ''
for i, val in enumerate(found_increase):
    output = text.replace('increase'+val+'increase', '<span style="color: Red;">'+val+'</span>')
for i, val in enumerate(found_decrease):
    output = text.replace('decrease'+val+'decrease', '<span style="color: Green;">'+val+'</span>')
print(output)
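I also tried the styling methods that come with pandas, but Confluence is not real HTML, so that approach does not work for me. With the example above I get the following output: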
some text now important information starts decrease-123456decrease more text not to touch next marker increase7896278689increase and more text another marker <span style="color: Green;">-12355</span> with important information
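Python's regex engine directly supports substitution with capture groups through re.sub / re.Pattern.sub. By default, every occurrence of the pattern is replaced.
See re.sub in https://docs.python.org/3/library/re.html
In the replacement string, the first capture group is referenced as r'\1' or, equivalently, '\\1'.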
import re
text = 'some text now important information starts decrease-123456decrease more text not to touch next marker increase7896278689increase and more text another marker decrease-12355decrease with important information'
# Raw strings keep the backslash in \1 so that re.sub sees a group reference.
inc_replaced = re.sub(r'increase(.+?)increase', r'<span style="color: Red;">\1</span>', text)
# Run the second substitution on the result of the first, not on the original text.
output = re.sub(r'decrease(.+?)decrease', r'<span style="color: Green;">\1</span>', inc_replaced)
>>> output
'some text now important information starts <span style="color: Green;">-123456</span> more text not to touch next marker <span style="color: Red;">7896278689</span> and more text another marker <span style="color: Green;">-12355</span> with important information'
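If more marker types turn up later, the two substitutions could be folded into a single pass. The sketch below is one possible way to do that, not something from the question: the names COLORS, MARKER_RE and colorize are made up for illustration, and it assumes every marker opens and closes with the same word.

```python
import re

# Hypothetical mapping from marker word to colour (values taken from the question).
COLORS = {"increase": "Red", "decrease": "Green"}

# Group 1 is the marker word, group 2 the value between the markers.
# The backreference \1 requires the closing marker to match the opening one.
MARKER_RE = re.compile(r"(increase|decrease)(.+?)\1")


def colorize(text):
    """Wrap every marked value in a coloured span in one pass over the text."""
    def repl(match):
        color = COLORS[match.group(1)]
        return '<span style="color: {};">{}</span>'.format(color, match.group(2))
    return MARKER_RE.sub(repl, text)


text = ('some text now important information starts decrease-123456decrease '
        'more text not to touch next marker increase7896278689increase '
        'and more text another marker decrease-12355decrease with important information')
output = colorize(text)
print(output)
```

Passing a function as the replacement lets the colour depend on which marker matched, so adding a marker only means adding an entry to the dictionary and to the alternation in the pattern.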
I found that the following code works:
print(re.sub(r"decrease(.*?)decrease", r'<span style="color: Green;">\1</span>', text))
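What happens here is that we replace the pattern "decrease(.*?)decrease" with '<span style="color: Green;">\1</span>', where \1 stands for the content matched by (.*?). Note the leading r before the strings: it makes them raw strings, so Python passes the backslash in \1 through to the regex engine instead of treating it as an escape sequence.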
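To see what the leading r changes, here is a small check. The sample string and the angle-bracket replacement are shortened stand-ins made up for this illustration, not part of the question.

```python
import re

plain = '\1'   # Python processes the escape first: a single character, chr(1)
raw = r'\1'    # raw string: a backslash followed by '1', read by re.sub as group 1

print(len(plain), len(raw))  # 1 2

sample = 'decrease-12355decrease'
with_raw = re.sub(r'decrease(.*?)decrease', r'<\1>', sample)
without_raw = re.sub(r'decrease(.*?)decrease', '<\1>', sample)

print(with_raw)           # <-12355>
print(repr(without_raw))  # '<\x01>'
```

Without the r prefix the captured value is lost and a control character is inserted instead. Writing the replacement as '<\\1>' would work as well, since the doubled backslash produces the same two characters as the raw string.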
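Obviously, you will need to do the same for the increase version as well.
Note that replace() replaces every occurrence, and it looks like your code does not take that into account.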
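For completeness, the loop from the question can also be made to work. As far as I can tell, only the last marker gets replaced there because every iteration calls text.replace on the untouched input and overwrites output. A minimal sketch of a fix, keeping the question's variable names, is to start output from text and keep modifying that same string:

```python
import re

text = ('some text now important information starts decrease-123456decrease '
        'more text not to touch next marker increase7896278689increase '
        'and more text another marker decrease-12355decrease with important information')

# Start from the full input and apply each replacement to the running result.
output = text
for val in re.findall('increase(.+?)increase', text):
    output = output.replace('increase' + val + 'increase',
                            '<span style="color: Red;">' + val + '</span>')
for val in re.findall('decrease(.+?)decrease', text):
    output = output.replace('decrease' + val + 'decrease',
                            '<span style="color: Green;">' + val + '</span>')
print(output)
```

Because str.replace already handles every occurrence of a given value, a value that appears twice is simply replaced on its first iteration and left alone on the second.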