Highlight text in a DataFrame based on a regex pattern


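  1. Problem: I have a use case that requires me to highlight words in a red font within DataFrame rows, based on a regex pattern. I settled on a regex pattern that ignores all whitespace, punctuation, and case.
  2. Source: The original data comes from a csv file. I want to load it into a DataFrame, apply the pattern-matching highlight formatting, and write the result to Excel.
  3. Code: This code helps me count the number of matched words in each DataFrame row.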
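import pandas as pd
import re
# ANSI escape codes for red console text (defined here so the snippet runs on its own)
red_fmt = "\033[1;31m{}\033[0m"
df = pd.read_csv("C:/filepath/filename.csv", engine='python')
p = r'(?i)(?<![^ .,?!-])Crust|good|selection|fresh|rubber|warmer|fries|great(?!-[^ .,?!;\r\n])'
# \g<0> stands for the whole match, which gets wrapped in the red escape codes
df['Output'] = df['Output'].apply(lambda x: re.sub(p, red_fmt.format(r"\g<0>"), x))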
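  4. Sample data:
Input
Wow... Loved this place.
Crust is not good.
The selection on the menu was great and so were the prices.
Honeslty it didn't taste THAT fresh.
The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer.
The fries were great too.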
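One thing worth checking in the pattern: `|` binds more loosely than the lookarounds, so the lookbehind only guards `Crust` and the lookahead only guards `great`. The sketch below is not the original pattern. It swaps in `\b` word boundaries around a non-capturing group, which is an assumption about the intent (whole-word, case-insensitive matches), and counts matches per row. The names `WORDS`, `GROUPED`, `UNGROUPED`, and `count_matches` are made up for illustration.

```python
import re

# Keywords taken from the pattern in the question.
WORDS = ["crust", "good", "selection", "fresh", "rubber", "warmer", "fries", "great"]

# Assumption: whole-word, case-insensitive matching is what is wanted.
# The (?:...) group makes both \b boundaries apply to every alternative.
GROUPED = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in WORDS) + r")\b",
    re.IGNORECASE,
)

# The pattern from the question, for comparison: the lookarounds only
# attach to the first and last alternatives.
UNGROUPED = re.compile(
    r"(?i)(?<![^ .,?!-])Crust|good|selection|fresh|rubber|warmer|fries|great(?!-[^ .,?!;\r\n])"
)


def count_matches(text):
    """Number of whole-word keyword hits in one row of text."""
    return len(GROUPED.findall(text))


rows = [
    "Crust is not good.",
    "The fries were great too.",
    "Honeslty it didn't taste THAT fresh.",
    "Wow... Loved this place.",
]
counts = [count_matches(row) for row in rows]
print(counts)

# "goodness" contains "good" but is not the whole word "good".
print(bool(UNGROUPED.search("goodness")), bool(GROUPED.search("goodness")))
```

With a DataFrame, the same function could be applied to a text column with `df['Output'].apply(count_matches)`, assuming that column holds strings.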
import re
# Console output color.
red_fmt = "\033[1;31m{}\033[0m"
s = """
Wow... Loved this place.
Crust is not good.
The selection on the menu was great and so were the prices.
Honeslty it didn't taste THAT fresh.
The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer.
The fries were great too.
"""
p = r'(?i)(?<![^ \r\n.,?!-])Crust|good|selection|fresh|rubber|warmer|fries|great(?!-[^ .,?!;\r\n])'

print(re.sub(p, red_fmt.format(r"\g<0>"), s))
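The ANSI codes above only color console output; written to an Excel file they would show up as stray characters in the cell. For Excel, one possible approach is to split each row into plain and highlighted fragments first. This is a minimal sketch under that assumption: `split_for_rich_text` and the `RED` placeholder are hypothetical names, and the word-boundary pattern is a simplified stand-in for the one in the question.

```python
import re

# Simplified stand-in pattern (assumption: whole-word, case-insensitive).
PATTERN = re.compile(
    r"\b(?:crust|good|selection|fresh|rubber|warmer|fries|great)\b",
    re.IGNORECASE,
)


def split_for_rich_text(text, pattern, highlight):
    """Return a flat list in which every matched word is preceded by
    `highlight`, and unmatched text appears as plain strings."""
    parts = []
    pos = 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            parts.append(text[pos:m.start()])  # plain text before the match
        parts.extend([highlight, m.group(0)])  # format marker, then the word
        pos = m.end()
    if pos < len(text):
        parts.append(text[pos:])  # trailing plain text
    return parts


RED = "RED"  # placeholder; would be a real cell-format object in practice
parts = split_for_rich_text("Crust is not good.", PATTERN, RED)
print(parts)
# ['RED', 'Crust', ' is not ', 'RED', 'good', '.']
```

If XlsxWriter is the Excel engine, this alternating format/string layout is the shape its `worksheet.write_rich_string(row, col, *parts)` call expects, with `RED` replaced by something like `workbook.add_format({"font_color": "red"})`. As far as I know, rich strings need more than one fragment, so rows without any match would go through a plain `worksheet.write` instead.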
