My df has two columns:
thePerson theText
"the abc" "this is about the abc"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about WXY"
I want the resulting df to be:
thePerson theText
"the abc" "this is about <b>the abc</b>"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about <b>WXY</b>"
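Note that if theText contains thePerson from the same row (ignoring case), the matching part of theText becomes bold.
One of the solutions I tried without success is: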
df['theText']=df['theText'].replace(df.thePerson,'<b>'+df.thePerson+'</b>', regex=True)
I wonder whether I can do this using lapply or map.
My Python environment is set to 2.7.
Using re.sub and zip:
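import re

tt = df.theText.values.tolist()
# Strip any surrounding quotes so the name can match inside theText
tp = df.thePerson.str.strip('"').values.tolist()
df.assign(
    # \1 re-inserts the matched text, so its original case is preserved
    theText=[re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)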
thePerson theText
0 the abc this is about <b>the abc</b>
1 xyz this is about tyu
2 wxy this is about abc
3 wxy this is about <b>WXY</b>
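One caveat when the pattern is built straight from thePerson: any regex metacharacter in a name (a dot, a plus sign, parentheses) is read as pattern syntax rather than literal text. Below is a minimal sketch of a helper that escapes the name first. The name bold_person and the sample strings are hypothetical and not part of the answer above.

```python
import re

def bold_person(text, person):
    # re.escape makes the name match literally, even if it contains
    # regex metacharacters such as '.', '+', or '('.
    pattern = '({})'.format(re.escape(person))
    # \1 re-inserts the matched text, so the original casing is kept.
    return re.sub(pattern, r'<b>\1</b>', text, flags=re.I)

print(bold_person('this is about WXY', 'wxy'))
print(bold_person('ask a.b+c today', 'a.b+c'))
print(bold_person('this is about tyu', 'xyz'))
```

If this fits your data, it could replace the re.sub call in the list comprehension: theText=[bold_person(t, p) for t, p in zip(tt, tp)].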
Copy/paste
You should be able to run this exact code and get the desired result:
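import re
from io import StringIO
import pandas as pd

# The u prefix is needed on Python 2.7, where io.StringIO only accepts unicode
txt = u'''thePerson theText
"the abc" "this is about the abc"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about WXY"'''
# Columns are separated by two or more whitespace characters
df = pd.read_csv(StringIO(txt), sep=r'\s{2,}', engine='python')
tt = df.theText.values.tolist()
tp = df.thePerson.str.strip('"').values.tolist()
df.assign(
    # \1 re-inserts the matched text, so its original case is preserved
    theText=[re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)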
You should see this:
thePerson theText
0 "the abc" "this is about <b>the abc</b>"
1 "xyz" "this is about tyu"
2 "wxy" "this is about abc"
3 "wxy" "this is about <b>WXY</b>"
You can use apply:
import re

# \1 re-inserts the matched text, so its original case is preserved
df['theText'] = df.apply(lambda x: re.sub(r'(' + x.thePerson + ')',
                                          r'<b>\1</b>',
                                          x.theText,
                                          flags=re.IGNORECASE), axis=1)
print(df)
thePerson theText
0 the abc this is about <b>the abc</b>
1 xyz this is about tyu
2 wxy this is about abc
3 wxy this is about <b>WXY</b>
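As for the map part of the question: the built-in map accepts several iterables and walks them in parallel, so it can stand in for a row-wise apply. Below is a sketch on plain lists, assuming the names carry no surrounding quotes. The helper add_bold is a hypothetical name. On Python 2.7 map already returns a list, so the list() call only matters on Python 3.

```python
import re

def add_bold(text, person):
    # Escape the name so it is matched literally, then wrap each
    # case-insensitive match in <b>...</b>, keeping its original case.
    pattern = '({})'.format(re.escape(person))
    return re.sub(pattern, r'<b>\1</b>', text, flags=re.IGNORECASE)

texts = ['this is about the abc', 'this is about tyu',
         'this is about abc', 'this is about WXY']
persons = ['the abc', 'xyz', 'wxy', 'wxy']

# map pairs texts[i] with persons[i], one call per row
result = list(map(add_bold, texts, persons))
for row in result:
    print(row)
```

With the DataFrame above, the same idea would read df['theText'] = list(map(add_bold, df.theText, df.thePerson)).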