I have some HTML data, and I want to extract only the text that appears in a particular bold font.
<span style="font-family: ABCDEE+Cambria,Bold; font-size:9px">Pinecone Functions
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:419px; top:1903px; width:76px; height:11px;"><span style="font-family: ABCDEE+Calibri,Bold; font-size:7px">Trainee Sign-Off
<br></span></div>
I only want the text under the font family ABCDEE+Cambria,Bold.
from bs4 import BeautifulSoup
import re

with open('/home/output4.html') as file:
    text = file.read()

soup = BeautifulSoup(text, 'html.parser')
x = soup.find_all('span', style=re.compile(r'font-family: ABCDEE+Cambria,Bold.*'))
for rows in x:
    print(rows.text)
I tried this, but I get an empty list.
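+ is a special character in regular expressions (it means "one or more of the preceding element"), so you should escape it (note the \+ instead of +).
Example: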
from bs4 import BeautifulSoup
import re
text = """
<span style="font-family: ABCDEE+Cambria,Bold; font-size:9px">Pinecone Functions
<br></span></div><div style="position:absolute; border: textbox 1px solid; writing-mode:lr-tb; left:419px; top:1903px; width:76px; height:11px;"><span style="font-family: ABCDEE+Calibri,Bold; font-size:7px">Trainee Sign-Off
<br></span></div>
"""
soup = BeautifulSoup(text, 'html.parser')
# Escape the + so it matches a literal plus sign in the style attribute.
x = soup.find_all('span', style=re.compile(r'font-family: ABCDEE\+Cambria,Bold.*'))
for rows in x:
    print(rows.text)
Prints:
Pinecone Functions
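To see why the unescaped pattern finds nothing, it may help to test the regular expression on the style string alone, without BeautifulSoup. The sketch below uses only the standard `re` module; the variable names are illustrative, and `re.escape` is shown as one possible alternative to writing the backslash by hand.

```python
import re

# The style attribute value from the first <span> in the question.
style = "font-family: ABCDEE+Cambria,Bold; font-size:9px"

# Unescaped: "E+" means "one or more E", so the regex expects "C" right
# after the run of E characters and never matches the literal "+".
unescaped = re.compile(r"font-family: ABCDEE+Cambria,Bold")

# re.escape backslash-escapes regex metacharacters in a literal string,
# so the "+" is treated as an ordinary character.
escaped = re.compile(re.escape("font-family: ABCDEE+Cambria,Bold"))

print(unescaped.search(style))             # None
print(escaped.search(style) is not None)   # True
```

A pattern built this way could presumably be passed as `style=` to `find_all` in the same way as the hand-escaped one.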