BeautifulSoup: removing text inside tags



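I am trying to extract strings from an HTML file using BeautifulSoup. One query returns an element that has label tags nested inside it. How can I get rid of those tags?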

from bs4 import BeautifulSoup
import requests

with open('/Desktop/filename.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')

string = soup.find('div', class_="col-sm-8 col-xs-6")
print(string)

Output:

<div class="col-sm-8 col-xs-6">
Sherlock Holmes <br>
<label for="AgentAddress" style="display: none;">
Detective's Address
</label>
221B Baker Street London <br>
<label for="AgentCityStateZip" style="display: none;">
City, State, Zip
</label>
London, United Kingdom            
</div>

print(string.text) outputs:

Sherlock Holmes
Detective's Address
221B Baker Street London
City, State, Zip
London, United Kingdom 

I am not interested in the text inside the <label></label> tags. How can I get rid of them so that the output is:

Sherlock Holmes
221B Baker Street London
London, United Kingdom 

You can try decompose(), which removes a tag and its contents from the tree. For example, run this before printing:

for label_element in string.find_all("label"):
    label_element.decompose()
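
For reference, here is a minimal self-contained sketch of the same idea. It makes a few assumptions that are not in the question: the HTML is inlined as a string instead of being read from the file, the built-in "html.parser" is used in place of 'lxml' so that no extra package is needed, and stripped_strings is used instead of .text so that the blank lines left behind by the removed labels are dropped.

```python
from bs4 import BeautifulSoup

# Inline copy of the markup from the question (assumption: stands in for the file).
html = """<div class="col-sm-8 col-xs-6">
Sherlock Holmes <br>
<label for="AgentAddress" style="display: none;">
Detective's Address
</label>
221B Baker Street London <br>
<label for="AgentCityStateZip" style="display: none;">
City, State, Zip
</label>
London, United Kingdom
</div>"""

# "html.parser" is an assumption; the question uses 'lxml'.
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="col-sm-8 col-xs-6")

# Remove every <label> and its text from the tree.
for label_element in div.find_all("label"):
    label_element.decompose()

# stripped_strings yields each remaining text node with surrounding whitespace removed,
# skipping nodes that are empty after stripping.
lines = list(div.stripped_strings)
print("\n".join(lines))
```

This should print the three lines asked for in the question. Note that decompose() modifies the parsed tree in place, so the labels are gone from soup afterwards as well.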
