BeautifulSoup: removing the text inside certain tags

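I am trying to extract strings from an HTML file using BeautifulSoup. One query returns a tag that has other tags nested inside it. How can I get rid of those nested tags?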

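from bs4 import BeautifulSoup
import requests

# Parse the saved HTML file with the lxml parser
with open('/Desktop/filename.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')

# Find the first <div> whose class attribute is exactly "col-sm-8 col-xs-6"
string = soup.find('div', class_="col-sm-8 col-xs-6")
print(string)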

Output:

<div class="col-sm-8 col-xs-6">
Sherlock Holmes <br>
<label for="AgentAddress" style="display: none;">
Detective's Address
</label>
221B Baker Street London <br>
<label for="AgentCityStateZip" style="display: none;">
City, State, Zip
</label>
London, United Kingdom            
</div>

print(string.text) outputs:

Sherlock Holmes
Detective's Address
221B Baker Street London
City, State, Zip
London, United Kingdom

I am not interested in the text inside the <label></label> tags. How can I get rid of them so that the output is:

Sherlock Holmes
221B Baker Street London
London, United Kingdom

You can try decompose(), which removes a tag and its contents from the tree. For example, run this before printing:

for label_element in string.find_all("label"):
    label_element.decompose()
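
For reference, here is a minimal self-contained sketch of the decompose() approach. It makes a few assumptions that are not in the question: the HTML is inlined as a string instead of being read from the file, the built-in "html.parser" is used in place of lxml so that no extra package is needed, and text_without_labels is just a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# HTML copied from the question, inlined here (assumption) so the
# example runs without the original file.
html = """
<div class="col-sm-8 col-xs-6">
Sherlock Holmes <br>
<label for="AgentAddress" style="display: none;">
Detective's Address
</label>
221B Baker Street London <br>
<label for="AgentCityStateZip" style="display: none;">
City, State, Zip
</label>
London, United Kingdom
</div>
"""


def text_without_labels(markup):
    """Hypothetical helper: return the div's text lines, skipping <label> text."""
    # "html.parser" is an assumption; the question uses 'lxml'.
    soup = BeautifulSoup(markup, "html.parser")
    div = soup.find("div", class_="col-sm-8 col-xs-6")
    # Remove every <label> tag, together with its text, from the tree.
    for label in div.find_all("label"):
        label.decompose()
    # stripped_strings yields the remaining text nodes with surrounding
    # whitespace removed and whitespace-only nodes skipped.
    return list(div.stripped_strings)


lines = text_without_labels(html)
print("\n".join(lines))
```

Note that decompose() modifies the parsed tree in place, so the labels are gone for any later queries on the same soup object. Using stripped_strings rather than .text is optional; it only avoids the blank lines left behind where the labels used to be.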
