小贝子编程

Prevent BeautifulSoup's find_all() from converting escaped HTML tags

Updated: 2023-09-18

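I have some text:

text = '<p>&lt;b&gt;test&lt;/b&gt;<br/></p>'

I parse it with Beautiful Soup 4:

soup = BeautifulSoup(text, "html.parser") # soup: <p>&lt;b&gt;test&lt;/b&gt;<br/></p>

Then I want to get the text nodes:

text_nodes = soup.find_all(text=True)

But the escaped HTML gets unescaped in the process: text_nodes: ['<b>test</b>']

How can I prevent the find_all() step from converting the escaped HTML tags?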

With text=True, I don't think there is an option to keep the strings exactly as they appear in the source.
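A quick check, using the same input as the question, suggests why: the entities appear to be decoded when the document is parsed, so the text node already holds the unescaped characters before find_all() runs, and they are only escaped again when the tree is serialized. This is a small sketch for illustration; the variable name `node` is my own.

```python
from bs4 import BeautifulSoup

text = '<p>&lt;b&gt;test&lt;/b&gt;<br/></p>'
soup = BeautifulSoup(text, "html.parser")

# The first child of <p> is the text node; it already contains the
# decoded characters, without any call to find_all().
node = soup.p.contents[0]
print(repr(str(node)))  # '<b>test</b>'

# Serializing the tree escapes the text node again on output.
serialized = str(soup)
print(serialized)
```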

My solution is simply to re-escape the results in a loop:

from bs4 import BeautifulSoup
from html import escape
text = '<p>&lt;b&gt;test&lt;/b&gt;<br/></p>'
soup = BeautifulSoup(text, "html.parser")
text_nodes = [escape(x) for x in soup.strings]
print(text_nodes)
# ['&lt;b&gt;test&lt;/b&gt;']

soup.strings is a shorter alternative to soup.find_all(text=True); it yields the text nodes lazily as a generator instead of returning a list.
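One detail to watch with this approach: html.escape() also escapes quotation marks by default, so text nodes that contain quotes would come back altered. The sketch below is a variation on the answer's code, not a Beautiful Soup feature. The helper name `escaped_text_nodes` is hypothetical, and it uses the `string=True` spelling of the same filter. Entities other than `&`, `<` and `>` (for example `&nbsp;`) are not restored by this.

```python
from html import escape

from bs4 import BeautifulSoup


def escaped_text_nodes(markup):
    """Return the text nodes of `markup` with &, < and > re-escaped."""
    soup = BeautifulSoup(markup, "html.parser")
    # quote=False leaves " and ' untouched; only &, < and > are escaped.
    return [escape(s, quote=False) for s in soup.find_all(string=True)]


nodes = escaped_text_nodes('<p>&lt;b&gt;"test"&lt;/b&gt;<br/></p>')
print(nodes)
# ['&lt;b&gt;"test"&lt;/b&gt;']
```

With the default `quote=True`, the same input would give `&quot;test&quot;` inside the node instead of `"test"`.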
