Python Beautiful Soup - finding strings that contain special characters

Here is my code:

from bs4 import BeautifulSoup
import re

soup = BeautifulSoup("<html><body>BLAR fff11 &pound; </body></html>", 'html.parser')
for z in soup.find_all(text=re.compile('&pound;')):
    print(z)

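For some reason this returns nothing. However, if I replace the special character with a plain word in both the sample HTML and the find statement, it works: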

soup = BeautifulSoup("<html><body>BLAR fff11 pound </body></html>", 'html.parser')
for z in soup.find_all(text=re.compile('pound')):
    print(z)

Output: BLAR fff11 pound

Does anyone know where I am going wrong, and how I can find strings that contain special characters?

Thanks

When a BeautifulSoup object is constructed from HTML, HTML entities are converted to the corresponding Unicode characters.
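A minimal sketch of that conversion, assuming bs4 is installed and the built-in 'html.parser' is used. The variable names (text, has_entity, has_char) are only illustrative. It checks what the parsed text node actually contains:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body>BLAR fff11 &pound; </body></html>", "html.parser")

# <body> has a single text child, so .string returns that text node.
text = soup.body.string

# The entity from the markup is no longer present in the parsed text;
# the Unicode pound sign (U+00A3) has taken its place.
has_entity = "&pound;" in text
has_char = "\u00a3" in text

print(repr(text))
print("contains '&pound;':", has_entity)
print("contains U+00A3:", has_char)
```

This is why a pattern built from '&pound;' has nothing to match against: the tree never stores the entity text.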

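So to search for such a character, use the character itself rather than its HTML entity equivalent. With the HTML from your example, the following code…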

from bs4 import BeautifulSoup
import re

soup = BeautifulSoup("<html><body>BLAR fff11 &pound; </body></html>", 'html.parser')
for z in soup.find_all(text=re.compile('£')):  # Actual '£' character, not '&pound;'
    print(z)

…prints:

BLAR fff11 £
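If typing the literal character in source code is inconvenient, the same search can be sketched with a Unicode escape. This variant also assumes a Beautiful Soup release recent enough to accept string=, the newer name for the text= argument. The name matches is illustrative:

```python
from bs4 import BeautifulSoup
import re

soup = BeautifulSoup("<html><body>BLAR fff11 &pound; </body></html>", "html.parser")

# "\u00a3" is the pound sign written as an escape, so the source file
# does not need to contain the character itself.
pattern = re.compile("\u00a3")

# string= filters on the text of each node, like text= in the code above.
matches = [str(s) for s in soup.find_all(string=pattern)]

for m in matches:
    print(m)
```

On older releases that do not accept string=, text= as used in the answer should behave the same way.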

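In BeautifulSoup v3 it was possible to bypass this conversion, but not in v4 ("An incoming HTML or XML entity is always converted into the corresponding Unicode character.")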

If you want HTML entities back when converting a BeautifulSoup object to a string, you can still get them by specifying formatter="html".
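A short sketch of that output option, assuming the same sample HTML and that decode() accepts the formatter keyword (as prettify() and encode() do). default_out and entity_out are illustrative names:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<html><body>BLAR fff11 &pound; </body></html>", "html.parser")

# Default output keeps the Unicode character as-is.
default_out = soup.decode()

# formatter="html" converts characters back to named HTML entities
# where one exists.
entity_out = soup.decode(formatter="html")

print(default_out)
print(entity_out)
```

Note that this only affects serialization. Searching the tree still has to use the Unicode character.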
