How to extract a <p> that matches specific text, given that it also contains a <b>?

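How can I find a paragraph (<p>) element that has a bold (<b>) element inside it? The content of the bold element varies: it could be Michael, or it could be Luis.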

from bs4 import BeautifulSoup
import re
html = "<p>Hello<b>Michael</b></p>"
# it could be  "<p>Hello<b>Luis</b></p>"
soup = BeautifulSoup(html, 'html.parser')
testing = soup.find('p', text=re.compile(r"Hello"))

The testing variable comes back as None.

Since the HTML contains a large number of paragraphs, I can't simply call soup.find_all("p").
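
A likely reason for the None result: a `text=` (or `string=`) filter is compared against each tag's `.string`, and `.string` is None whenever a tag has more than one child. Here the <p> holds a text node plus a <b>, so the regex never gets anything to match. The sketch below illustrates this under that reading and shows one possible workaround with a function filter; the variable names `no_match` and `match` are only illustrative.

```python
import re
from bs4 import BeautifulSoup

html = "<p>Hello<b>Michael</b></p>"
soup = BeautifulSoup(html, "html.parser")

p = soup.find("p")
# The <p> has two children (a text node and a <b>), so .string is None.
print(p.string)
# get_text() joins the text of all descendants instead.
print(p.get_text())

# The string filter is tested against .string, so nothing is found.
no_match = soup.find("p", string=re.compile(r"Hello"))

# A function filter can check the combined text and the presence of a <b>.
match = soup.find(
    lambda tag: tag.name == "p"
    and tag.find("b") is not None
    and "Hello" in tag.get_text()
)
print(no_match, match)
```

The function filter is called once per tag in the document, so it avoids building an intermediate list of every <p> by hand, though it still visits each element.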

Since Beautiful Soup 4.7.0, CSS selector support has been provided by Soup Sieve, which handles a wide range of CSS selectors and pseudo-classes.

So you can simply chain the conditions in a single selector:

soup.select('p:has(b):-soup-contains("Hello")')

from bs4 import BeautifulSoup
html='''
<p>Hello<b>Michael</b></p>
<p>some other p tags</p>
<p>some other p tags</p>
<p>Hello<b>Luis</b></p>
<p>some other p tags</p>
<p>some other p tags</p>
<p>Not matching<b>Luis</b></p>
'''
soup = BeautifulSoup(html, 'html.parser')
soup.select('p:has(b):-soup-contains("Hello")')

[<p>Hello<b>Michael</b></p>, <p>Hello<b>Luis</b></p>]
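
If the goal is the changing bold text rather than the whole paragraph, the same selector idea can target the <b> elements directly. This is a minimal sketch, assuming a Soup Sieve version recent enough to support `:-soup-contains` (it was introduced in Soup Sieve 2.1; earlier releases used `:contains`). The shortened `html` sample and the `names` variable are illustrative.

```python
from bs4 import BeautifulSoup

html = '''
<p>Hello<b>Michael</b></p>
<p>some other p tags</p>
<p>Hello<b>Luis</b></p>
<p>Not matching<b>Luis</b></p>
'''
soup = BeautifulSoup(html, "html.parser")

# Pick <b> elements that are direct children of a <p> whose text contains "Hello".
names = [b.get_text() for b in soup.select('p:-soup-contains("Hello") > b')]
print(names)
```

Note that `:-soup-contains` is case-sensitive and looks at the text of the element and its descendants, so a paragraph whose only "Hello" sits inside the <b> would also match.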
