Python Beautiful Soup: print specific lines within a multiline <p> containing a string

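How can I get/print only those lines of a large multiline text inside a <p> tag that contain a specific string? On the website, the line breaks are implemented with <br> tags. There is no closing </p> tag.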

Basic structure of the website:

<p style="line-height: 150%">
I need a big cup of coffee and cookies.
<br>
I do not like tea with milk.
<br>
I can't live without coffee and cookies.
<br>
...

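Suppose I only want to get/print the lines that contain "coffee and cookies". In this case, only the first and third lines/sentences of this <p> should be printed.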

I have Beautiful Soup 4.6.3 installed under Python 3.7.1.

findAll seems to be tag-oriented and returns the whole <p>, right? So how can I achieve this? Possibly with a regex or some other pattern?

If I understand your requirement correctly, the following snippet should get you there:

from bs4 import BeautifulSoup
htmlelem = """
    <p style="line-height: 150%">
    I need a big cup of coffee and cookies.
    <br>
    I do not like tea with milk.
    <br>
    I can't live without coffee and cookies.
    <br>
"""
soup = BeautifulSoup(htmlelem, 'html.parser')
for paragraph in soup.find_all('p'):
    # Each text node between the <br> tags is one "line" of the paragraph.
    for line in paragraph.stripped_strings:
        if "coffee and cookies" in line:
            print(line)

Could you split on \n?

from bs4 import BeautifulSoup
html = """
    <p style="line-height: 150%">
    I need a big cup of coffee and cookies.
    <br>
    I do not like tea with milk.
    <br>
    I can't live without coffee and cookies.
    <br>
"""
soup = BeautifulSoup(html, 'html.parser')
for item in soup.select('p'):
    # Split the paragraph text on newline characters, one entry per source line.
    r1 = item.text.split('\n')
    for nextItem in r1:
        if "coffee and cookies" in nextItem:
            print(nextItem.strip())
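
Splitting on \n only works when the page source has a newline next to every <br>. As a minimal sketch, assuming the markup might be minified, each <br> can first be replaced with a real newline. The helper name lines_containing and the one-line sample input are my own illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical minified input: no newline characters between the lines.
html = ('<p style="line-height: 150%">I need a big cup of coffee and cookies.'
        '<br>I do not like tea with milk.'
        "<br>I can't live without coffee and cookies.<br>")


def lines_containing(html, needle):
    """Return the <br>-separated lines of every <p> that contain needle."""
    soup = BeautifulSoup(html, 'html.parser')
    # Turn each <br> into a newline so the text can be split reliably.
    for br in soup.find_all('br'):
        br.replace_with('\n')
    matches = []
    for paragraph in soup.find_all('p'):
        for line in paragraph.get_text().split('\n'):
            if needle in line:
                matches.append(line.strip())
    return matches


for line in lines_containing(html, 'coffee and cookies'):
    print(line)
```

This should print only the first and third sentences, whether or not the original HTML contains line breaks of its own.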

Use str() to convert each bs4.element to a string; then you can check it for "coffee and cookies":

from bs4 import BeautifulSoup
html_doc = """<p style="line-height: 150%">
    I need a big cup of coffee and cookies. <a href="aaa">aa</a>
    <br>
    I do not like tea with milk.
    <br>
    I can't live without coffee and cookies.
    <br>"""
soup = BeautifulSoup(html_doc, 'html.parser')
paragraph = soup.find('p')
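# Iterating over a Tag yields its direct children (text nodes and tags).
for p in paragraph:
    if 'coffee and cookies' in str(p):
        # Look for an <a> sibling after this text node so the link is printed with the line.
        next_is_a = p.find_next_sibling('a')
        if next_is_a:
            print(p.strip() + ' ' + str(next_is_a))
        else:
            print(p.strip())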
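
Since the question mentions a regex: find_all also accepts a compiled pattern through its string= argument and then returns the matching text nodes instead of tags. The following is only a sketch under that assumption; the names pattern and matching_lines are mine, and on older bs4 releases the argument is spelled text= instead.

```python
import re
from bs4 import BeautifulSoup

html = """
<p style="line-height: 150%">
I need a big cup of coffee and cookies.
<br>
I do not like tea with milk.
<br>
I can't live without coffee and cookies.
<br>
"""

soup = BeautifulSoup(html, 'html.parser')
pattern = re.compile(r'coffee and cookies')

matching_lines = []
for paragraph in soup.find_all('p'):
    # string=pattern selects the text nodes whose content matches the regex.
    for node in paragraph.find_all(string=pattern):
        matching_lines.append(node.strip())

for line in matching_lines:
    print(line)
```

Because the <br> tags already split the paragraph into separate text nodes, no manual splitting is needed here.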
