BS4 get_text function produces unexpected output



The following HTML sample produces different results depending on how the markup is laid out. Here is the sample on a single line:

card = """
<ul class="wrapper--inline-block float--left margin-top--15 padding-left--20 font--weight-300"><li><span class="font--weight-500">Minimum Qualification:</span> Bachelor</li><li><span class="font--weight-500">Experience Level:</span> Graduate trainee</li><li><span class="font--weight-500">Experience Length:</span> 1 year</li></ul>
"""

Output:

Minimum Qualification: BachelorExperience Level: Graduate traineeExperience Length: 1 year

When the HTML sample is formatted across several lines:

card = """
<ul class="wrapper--inline-block float--left margin-top--15 padding-left--20 font--weight-300">
<li><span class="font--weight-500">Minimum Qualification:</span> Bachelor</li>
<li><span class="font--weight-500">Experience Level:</span> Graduate trainee</li>
<li><span class="font--weight-500">Experience Length:</span> 1 year</li>
</ul>
"""

Output:

Minimum Qualification: Bachelor
Experience Level: Graduate trainee
Experience Length: 1 year

The question is how to make the first case produce the desired output, as in the second case. Here is my current code:

from bs4 import BeautifulSoup

qualifications = BeautifulSoup(card, "html.parser")
print(qualifications.getText())

Passing separator="\n" inserts a newline between the pieces of text, which gets close to the desired output:

qualifications.getText(separator="\n")
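One detail worth checking: the separator is placed between every text node, not just between `<li>` elements, so the label inside each `<span>` and the value after it end up on separate lines. The sketch below illustrates this with a shortened version of the markup (the `class` attributes are dropped for brevity, which is an assumption made only to keep the example small).

```python
from bs4 import BeautifulSoup

# Shortened, single-line version of the question's markup (attributes omitted).
card = (
    "<ul>"
    "<li><span>Minimum Qualification:</span> Bachelor</li>"
    "<li><span>Experience Level:</span> Graduate trainee</li>"
    "</ul>"
)

soup = BeautifulSoup(card, "html.parser")

# The separator goes between every text node, so each label and its
# value are split onto different lines.
text = soup.get_text(separator="\n")
print(text)

# strip=True additionally trims whitespace around each text node.
stripped = soup.get_text(separator="\n", strip=True)
print(stripped)
```

If each label should stay on the same line as its value, extracting the text per `<li>` is the safer route.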

Edit 1:

>>> card = """
<ul class="wrapper--inline-block float--left margin-top--15 padding-left--20 font--weight-300"><li><span class="font--weight-500">Minimum Qualification:</span> Bachelor</li><li><span class="font--weight-500">Experience Level:</span> Graduate trainee</li><li><span class="font--weight-500">Experience Length:</span> 1 year</li></ul>
"""
>>> qualifications = BeautifulSoup(card, "html.parser")
>>> for li in qualifications.find_all('li'):
...     print(li.get_text())
...
Minimum Qualification: Bachelor
Experience Level: Graduate trainee
Experience Length: 1 year

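My guess is that bs4 returns the text laid out the same way it received the markup (on one line or on several). For your specific problem, though, you can find the <li> tags first and then print the text of each one. The full content of each element is then printed on its own line.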

qualifications = BeautifulSoup(card, "html.parser")
# Collect every <li> element, then print the text of each on its own line
items = qualifications.find_all('li')
for item in items:
    print(item.get_text())

You will get this:

Minimum Qualification: Bachelor
Experience Level: Graduate trainee
Experience Length: 1 year
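If a single string is needed rather than printed lines, the same per-`<li>` idea can be combined with `str.join`. This is a minimal sketch, assuming a shortened version of the markup without the `class` attributes; the names `lines` and `result` are chosen here for illustration only.

```python
from bs4 import BeautifulSoup

# Shortened, single-line version of the question's markup (attributes omitted).
card = (
    "<ul>"
    "<li><span>Minimum Qualification:</span> Bachelor</li>"
    "<li><span>Experience Level:</span> Graduate trainee</li>"
    "<li><span>Experience Length:</span> 1 year</li>"
    "</ul>"
)

soup = BeautifulSoup(card, "html.parser")

# Within each <li>, join the stripped text nodes with a single space,
# so the label and its value stay together on one line.
lines = [li.get_text(" ", strip=True) for li in soup.find_all("li")]

# One line per <li>, regardless of how the source HTML was laid out.
result = "\n".join(lines)
print(result)
```

Because the newlines are added explicitly, the result does not depend on whether the source HTML had line breaks between the tags.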
