How to select a specific element using BeautifulSoup



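I am trying to scrape some profile information from LinkedIn. I came across an HTML structure with the layout below, and I need to select only "Abeokuta, Ogun State" and ignore "Contract".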

Here is a sample page: https://www.linkedin.com/in/habibulah-oyero-44069a193/

HTML structure

<p class="pv-entity__secondary-title t-14 t-black t-normal">
Abeokuta, Ogun State
<span class="pv-entity__secondary-title separator">Contract</span>
</p>

Python code

from bs4 import BeautifulSoup
# browser is assumed to be a Selenium WebDriver that has already loaded the profile page
src = browser.page_source
soup = BeautifulSoup(src, "lxml")
experience_div = soup.find("section", {"id": "experience-section"})
job_div = experience_div.find("div", {"class": "pv-entity__summary-info pv-entity__summary-info--background-section"})
job_location = job_div.find("p", {"class": "pv-entity__secondary-title"}).text.strip()
print(job_location)
This returns:
Abeokuta, Ogun State
Contract

To get only the first text node, you can use the .find_next() method with text=True, which returns only the first match:

from bs4 import BeautifulSoup

html = """<p class="pv-entity__secondary-title t-14 t-black t-normal">
Abeokuta, Ogun State
<span class="pv-entity__secondary-title separator">Contract</span>
</p>"""
soup = BeautifulSoup(html, "html.parser")
print(
    soup.find("p", class_="pv-entity__secondary-title t-14 t-black t-normal")
    .find_next(text=True)
    .strip()
)

Alternatively, you can use .contents:

print(
    soup.find("p", class_="pv-entity__secondary-title t-14 t-black t-normal")
    .contents[0]
    .strip()
)

Output (in both solutions):

Abeokuta, Ogun State
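
Both solutions above rely on the location text being the first child of the <p> tag, and on the class string matching exactly. As a minimal sketch of a slightly more defensive variant (not part of the original answer), the snippet below matches on the single class pv-entity__secondary-title and keeps only the text nodes that are direct children of the <p>, so the nested <span> is skipped wherever it appears. The helper name direct_text is hypothetical and introduced only for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

html = """<p class="pv-entity__secondary-title t-14 t-black t-normal">
Abeokuta, Ogun State
<span class="pv-entity__secondary-title separator">Contract</span>
</p>"""


def direct_text(tag):
    """Join the text nodes that are direct children of tag, skipping nested tags.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    parts = [child for child in tag.children if isinstance(child, NavigableString)]
    return "".join(parts).strip()


soup = BeautifulSoup(html, "html.parser")

# Passing a single class to class_ matches any <p> whose class list contains it,
# so the extra utility classes (t-14, t-black, ...) do not need to be spelled out.
p = soup.find("p", class_="pv-entity__secondary-title")

print(direct_text(p))  # Abeokuta, Ogun State
```

In the scraping code from the question, the same helper could presumably be applied to the <p> found inside job_div in place of .text.strip().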
