How to select a specific element using BeautifulSoup

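I'm trying to scrape some profile information from LinkedIn. I came across an HTML structure with the layout below, and I need to select only "Abeokuta, Ogun State" and ignore "Contract".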

Here is an example page: https://www.linkedin.com/in/habibulah-oyero-44069a193/

HTML structure

<p class="pv-entity__secondary-title t-14 t-black t-normal">
Abeokuta, Ogun State
<span class="pv-entity__secondary-title separator">Contract</span>
</p>

Python code

from bs4 import BeautifulSoup

# browser is assumed to be an already-open Selenium WebDriver on the profile page
src = browser.page_source
soup = BeautifulSoup(src, "lxml")
experience_div = soup.find("section", {"id": "experience-section"})
job_div = experience_div.find("div", {"class": "pv-entity__summary-info pv-entity__summary-info--background-section"})
# .text joins the text of every descendant, so the nested <span> is included too
job_location = job_div.find("p", {"class": "pv-entity__secondary-title"}).text.strip()
print(job_location)

This returns:
Abeokuta, Ogun State
Contract

To get only the first text node, you can use the .find_next() method, which returns only the first match:

from bs4 import BeautifulSoup

html = """<p class="pv-entity__secondary-title t-14 t-black t-normal">
Abeokuta, Ogun State
<span class="pv-entity__secondary-title separator">Contract</span>
</p>"""
soup = BeautifulSoup(html, "html.parser")
print(
    soup.find("p", class_="pv-entity__secondary-title t-14 t-black t-normal")
    # string=True matches text nodes (older bs4 versions spell this text=True)
    .find_next(string=True)
    .strip()
)

Alternatively, you can use .contents:

print(
    soup.find("p", class_="pv-entity__secondary-title t-14 t-black t-normal")
    .contents[0]
    .strip()
)

Output (for both solutions):

Abeokuta, Ogun State
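
Both solutions above rely on the location being the very first child of the <p> and on the class attribute matching the full string exactly. If either might vary, one possible variation is to match on a single class and keep only the text nodes that are direct children of the <p>. This is a minimal sketch against the same sample HTML, not something tested on a live LinkedIn page; the helper name direct_text is hypothetical and not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = """<p class="pv-entity__secondary-title t-14 t-black t-normal">
Abeokuta, Ogun State
<span class="pv-entity__secondary-title separator">Contract</span>
</p>"""


def direct_text(tag):
    """Hypothetical helper: join the text nodes that are direct children
    of tag, skipping any text that lives inside nested tags."""
    parts = [
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    ]
    # Drop whitespace-only nodes such as the newline after </span>
    return " ".join(part for part in parts if part)


soup = BeautifulSoup(html, "html.parser")

# A single class name matches any tag whose class list contains it,
# so the extra t-14 / t-black / t-normal classes do not matter here.
p = soup.find("p", class_="pv-entity__secondary-title")

location = direct_text(p)

# stripped_strings yields non-empty strings in document order,
# so the first one is the text that precedes the <span>.
first_string = next(p.stripped_strings)

print(location)
print(first_string)
```

Unlike .contents[0], direct_text does not depend on the position of the text inside the <p>, while next(p.stripped_strings) still assumes the wanted text comes before any nested tag.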
