BeautifulSoup: extracting a specific element by filtering on text

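I'm fairly new to web scraping and need to extract one specific element from a page. In this case it is 'Research Project – Cooperative Agreements', which sits right after the hyperlink in the data column.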

So far I have been finding the hyperlinks that contain 'Search_Type=Activity' with the following code:

# soup(...) is shorthand for soup.find_all(...)
for elem in soup(href=lambda href: href and "Search_Type=Activity" in href):
    print(elem.parent)

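I am scraping a batch of NIH (US National Institutes of Health) grant pages and need the "activity code" text from each one. It always appears right after the hyperlink whose URL contains "Search_Type=Activity".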

Here is the HTML I narrowed it down to using the code above:

<div class="col-md-8 datacolumn"> <a href="//grants.nih.gov/grants/funding/ac_search_results.htm?text_curr=u01&amp;Search.x=0&amp;Search.y=0&amp;Search_Type=Activity">U01</a> Research Project – Cooperative Agreements
<!--</div>
</div> end row -->
<!-- If it is not the first row we close the previous row div tags -->
</div>

FYI, the original page used here is just an NIH grant announcement.
Can someone point out what this element is and how to get at it from there?

Try:

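import requests
from bs4 import BeautifulSoup

url = "https://grants.nih.gov/grants/guide/rfa-files/RFA-DK-19-501.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Find the activity-code link, then take the text node that follows it.
# (string=True is the current spelling of the older text=True argument.)
name = (
    soup.select_one('[href*="Search_Type=Activity"]')
    .find_next_sibling(string=True)
    .strip()
)
print(name)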

Prints:

Research Project – Cooperative Agreements
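
Since the question mentions scraping many grant pages, the same idea can be wrapped in a small helper. The following is only a sketch: the function name `activity_codes` and the `HTML` constant are hypothetical, and it parses the snippet from the question locally instead of fetching a live page. It assumes the description is the plain text node directly after each link, and it returns an empty description when that node is missing or is an HTML comment.

```python
from bs4 import BeautifulSoup, Comment

# Snippet copied from the question, so no network access is needed.
HTML = """
<div class="col-md-8 datacolumn"> <a href="//grants.nih.gov/grants/funding/ac_search_results.htm?text_curr=u01&amp;Search.x=0&amp;Search.y=0&amp;Search_Type=Activity">U01</a> Research Project – Cooperative Agreements
<!--</div>
</div> end row -->
</div>
"""


def activity_codes(html):
    """Return (code, description) pairs for every activity-code link (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for link in soup.select('a[href*="Search_Type=Activity"]'):
        node = link.next_sibling
        # Accept only a plain text node; comments are also strings in bs4, so exclude them.
        if isinstance(node, str) and not isinstance(node, Comment):
            description = node.strip()
        else:
            description = ""
        results.append((link.get_text(strip=True), description))
    return results


print(activity_codes(HTML))
```

Returning pairs rather than a single string keeps the code (here U01) together with its description, and a page without a matching link yields an empty list instead of raising an AttributeError on None.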
