Python BeautifulSoup: finding elements by class returns None



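I am writing a Python program with BeautifulSoup. I want to build a web scraper that retrieves information about electronic journals. I use BeautifulSoup to look up elements by their HTML class, but it returns None or "[]". I am a beginner who started learning Python two weeks ago, so I don't know what to do. Please help.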

Here is my code.

import requests
from bs4 import BeautifulSoup

JAGS7_result = requests.get("https://agsjournals.onlinelibrary.wiley.com/toc/15325415/2021/69/7")
JAGS7_soup = BeautifulSoup(JAGS7_result.text, "html.parser")
results = JAGS7_soup.find_all("div",{"class": "issue-item"})
print(results)

Your HTTP request did not succeed: the server returned a 403 (Forbidden) response. You can check this by printing the status code:

print(JAGS7_result.status_code)

It should be 200. In your case it is 403.
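To see why a 403 leads to `[]` or None, here is a minimal sketch that parses two small HTML strings. Both strings are made-up stand-ins (assumptions, not the real Wiley pages): one imitates an error page, the other a page that contains an `issue-item` div.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the body of a 403 error page; the real page differs.
error_html = "<html><body><h1>403 Forbidden</h1></body></html>"
# Hypothetical stand-in for a successful response with one issue item.
ok_html = (
    '<html><body><div class="issue-item">'
    '<a href="#"><h2>Cover</h2></a>'
    "</div></body></html>"
)

error_soup = BeautifulSoup(error_html, "html.parser")
ok_soup = BeautifulSoup(ok_html, "html.parser")

# The error page has no matching div, so find_all gives an empty list
# and find gives None.
empty_list = error_soup.find_all("div", {"class": "issue-item"})
missing = error_soup.find("div", {"class": "issue-item"})
# The same lookup succeeds once the expected markup is present.
found = ok_soup.find_all("div", {"class": "issue-item"})

print(empty_list)  # []
print(missing)     # None
print(len(found))  # 1
```

So the lookup itself is fine. It comes back empty because the HTML being parsed is not the page you expected.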

Send request headers to fix this.

h = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36'}
JAGS7_result = requests.get("https://agsjournals.onlinelibrary.wiley.com/toc/15325415/2021/69/7", headers=h)

Now you get the result you want.
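If you want the failure to be visible instead of silently getting `[]`, one option is to wrap the request in a small helper that checks the response before parsing. This is only a sketch: `fetch_soup`, `BROWSER_HEADERS` and the 10-second timeout are names and values I made up for illustration, and whether a site accepts a particular User-Agent can change over time.

```python
import requests
from bs4 import BeautifulSoup

# User-Agent string from the answer above; any realistic browser string may do.
BROWSER_HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
    )
}


def fetch_soup(url, headers=None, timeout=10):
    """Hypothetical helper: fetch a page and parse it, failing on HTTP errors."""
    response = requests.get(url, headers=headers or BROWSER_HEADERS, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses such as 403,
    # so an error page is never parsed by mistake.
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")
```

With this, `fetch_soup(url).find_all("div", {"class": "issue-item"})` either returns the matching elements or stops with an HTTP error that names the status code.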

Try setting the User-Agent header when making the request:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
}
JAGS7_result = requests.get(
    "https://agsjournals.onlinelibrary.wiley.com/toc/15325415/2021/69/7",
    headers=headers,
)
JAGS7_soup = BeautifulSoup(JAGS7_result.text, "html.parser")
for title in JAGS7_soup.select("a > h2"):
    print(title.text)

Prints:

Cover
Issue Information
A glimmer of hope for the most vulnerable
Emergency department visits for emergent conditions among older adults during the COVID-19 pandemic
SARS-CoV-2 antibody detection in skilled nursing facility residents
VA home-based primary care interdisciplinary team structure varies with Veterans' needs, aligns with PACE regulation
Emergency visits by older adults decreased during COVID-19 but increased in the oldest old
Teaching geriatrics during the COVID-19 pandemic: Aquifer Geriatrics to the rescue
Changes in medication use among long-stay residents with dementia in Michigan during the pandemic
Reduction in respiratory viral infections among hospitalized older adults during the COVID-19 pandemic
...
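In case the selector is unfamiliar: `a > h2` matches every `h2` that is a direct child of an `a` element. Here is a small offline sketch of that. The HTML snippet is invented for illustration and is much simpler than the real journal page.

```python
from bs4 import BeautifulSoup

# Made-up markup: two linked titles plus one heading that is not inside a link.
html = """
<div class="issue-item">
  <a href="/doi/1"><h2>Cover</h2></a>
</div>
<div class="issue-item">
  <a href="/doi/2"><h2>Issue Information</h2></a>
</div>
<h2>Section heading outside a link</h2>
"""

soup = BeautifulSoup(html, "html.parser")
# Only h2 elements whose parent is an <a> are selected.
titles = [h2.get_text(strip=True) for h2 in soup.select("a > h2")]
print(titles)  # ['Cover', 'Issue Information']
```

The heading outside a link is skipped, which is why this selector picks up only the article titles.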

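I would also suggest looking into spiders and Scrapy for your future work. Scrapy is a good scraping package, and BeautifulSoup on its own often fails on JavaScript-driven sites, because it only parses the HTML it is given and does not run scripts.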
