BeautifulSoup returns "NoneType" on any soup command



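I'm scraping the WSJ with BeautifulSoup, but it never seems to find the element with the id "top-news", which is always available on the homepage. I've tried find(), find_all(), and various other approaches, and every method I call on my results object ends in a NoneType.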

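I'm trying to extract metadata about the top news articles, mainly the article title and URL. Each article's metadata sits under a class named "WSJTheme--headline--7VCzo7Ay", but I only want the ones inside the "top-news" section.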

Here is my code:

import requests
from bs4 import BeautifulSoup
from shutil import copyfile
URL = 'https://www.wsj.com'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='top-news')
topArticles = results.find_all('div', class_='WSJTheme--headline--7VCzo7Ay ')
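
For context, the symptom can be reproduced without any network call. The sketch below assumes the server returned a page that lacks the expected markup (the `blocked_html` string is a hypothetical stand-in, not the real WSJ response). `soup.find` returns `None` when nothing matches, so any method called on `results` afterwards raises an `AttributeError` that mentions `NoneType`.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a response that does not contain id="top-news"
blocked_html = "<html><body><p>Access denied</p></body></html>"
soup = BeautifulSoup(blocked_html, "html.parser")

# find() returns None when no element matches
results = soup.find(id="top-news")
print(results)  # None

# Calling a method on None is what produces the NoneType error
try:
    results.find_all("div")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

Printing `page.status_code` and the first few hundred characters of `page.text` is a quick way to check whether the real response looks like the page seen in a browser.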

Specify a User-Agent header to get the correct response from the server:

import requests
from bs4 import BeautifulSoup

url = "https://www.wsj.com/"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
# Select every headline span inside the element with id="top-news"
for headline in soup.select('#top-news span[class*="headline"]'):
    print(headline.text)

Prints:

Oil Giants Dealt Defeats as Climate Pressures Intensify
At Least Eight Killed in San Jose Shooting
HSBC to Exit Most U.S. Retail Banking
Amazon-MGM Deal Marks Win for Hedge Funds
Cities Reverse Defunding the Police Amid Rising Crime
Federal Prosecutors Have Asked Banks for Information About Archegos Meltdown
Why a Grand Plan to Vaccinate the World Against Covid Unraveled
Inside the Israel-Hamas Conflict and One of Its Deadliest Hours in Gaza
Eric Carle, ‘The Very Hungry Caterpillar’ Author, Dies at 91
Wynn May Face U.S. Action for Role in China’s Push to Expel Businessman
Walmart to Sell New Line of Gap-Branded Homegoods
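
The question also asks for each article's URL. Here is a minimal sketch of one way to collect title and URL pairs, assuming each headline is wrapped in an `<a href>` inside an element whose class contains "headline". The `SAMPLE_HTML` markup and the `extract_top_news` helper are hypothetical and exist only so the example runs offline; the real WSJ page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not copied from the real WSJ page
SAMPLE_HTML = """
<div id="top-news">
  <h3 class="WSJTheme--headline--abc">
    <a href="https://www.wsj.com/articles/first"><span class="WSJTheme--headlineText--x">First headline</span></a>
  </h3>
  <h3 class="WSJTheme--headline--abc">
    <a href="https://www.wsj.com/articles/second"><span class="WSJTheme--headlineText--x">Second headline</span></a>
  </h3>
</div>
<div id="opinion">
  <h3 class="WSJTheme--headline--abc">
    <a href="https://www.wsj.com/articles/other"><span class="WSJTheme--headlineText--x">Other section</span></a>
  </h3>
</div>
"""


def extract_top_news(html):
    """Return (title, url) pairs for links found inside #top-news."""
    soup = BeautifulSoup(html, "html.parser")
    section = soup.find(id="top-news")
    if section is None:
        # The section is missing, e.g. the server sent a different page
        return []
    articles = []
    # Links nested in any element whose class attribute contains "headline"
    for link in section.select('[class*="headline"] a[href]'):
        articles.append((link.get_text(strip=True), link["href"]))
    return articles


for title, url in extract_top_news(SAMPLE_HTML):
    print(title, url)
```

Against the live site, the same helper could be fed `requests.get(url, headers=headers).content` from the snippet above. The empty-list fallback avoids the NoneType error when the section is absent.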
