urllib.error.HTTPError: HTTP Error 404: Not Found (Yahoo Finance)



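For my computing project, I am trying to build a financial forecasting website. One component of the code is a web scraper that pulls data from a company's income statement on Yahoo Finance.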

However, even though the URL is correct, I keep getting a 404 error.

My code:

import pandas as pd
import urllib.request as ur
from bs4 import BeautifulSoup
import warnings
import ssl

ssl._create_default_https_context = ssl._create_unverified_context
income_url = 'http://uk.finance.yahoo.com/quote/AAPL/financials?p=AAPL'
read_url = ur.urlopen(income_url).read()
income_soup = BeautifulSoup(read_url, 'lxml')
div_list = []
for div in income_soup.find_all('div'):
    div_list.append(div.string)
    if not div.string == div.get('title'):
        div_list.append(div.get('title'))
div_list = [incl for incl in div_list if incl not in
            ('Operating Expenses', 'Non-recurring Events', 'Expand All')]
div_list = list(filter(None, div_list))
div_list = [incl for incl in div_list if not incl.startswith('(function')]
income_list = div_list[13: -5]
income_list.insert(0, 'Breakdown')
income_data = list(zip(*[iter(income_list)]*6))
income_df = pd.DataFrame(income_data)
headers = income_df.iloc[0]
income_df = income_df[1:]
income_df.columns = headers
income_df.set_index('Breakdown', inplace=True, drop=True)
warnings.warn('Amounts are in thousands.')
print(income_df)

I keep getting this error:

urllib.error.HTTPError: HTTP Error 404: Not Found

How can I fix this?

The problem appears to be solved by making sure the request sends a User-Agent header.
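Since the question's code uses urllib, here is a minimal sketch of the same fix without switching libraries. It assumes the 404 comes from Yahoo rejecting urllib's default User-Agent (`Python-urllib/3.x`), which is consistent with the fix above but not confirmed. The helper names `build_request` and `fetch_html` are hypothetical, and the browser string is only an example; `urllib.request.Request(url, headers=...)` and passing a `Request` object to `urlopen` are standard-library behavior.

```python
import urllib.request

# Example browser-like User-Agent (an assumption: any mainstream browser UA should work).
AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5) AppleWebKit/605.1.15 '
         '(KHTML, like Gecko) Version/14.1.2 Safari/605.1.15')


def build_request(url: str, user_agent: str = AGENT) -> urllib.request.Request:
    """Hypothetical helper: wrap the URL in a Request carrying a User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url: str) -> bytes:
    """Hypothetical helper: download the page; urlopen accepts a Request object."""
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read()


# In the question's code, the urlopen line would then become, for example:
# read_url = fetch_html('https://uk.finance.yahoo.com/quote/AAPL/financials?p=AAPL')
```

Note that urllib normalizes header names with `str.capitalize()`, so the header is stored (and looked up with `get_header`) as `User-agent`. The server still receives it correctly.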

Using the requests module:

import requests
agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15'
headers = {'User-Agent': agent}
url = 'http://uk.finance.yahoo.com/quote/AAPL/financials?p=AAPL'
response = requests.get(url, headers=headers)
response.raise_for_status()
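If the site scrapes several tickers, a `requests.Session` can carry the header for every request instead of passing `headers=` each time. The following is a sketch under that assumption. `make_session` and `fetch_financials` are hypothetical helpers, and the URL pattern simply generalizes the one in the question to other ticker symbols, which may not hold for every listing.

```python
import requests

# Example browser-like User-Agent, same as in the answer above.
AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5) AppleWebKit/605.1.15 '
         '(KHTML, like Gecko) Version/14.1.2 Safari/605.1.15')


def make_session(user_agent: str = AGENT) -> requests.Session:
    """Hypothetical helper: a Session that sends the User-Agent on every request."""
    session = requests.Session()
    session.headers.update({'User-Agent': user_agent})
    return session


def fetch_financials(session: requests.Session, ticker: str) -> str:
    """Hypothetical helper: download the financials page for one ticker."""
    url = f'https://uk.finance.yahoo.com/quote/{ticker}/financials?p={ticker}'
    response = session.get(url, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx status
    return response.text


# Usage (performs network requests):
# session = make_session()
# for ticker in ('AAPL', 'MSFT'):
#     html = fetch_financials(session, ticker)
```

Reusing one session also keeps the underlying connection open between requests, which is generally faster than opening a new one per ticker.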
