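For my computing project, I am trying to build a financial forecasting website. One element of the code is a web-scraping API. It scrapes data from a company's income statement on Yahoo Finance.
However, I still get a 404 error even though the URL is correct.
My code: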
import pandas as pd
import urllib.request as ur
from bs4 import BeautifulSoup
import warnings
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
income_url = 'http://uk.finance.yahoo.com/quote/AAPL/financials?p=AAPL'
read_url = ur.urlopen(income_url).read()
income_soup = BeautifulSoup(read_url, 'lxml')
div_list = []
for div in income_soup.find_all('div'):
    div_list.append(div.string)
    if not div.string == div.get('title'):
        div_list.append(div.get('title'))
div_list = [incl for incl in div_list if incl not in
            ('Operating Expenses', 'Non-recurring Events', 'Expand All')]
div_list = list(filter(None, div_list))
div_list = [incl for incl in div_list if not incl.startswith('(function')]
income_list = div_list[13: -5]
income_list.insert(0, 'Breakdown')
income_data = list(zip(*[iter(income_list)]*6))
income_df = pd.DataFrame(income_data)
headers = income_df.iloc[0]
income_df = income_df[1:]
income_df.columns = headers
income_df.set_index('Breakdown', inplace=True, drop=True)
warnings.warn('Amounts are in thousands.')
print(income_df)
I keep getting this error:
urllib.error.HTTPError: HTTP Error 404: Not Found
How can I fix it?
This problem can apparently be solved by making sure a User-Agent header is passed.
Using the requests module:
import requests
agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Safari/605.1.15'
headers = {'User-Agent': agent}
url = 'http://uk.finance.yahoo.com/quote/AAPL/financials?p=AAPL'
response = requests.get(url, headers=headers)
response.raise_for_status()
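
If you would rather keep urllib, as in the question, the same idea can probably be applied by wrapping the URL in a urllib.request.Request that carries the header. This is only a sketch: the helper names build_request and fetch_html are made up for illustration, the User-Agent string is simply the one from the requests example, and there is no guarantee that Yahoo will keep accepting it.

```python
import urllib.request

# Assumption: a browser-like User-Agent is enough; this string is copied
# from the requests example and is not special in any way.
DEFAULT_AGENT = (
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_5) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/14.1.2 Safari/605.1.15'
)


def build_request(url, user_agent=DEFAULT_AGENT):
    """Return a Request object that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, user_agent=DEFAULT_AGENT, timeout=10):
    """Download the page body as bytes.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    remaining 404 would still surface as an exception here.
    """
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

In the original script, read_url = fetch_html(income_url) would then take the place of the ur.urlopen(income_url).read() line, and the rest of the parsing could stay unchanged.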