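I'm new to web scraping and I'm trying to extract a value from Yahoo Finance. I'm using pandas with the match argument to find the right table. The code is below: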
import pandas as pd
import requests

# Get 5 year growth estimate (STOCK is the ticker string, defined elsewhere)
url_link = "https://finance.yahoo.com/quote/" + str(STOCK) + "/analysis?p=" + str(STOCK)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}
r = requests.get(url_link, headers=headers)
# Keep only the tables whose text contains the ticker
read_html_pandas_data = pd.read_html(r.text, match=STOCK)
table = read_html_pandas_data
print(table)
I'm passing in different stock strings, such as 'ABC'. I get back a list of length 1:
[           Growth Estimates     ABC  Industry  Sector(s)  S&P 500
0              Current Qtr.  19.00%       NaN        NaN      NaN
1                 Next Qtr.   7.90%       NaN        NaN      NaN
2              Current Year  18.40%       NaN        NaN      NaN
3                 Next Year   6.00%       NaN        NaN      NaN
4  Next 5 Years (per annum)  10.69%       NaN        NaN      NaN
5  Past 5 Years (per annum)   8.70%       NaN        NaN      NaN]
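The value I want is 10.69%, but I'm having trouble extracting it correctly. I previously used a different approach, but the order of the tables changes depending on the stock URL, so I'd like something more consistent.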
I suggest setting the index to a column you can look rows up by, and then using it to access the row. For example,
table.set_index('Growth Estimates', inplace=True)
table.loc['Next 5 Years (per annum)']
Note that right now you have a list of DataFrames, so you probably want to do this first:
table = read_html_pandas_data[0]
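Putting those pieces together, here is a minimal sketch of the index-based lookup. To keep it runnable without a network request, the DataFrame is built by hand from the output shown in the question; that hand-built table and the name `indexed` are assumptions for illustration, standing in for `read_html_pandas_data[0]`.

```python
import pandas as pd

STOCK = "ABC"  # ticker used in the question's example

# Hypothetical stand-in for read_html_pandas_data[0], copied from the printed output
table = pd.DataFrame({
    "Growth Estimates": [
        "Current Qtr.", "Next Qtr.", "Current Year", "Next Year",
        "Next 5 Years (per annum)", "Past 5 Years (per annum)",
    ],
    STOCK: ["19.00%", "7.90%", "18.40%", "6.00%", "10.69%", "8.70%"],
})

# Use the row labels as the index so rows can be looked up by name
indexed = table.set_index("Growth Estimates")

# Passing both a row label and a column label returns a single cell
five_year = indexed.loc["Next 5 Years (per annum)", STOCK]
print(five_year)  # 10.69%
```

Giving `.loc` only the row label returns the whole row as a Series; adding the column label narrows it to the one value.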
Try this:
# Make it clear what table you are interested in
table = pd.read_html(r.text, match="Growth Estimates")[0]
# Get the value you want
table.loc[table["Growth Estimates"] == "Next 5 Years (per annum)", STOCK]
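The boolean-mask lookup above gives back a Series (one entry per matching row) rather than a bare value. A minimal sketch of pulling out the scalar and turning it into a number follows. The hand-built DataFrame is an assumption standing in for the scraped table, and the names `matches`, `growth_text` and `growth_pct` are made up for this example.

```python
import pandas as pd

STOCK = "ABC"  # ticker used in the question's example

# Hypothetical stand-in for the table returned by pd.read_html(...)[0]
table = pd.DataFrame({
    "Growth Estimates": [
        "Current Qtr.", "Next Qtr.", "Current Year", "Next Year",
        "Next 5 Years (per annum)", "Past 5 Years (per annum)",
    ],
    STOCK: ["19.00%", "7.90%", "18.40%", "6.00%", "10.69%", "8.70%"],
})

# Boolean mask selects the matching row(s); the result is a Series
matches = table.loc[table["Growth Estimates"] == "Next 5 Years (per annum)", STOCK]

# Guard against pages where the row is missing
if matches.empty:
    raise ValueError("No 'Next 5 Years (per annum)' row found for " + STOCK)

growth_text = matches.iloc[0]               # first (and only) match, still a string
growth_pct = float(growth_text.rstrip("%"))  # strip the percent sign and parse

print(growth_text)
print(growth_pct)
```

If the cell can contain something other than a percentage (for example a placeholder for missing data), the `float` call would raise, so it may be worth wrapping it in a `try`/`except` in real use.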