Problem extracting a value from a URL



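I'm new to web scraping, and I'm trying to extract a value from Yahoo Finance. I'm using pandas with the match parameter to find the table containing the row I need, with the following code: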

import requests
import pandas as pd

STOCK = "ABC"  # ticker symbol, e.g. 'ABC'

#Get 5 year growth estimate------------------------------------------------
url_link = "https://finance.yahoo.com/quote/"+str(STOCK)+"/analysis?p="+str(STOCK)+""
r = requests.get(url_link,headers ={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'})
read_html_pandas_data = pd.read_html(r.text,match = STOCK)
table = read_html_pandas_data
print(table)

I pass in different ticker strings, such as 'ABC', and I get back a list of length 1:

[           Growth Estimates     ABC  Industry  Sector(s)  S&P 500
0              Current Qtr.  19.00%       NaN        NaN      NaN
1                 Next Qtr.   7.90%       NaN        NaN      NaN
2              Current Year  18.40%       NaN        NaN      NaN
3                 Next Year   6.00%       NaN        NaN      NaN
4  Next 5 Years (per annum)  10.69%       NaN        NaN      NaN
5  Past 5 Years (per annum)   8.70%       NaN        NaN      NaN]

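The value I want is 10.69%, but I'm having trouble extracting it reliably. I used a different approach before, but the order of the tables on the page changes depending on the stock's URL, so I'd like something more consistent.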
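Since the table order can change, one option is to pick the table by its header text rather than by its position in the list. The sketch below assumes the page has already been parsed into a list of DataFrames; the tables are built by hand here so it runs offline, the helper name find_value is hypothetical, and "Some Other Table" is a placeholder for whatever other tables the page contains.

```python
import pandas as pd


def find_value(tables, header, row_label, column):
    """Hypothetical helper: return the cell at (row_label, column) from the
    first table that has a column named `header`, regardless of list order."""
    for df in tables:
        if header in df.columns and column in df.columns:
            rows = df.loc[df[header] == row_label, column]
            if not rows.empty:
                return rows.iloc[0]
    raise LookupError(f"{row_label!r} not found in any table with {header!r}")


# Values copied from the printed output in the question
growth = pd.DataFrame({
    "Growth Estimates": ["Current Qtr.", "Next Qtr.", "Current Year",
                         "Next Year", "Next 5 Years (per annum)",
                         "Past 5 Years (per annum)"],
    "ABC": ["19.00%", "7.90%", "18.40%", "6.00%", "10.69%", "8.70%"],
})
# Placeholder standing in for any other table on the page (not real data)
other = pd.DataFrame({"Some Other Table": ["x"], "ABC": ["y"]})

# The same lookup works whichever position the growth table ends up in
first = find_value([other, growth], "Growth Estimates",
                   "Next 5 Years (per annum)", "ABC")
second = find_value([growth, other], "Growth Estimates",
                    "Next 5 Years (per annum)", "ABC")
print(first, second)
```

Raising LookupError instead of returning None makes a change in the page layout fail loudly rather than silently producing a missing value.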

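I suggest setting the index to the column you want to look rows up by, then accessing the row through that index. For example,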

table.set_index('Growth Estimates', inplace=True)
table.loc['Next 5 Years (per annum)']
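A self-contained sketch of the set_index approach, assuming the DataFrame has already been taken out of the list returned by read_html (it is rebuilt here from the values printed in the question). It also selects the ABC column so that .loc returns a single cell rather than a whole row, and converts the percentage string to a number.

```python
import pandas as pd

# Stand-in for read_html_pandas_data[0], using the values printed in the question
table = pd.DataFrame({
    "Growth Estimates": ["Current Qtr.", "Next Qtr.", "Current Year",
                         "Next Year", "Next 5 Years (per annum)",
                         "Past 5 Years (per annum)"],
    "ABC": ["19.00%", "7.90%", "18.40%", "6.00%", "10.69%", "8.70%"],
})

# Use the row labels as the index, then look up one cell by (row, column)
table = table.set_index("Growth Estimates")
value = table.loc["Next 5 Years (per annum)", "ABC"]
print(value)

# Strip the trailing '%' and scale if a numeric rate is needed
rate = float(value.rstrip("%")) / 100
print(rate)
```

Reassigning the result of set_index is equivalent to inplace=True here; either form works.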

Note that what you currently have is a list of DataFrames, so you probably want to do this first:

table = read_html_pandas_data[0]

Try this:

# Make it clear what table you are interested in
table = pd.read_html(r.text, match="Growth Estimates")[0]
# Get the value you want
table.loc[table["Growth Estimates"] == "Next 5 Years (per annum)", STOCK]
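The boolean-mask expression above returns a one-element Series, not the bare string. The sketch below, which rebuilds the table offline from the question's output, shows one way to pull out the cell with Series.item(), which raises ValueError if the mask matches zero rows or more than one.

```python
import pandas as pd

STOCK = "ABC"

# Stand-in for the table parsed from the page (values from the question's output)
table = pd.DataFrame({
    "Growth Estimates": ["Current Qtr.", "Next Qtr.", "Current Year",
                         "Next Year", "Next 5 Years (per annum)",
                         "Past 5 Years (per annum)"],
    STOCK: ["19.00%", "7.90%", "18.40%", "6.00%", "10.69%", "8.70%"],
})

# Boolean mask selecting the row whose label matches
mask = table["Growth Estimates"] == "Next 5 Years (per annum)"
selected = table.loc[mask, STOCK]  # a Series holding one element

# .item() returns that single element; it raises ValueError if the
# selection does not contain exactly one value
value = selected.item()
print(value)
```

On pandas 2.1 and later, passing the raw HTML string r.text to pd.read_html emits a FutureWarning; wrapping it as io.StringIO(r.text) avoids it.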
