Web scraping Yahoo Finance statistics using Python and JSON



I found this great working code snippet in the following post: Web scraping of Yahoo Finance statistics using BS4

This is the code I am referring to:

import requests, re, json, pprint

# Capture the JSON object assigned to root.App.main in the page source
p = re.compile(r'root.App.main = (.*);')
tickers = ['NKE', 'AAPL', 'SPG']
results = {}

with requests.Session() as s:
    for ticker in tickers:
        r = s.get('https://finance.yahoo.com/quote/{}/key-statistics?p={}'.format(ticker, ticker))
        # Parse the captured text as JSON, then walk down to the quote data
        data = json.loads(p.findall(r.text)[0])
        key_stats = data['context']['dispatcher']['stores']['QuoteSummaryStore']
        res = {
            'Enterprise Value': key_stats['defaultKeyStatistics']['enterpriseValue']['fmt'],
            'Trailing P/E': key_stats['summaryDetail']['trailingPE']['fmt'],
            'Forward P/E': key_stats['summaryDetail']['forwardPE']['fmt'],
            'PEG Ratio (5 yr expected)': key_stats['defaultKeyStatistics']['pegRatio']['fmt'],
            'Return on Assets': key_stats['financialData']['returnOnAssets']['fmt'],
            'Quarterly Revenue Growth': key_stats['financialData']['revenueGrowth']['fmt'],
            'EBITDA': key_stats['financialData']['ebitda']['fmt'],
            'Diluted EPS': key_stats['defaultKeyStatistics']['trailingEps']['fmt'],
            'Total Debt/Equity': key_stats['financialData']['debtToEquity']['fmt'],
            'Current Ratio': key_stats['financialData']['currentRatio']['fmt'],
        }
        results[ticker] = res

pprint.pprint(results)

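I have tested this code and it works. However, I am new to JSON, and although I understand at a high level how this code works, I am unsure about the mechanics of several parts of it.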

I would appreciate some comments or explanation of how these parts work:

Section 1: Question 1: How does this regular expression work on the web page? I have never seen a regular expression that looks like this.

p = re.compile(r'root.App.main = (.*);')
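For Question 1, here is a minimal sketch of what the pattern appears to do, assuming the page source embeds its data as a JavaScript assignment written on a single line. The sample_html string is a made-up, heavily shortened stand-in, not real Yahoo output. The literal text root.App.main = anchors the match, (.*) greedily captures the rest of that line, and the final ; makes the capture stop just before the last semicolon on the line.

```python
import json
import re

# Hypothetical stand-in for the page source; the real page is far larger.
sample_html = """
<script>
(function (root) {
root.App.main = {"context": {"dispatcher": {"stores": {"QuoteSummaryStore": {"price": 1}}}}};
}(this));
</script>
"""

p = re.compile(r'root.App.main = (.*);')

# With one capture group, findall returns a list of the captured strings only.
matches = p.findall(sample_html)

# The captured text is plain JSON, so json.loads turns it into nested dicts.
data = json.loads(matches[0])
print(data['context']['dispatcher']['stores']['QuoteSummaryStore'])
```

Because . does not match a newline by default, the capture cannot run past the end of the line holding the assignment. Note that the unescaped dots in root.App.main match any character, so a stricter pattern would escape them as root\.App\.main.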

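Section 2: Question 2: I did not realize that the key statistics on the page are broken down into context, dispatcher, stores, and QuoteSummaryStore. How does this line work, and where can a newcomer find more information about it?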

key_stats = data['context']['dispatcher']['stores']['QuoteSummaryStore']
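For Question 2, one way to see where those four names come from is to print the keys available at each level while walking down the parsed JSON. The sketch below uses a small hypothetical structure (the OtherStore entry and the empty sub-dicts are invented for illustration), and keys_along is a helper name introduced here, not part of any library.

```python
import json

# Hypothetical miniature of the parsed page data; real data has many more keys.
raw = (
    '{"context": {"dispatcher": {"stores": {'
    '"QuoteSummaryStore": {"defaultKeyStatistics": {}, "summaryDetail": {}, "financialData": {}}, '
    '"OtherStore": {}}}}}'
)
data = json.loads(raw)


def keys_along(obj, path):
    """Walk `path` through nested dicts, recording the keys seen at each step."""
    seen = []
    for key in path:
        seen.append(sorted(obj.keys()))
        obj = obj[key]
    return seen, obj


levels, key_stats = keys_along(data, ['context', 'dispatcher', 'stores', 'QuoteSummaryStore'])
for depth, keys in enumerate(levels):
    print(depth, keys)
print(sorted(key_stats.keys()))
```

Each pair of square brackets in data['context']['dispatcher']['stores']['QuoteSummaryStore'] is an ordinary dictionary lookup, so the line simply descends four levels into the nested dictionaries that json.loads produced.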

Section 3: Question 3: How does one determine that Enterprise Value is given by key_stats['defaultKeyStatistics']['enterpriseValue']['fmt']?

res = {
    'Enterprise Value': key_stats['defaultKeyStatistics']['enterpriseValue']['fmt'],
    'Trailing P/E': key_stats['summaryDetail']['trailingPE']['fmt'],
    'Forward P/E': key_stats['summaryDetail']['forwardPE']['fmt'],
    'PEG Ratio (5 yr expected)': key_stats['defaultKeyStatistics']['pegRatio']['fmt'],
    'Return on Assets': key_stats['financialData']['returnOnAssets']['fmt'],
    'Quarterly Revenue Growth': key_stats['financialData']['revenueGrowth']['fmt'],
    'EBITDA': key_stats['financialData']['ebitda']['fmt'],
    'Diluted EPS': key_stats['defaultKeyStatistics']['trailingEps']['fmt'],
    'Total Debt/Equity': key_stats['financialData']['debtToEquity']['fmt'],
    'Current Ratio': key_stats['financialData']['currentRatio']['fmt'],
}
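For Question 3, one plausible way to discover such a path is to search the parsed data for a key name and report every path that leads to it. This is only a sketch: find_paths is a hypothetical helper written for illustration, and the sample dictionary uses made-up values.

```python
def find_paths(obj, target, path=()):
    """Yield every key path in a nested dict/list structure that ends at `target`."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                yield path + (key,)
            # Keep descending, since the same key name may appear deeper down.
            yield from find_paths(value, target, path + (key,))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from find_paths(value, target, path + (index,))


# Hypothetical sample shaped like the store in the question; values are invented.
key_stats = {
    'defaultKeyStatistics': {'enterpriseValue': {'fmt': '1k', 'longFmt': '1,000'}},
    'summaryDetail': {'trailingPE': {'fmt': '12.50'}},
}

paths = list(find_paths(key_stats, 'enterpriseValue'))
print(paths)
```

Run against the real key_stats, a search like this would show which section holds a given field, which is how the bracketed paths in the dictionary can be assembled by inspection.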

Thanks in advance.

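After inspecting the output of key_stats in this code, I have a better understanding of how the code produces the data. I have organized the data I am interested in. Hopefully this will help others in the future.

This essentially answers Questions 2 and 3.

However, I am still confused about Question 1.

Here is most of the useful data in the output: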

res = {
    'Enterprise Value': key_stats['defaultKeyStatistics']['enterpriseValue']['fmt'],
    'Enterprise Value over Revenue': key_stats['defaultKeyStatistics']['enterpriseToRevenue']['fmt'],
    'Profit Margin': key_stats['defaultKeyStatistics']['profitMargins']['fmt'],
    'Enterprise Value over EBITDA': key_stats['defaultKeyStatistics']['enterpriseToEbitda']['fmt'],
    'Forward EPS': key_stats['defaultKeyStatistics']['forwardEps']['fmt'],
    'Trailing EPS': key_stats['defaultKeyStatistics']['trailingEps']['fmt'],
    'Shares Outstanding': key_stats['defaultKeyStatistics']['sharesOutstanding']['fmt'],
    'Book Value': key_stats['defaultKeyStatistics']['bookValue']['fmt'],
    'Shares Short': key_stats['defaultKeyStatistics']['sharesShort']['fmt'],
    'Shares Short Pct Out': key_stats['defaultKeyStatistics']['sharesPercentSharesOut']['fmt'],
    'Held Pct by Institutions': key_stats['defaultKeyStatistics']['heldPercentInstitutions']['fmt'],
    'Held Pct by Insiders': key_stats['defaultKeyStatistics']['heldPercentInsiders']['fmt'],
    'Net Income to Common Stock': key_stats['defaultKeyStatistics']['netIncomeToCommon']['fmt'],
    'Short Ratio': key_stats['defaultKeyStatistics']['shortRatio']['fmt'],
    'Float': key_stats['defaultKeyStatistics']['floatShares']['fmt'],
    'Price to Sales Trl 12 Mths': key_stats['defaultKeyStatistics']['priceToSalesTrailing12Months']['fmt'],
    'PEG Ratio (5 yr expected)': key_stats['defaultKeyStatistics']['pegRatio']['fmt'],
    'YTD Return': key_stats['defaultKeyStatistics']['ytdReturn']['fmt'],
    'Diluted EPS': key_stats['defaultKeyStatistics']['trailingEps']['fmt'],
    'Trailing P/E': key_stats['summaryDetail']['trailingPE']['fmt'],
    'Forward P/E': key_stats['summaryDetail']['forwardPE']['fmt'],
    'Open': key_stats['summaryDetail']['regularMarketOpen']['fmt'],
    'High': key_stats['summaryDetail']['regularMarketDayHigh']['fmt'],
    'Low': key_stats['summaryDetail']['regularMarketDayLow']['fmt'],
    'Close': key_stats['summaryDetail']['regularMarketPrice']['fmt'],
    'Previous Close': key_stats['summaryDetail']['regularMarketPreviousClose']['fmt'],
    'Avg 10 Day Volume': key_stats['summaryDetail']['averageDailyVolume10Day']['fmt'],
    'Avg 3 Mth Volume': key_stats['summaryDetail']['averageDailyVolume3Month']['fmt'],
    'Volume': key_stats['summaryDetail']['regularMarketVolume']['fmt'],
    'Market Capitalisation': key_stats['summaryDetail']['marketCap']['longFmt'],
    'Dividend Rate': key_stats['summaryDetail']['dividendRate']['longFmt'],
    'Trailing Ann Div Yld': key_stats['summaryDetail']['trailingAnnualDividendYield']['longFmt'],
    'Trailing Ann Div Rate': key_stats['summaryDetail']['trailingAnnualDividendRate']['longFmt'],
    'Payout Ratio': key_stats['summaryDetail']['payoutRatio']['longFmt'],
    'Total Assets': key_stats['summaryDetail']['TotalAssets']['longFmt'],
    'Price To Sales Trl 12 Mths': key_stats['summaryDetail']['priceToSalesTrailing12Months']['longFmt'],
    'Five Yr Avg Div Yld': key_stats['summaryDetail']['fiveYearAvgDividendYield']['longFmt'],
    'Dividend Yield': key_stats['summaryDetail']['dividendYield']['longFmt'],
    'EBITDA Margins': key_stats['financialData']['ebitdaMargins']['fmt'],
    'Profit Margins': key_stats['financialData']['profitMargins']['fmt'],
    'Gross Margins': key_stats['financialData']['grossMargins']['fmt'],
    'Operating Cash Flow': key_stats['financialData']['operatingCashflow']['fmt'],
    'Revenue Growth': key_stats['financialData']['revenueGrowth']['fmt'],
    'Operating Margins': key_stats['financialData']['operatingMargins']['fmt'],
    'EBITDA': key_stats['financialData']['ebitda']['fmt'],
    'Target Low Price': key_stats['financialData']['targetLowPrice']['fmt'],
    'Gross Profits': key_stats['financialData']['grossProfits']['fmt'],
    'Free Cash Flow': key_stats['financialData']['freeCashflow']['fmt'],
    'Target Median Price': key_stats['financialData']['targetMedianPrice']['fmt'],
    'Earnings Growth': key_stats['financialData']['earningsGrowth']['fmt'],
    'Current Ratio': key_stats['financialData']['currentRatio']['fmt'],
    'Return on Assets': key_stats['financialData']['returnOnAssets']['fmt'],
    'Target Mean Price': key_stats['financialData']['targetMeanPrice']['fmt'],
    'Total Debt/Equity': key_stats['financialData']['debtToEquity']['fmt'],
    'Return On Equity': key_stats['financialData']['returnOnEquity']['fmt'],
    'Target High Price': key_stats['financialData']['targetHighPrice']['fmt'],
    'Total Cash': key_stats['financialData']['totalCash']['fmt'],
    'Total Debt': key_stats['financialData']['totalDebt']['fmt'],
    'Total Revenue': key_stats['financialData']['totalRevenue']['fmt'],
    'Total Cash Per Share': key_stats['financialData']['totalCashPerShare']['fmt'],
    'Revenue Per Share': key_stats['financialData']['revenuePerShare']['fmt'],
    'Quick Ratio': key_stats['financialData']['quickRatio']['fmt'],
    'Quarterly Revenue Growth': key_stats['financialData']['revenueGrowth']['fmt'],
}
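With this many lookups, a single absent field would raise a KeyError and stop the whole dictionary from being built. Whether any particular ticker actually lacks a field is an assumption here, not something confirmed above. The sketch below shows one defensive way to read a value. get_field is a hypothetical helper, and the sample data is invented.

```python
def get_field(store, section, field, fmt_key='fmt'):
    """Return store[section][field][fmt_key], or None if any level is missing."""
    value = store
    for key in (section, field, fmt_key):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical sample: dividendRate is present but empty, financialData is absent.
key_stats = {
    'summaryDetail': {
        'marketCap': {'fmt': '1B', 'longFmt': '1,000,000,000'},
        'dividendRate': {},
    },
}

print(get_field(key_stats, 'summaryDetail', 'marketCap', 'longFmt'))
print(get_field(key_stats, 'summaryDetail', 'dividendRate'))
print(get_field(key_stats, 'financialData', 'ebitda'))
```

Each entry in the dictionary could then be written as, for example, 'EBITDA': get_field(key_stats, 'financialData', 'ebitda'), leaving None where the data is unavailable.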
