Downloading Yahoo Finance data much faster than with a plain for loop



I have the following code for downloading fundamental data from Yahoo Finance:

from yahoo_fin import stock_info as si
import yfinance as yf
import pandas as pd

tickers = ('AAPL', 'MSFT', 'IBM')
infos = []
for i in tickers:
    infos.append(yf.Ticker(i).info)

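This works well, except that if I use 1000 tickers instead of just the 3 in the example above, the for loop takes a very long time. My question is whether there is a way to download the info data faster.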

Thank you very much for your answers. Thanks in advance!

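You could also try yahoo_fin; I found it to be faster. If the number of tickers is large, multiprocessing may be worth a look. I recommend Ray over Python's own multiprocessing library. Here are timings for 10 and 100 tickers with each approach: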

import timeit
from concurrent.futures import ProcessPoolExecutor

import pandas as pd
import ray
import yfinance as yf
from yahoo_fin import stock_info as si


def get_data(symbol):
    # Module-level so ProcessPoolExecutor can pickle it for worker processes
    return si.get_data(ticker=symbol)


def test(tickers):
    print(f'***Testing with {len(tickers)} tickers:')

    # yfinance, sequential
    start_time = timeit.default_timer()
    df_res = pd.DataFrame()
    for i in tickers:
        df_res = pd.concat([df_res, yf.Ticker(i).history(period="max")])
    print(f'yfinance sync completed in: {timeit.default_timer() - start_time} seconds')

    # yahoo_fin, sequential
    start_time = timeit.default_timer()
    df_res = pd.DataFrame()
    for s in tickers:
        df_res = pd.concat([df_res, si.get_data(ticker=s)])
    print(f'yahoo_fin sync completed in: {timeit.default_timer() - start_time} seconds')

    # concurrent.futures process pool
    executor = ProcessPoolExecutor()  # create the pool before starting the timer
    start_time = timeit.default_timer()
    futures = executor.map(get_data, tickers)
    df_res = pd.concat(list(futures))
    print(f'concurrent.futures completed in: {timeit.default_timer() - start_time} seconds')
    executor.shutdown()

    # Ray
    ray.init(ignore_reinit_error=True)  # initialize before starting the timer

    start_time = timeit.default_timer()

    @ray.remote
    def get_data_ray(symbol):
        return si.get_data(ticker=symbol)

    result_ids = [get_data_ray.remote(s) for s in tickers]
    df_res = pd.concat(ray.get(result_ids))
    print(f'ray completed in: {timeit.default_timer() - start_time} seconds')


if __name__ == '__main__':
    test(tickers=['AAPL'] * 10)
    test(tickers=['AAPL'] * 100)

Timing results:

***Testing with 10 tickers:
yfinance sync completed in: 3.9864108999609016 seconds
yahoo_fin sync completed in: 3.1445734000299126 seconds
concurrent.futures completed in: 7.356426700018346 seconds
ray completed in: 1.891160300001502 seconds
***Testing with 100 tickers:
yfinance sync completed in: 37.78823659999762 seconds
yahoo_fin sync completed in: 33.37096950004343 seconds
concurrent.futures completed in: 13.296738199947868 seconds
ray completed in: 2.770267000014428 seconds
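The benchmark above measures price history, whereas the question asks about `.info`. Because each call mostly waits on the network, a thread pool from the standard library is another option that may be worth trying. This is a different technique from the Ray and process-pool approaches timed above, and it has not been benchmarked here. The following is a minimal sketch: `fetch_all`, `fetch_one`, and `max_workers=8` are hypothetical names and values chosen for illustration, not part of yfinance or yahoo_fin.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(tickers, fetch_one, max_workers=8):
    """Call fetch_one(ticker) for every ticker on a thread pool.

    Returns a dict mapping each ticker to its result, or to the exception
    it raised, so one failing symbol does not abort the whole batch.
    fetch_one is a caller-supplied function (hypothetical name).
    """
    tickers = list(tickers)  # allow any iterable; we iterate over it twice

    def safe_fetch(ticker):
        try:
            return fetch_one(ticker)
        except Exception as exc:  # record the failure and keep going
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly
        results = list(pool.map(safe_fetch, tickers))
    return dict(zip(tickers, results))
```

With the question's code, this could be called as `infos = fetch_all(tickers, lambda t: yf.Ticker(t).info)`. Any speedup depends on how the remote service handles concurrent requests, which may be throttled, so it is probably sensible to keep `max_workers` modest and to check the returned dict for exceptions.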
