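I'm new to Python. I have 7 assets, and I have found daily adjusted close prices for them over the past two years. However, I need minute-by-minute data. This is what I have so far: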
import pandas as pd
import yfinance as yf
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
def dl_data(i):
    years = 2
    end = dt.datetime.today()
    start = end - dt.timedelta(365*years)
    tickers = ["SBUX", "MCD", "CMG", "WEN", "DPZ", "YUM", "DENN"]
    return(yf.download(tickers, start, end)['Adj Close'])
data3 = yf.download(tickers, period='7d', interval='1m')['Adj Close']
for i in range(1,504):
    data3 = data3.append(dl_data(i))
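Python only allows the 1m interval for 7 days; otherwise I get an error message. So I decided to use a loop to append to the original dataset. However, when I call data3.head() after the loop has run, the earliest date it goes back to is November 8, 2021. My understanding is that for i in range(1,504) executes the loop for the past 504 days, is that right? If not, how should I do this?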
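yfinance does not let you download 1-minute historical data older than 30 days, and it only lets you download 7 days of it per request.
Here is the error you get if you try: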
1 Failed download:
- EURUSD=X: 1m data not available for startTime=1654387200 and endTime=1654905600. The requested range must be within the last 30 days.
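The two limits described above can be checked locally before a request is sent. The following is only a minimal sketch: `check_1m_request` is a hypothetical helper and is not part of yfinance, and the 30-day and 7-day figures are the ones quoted in this answer, so they are assumptions that may differ between yfinance versions.

```python
import datetime as dt

# Assumed limits, taken from the answer text rather than from yfinance itself.
MAX_LOOKBACK_DAYS = 30  # 1m data must start within the last 30 days
MAX_SPAN_DAYS = 7       # one request may cover at most 7 days


def check_1m_request(start, end, now=None):
    """Return a list of problems with a 1m request; an empty list means it looks fine."""
    now = now or dt.datetime.now()
    problems = []
    if start >= end:
        problems.append("start must be before end")
    if now - start > dt.timedelta(days=MAX_LOOKBACK_DAYS):
        problems.append("start is more than 30 days in the past")
    if end - start > dt.timedelta(days=MAX_SPAN_DAYS):
        problems.append("range spans more than 7 days")
    return problems


# A fixed "now" keeps the example reproducible.
now = dt.datetime(2022, 7, 5)
ok = check_1m_request(dt.datetime(2022, 6, 27), dt.datetime(2022, 7, 3), now)
too_old = check_1m_request(dt.datetime(2022, 5, 1), dt.datetime(2022, 5, 5), now)
too_long = check_1m_request(dt.datetime(2022, 6, 20), dt.datetime(2022, 7, 3), now)
print(ok, too_old, too_long)
```

A check like this only mirrors the limits stated here; the server's response remains the final word on whether a range is accepted.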
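Although I can't fully answer your question because of yfinance's limitations, I can show you how to get up to 30 days of 1m data. This is how I would do it if I were forced to use yfinance. Because of these limitations, I usually use something else, such as alpaca or binance.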
First, make a DataFrame containing the date ranges:
import pandas as pd
TODAY = pd.to_datetime("today").date()
START = (TODAY - pd.DateOffset(days=29)).date()
# Reference: https://stackoverflow.com/a/48131963/16051077
d1 = pd.date_range(start=START, end=TODAY, freq="7D")
d2 = d1.shift(6, freq="d")
# Fix the last end date so that it does not go past TODAY
d2 = list(d2)[:-1] + [min(d2[-1], pd.Timestamp(TODAY))]
dates = pd.DataFrame(
    data=dict(start_date=d1, end_date=d2), columns=("start_date", "end_date")
)
Output:
  start_date   end_date
0 2022-06-06 2022-06-12
1 2022-06-13 2022-06-19
2 2022-06-20 2022-06-26
3 2022-06-27 2022-07-03
4 2022-07-04 2022-07-05
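One thing to watch: as far as I know, yfinance treats `end` as exclusive, so with windows like the ones above the `end_date` of each window would not itself be downloaded. A possible variant, assuming that behaviour, is to use half-open windows in which each window starts exactly where the previous one stopped. This is a sketch only; `make_windows` is a hypothetical helper, not a pandas or yfinance function.

```python
import pandas as pd


def make_windows(start, end, days=7):
    """Split [start, end) into consecutive half-open windows of at most `days` days."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    step = pd.Timedelta(days=days)
    windows = []
    while start < end:
        stop = min(start + step, end)  # never run past the overall end
        windows.append((start, stop))
        start = stop                   # next window begins where this one stopped
    return windows


windows = make_windows("2022-06-06", "2022-07-06")
for window_start, window_end in windows:
    print(window_start.date(), "->", window_end.date())
```

Each `(window_start, window_end)` pair could then be passed as `start=` and `end=` in place of the rows of `dates`.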
Then use the dates DataFrame in a for loop:
import yfinance as yf
df_list = []
for i in dates.index:
    start = dates.at[i, "start_date"]
    end = dates.at[i, "end_date"]
    tickers = ["TSLA", "MSFT", "AMZN"]
    df = yf.download(tickers, start=start, end=end, interval="1m")["Adj Close"]
    df_list.append(df)
history = pd.concat(df_list)
Output:
Note: because of market holidays, the data does not include every date.
[*********************100%***********************] 3 of 3 completed
[*********************100%***********************] 3 of 3 completed
[*********************100%***********************] 3 of 3 completed
[*********************100%***********************] 3 of 3 completed
[*********************100%***********************] 3 of 3 completed
                                 AMZN        MSFT        TSLA
2022-06-06 09:30:00-04:00  125.574501  273.179993  731.722900
2022-06-06 09:31:00-04:00  125.190002  273.500000  730.260010
2022-06-06 09:32:00-04:00  124.559998  273.190002  727.300110
2022-06-06 09:33:00-04:00  124.167503  273.519989  726.155029
2022-06-06 09:34:00-04:00  124.719902  273.220001  723.989990
...                               ...         ...         ...
2022-07-01 15:56:00-04:00  109.489998  259.029999  680.890015
2022-07-01 15:57:00-04:00  109.389999  259.079987  680.869995
2022-07-01 15:58:00-04:00  109.474998  259.369995  680.710022
2022-07-01 15:59:00-04:00  109.550003  259.539001  681.890015
2022-07-01 16:00:00-04:00  109.559998  259.579987  681.789978
7409 rows × 3 columns
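If adjacent windows ever overlap or share a boundary, the concatenated frame could contain repeated timestamps. Below is a minimal sketch of a clean-up step. It uses two small made-up frames in place of real downloads; the column name `AAA` and the prices are placeholders, not market data.

```python
import pandas as pd

# Two tiny stand-ins for per-window downloads that overlap at 09:31.
idx_a = pd.to_datetime(["2022-06-06 09:30", "2022-06-06 09:31"])
idx_b = pd.to_datetime(["2022-06-06 09:31", "2022-06-06 09:32"])
part_a = pd.DataFrame({"AAA": [1.0, 2.0]}, index=idx_a)
part_b = pd.DataFrame({"AAA": [2.0, 3.0]}, index=idx_b)

combined = pd.concat([part_a, part_b])

# Keep the first row for each timestamp, then put the index in time order.
cleaned = combined[~combined.index.duplicated(keep="first")].sort_index()
print(cleaned)
```

The same two-step expression could be applied to `history` after the `pd.concat` call.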
Full code:
import pandas as pd
import yfinance as yf
TODAY = pd.to_datetime("today").date()
START = (TODAY - pd.DateOffset(days=29)).date()
# Reference: https://stackoverflow.com/a/48131963/16051077
d1 = pd.date_range(start=START, end=TODAY, freq="7D")
d2 = d1.shift(6, freq="d")
# Fix the last end date so that it does not go past TODAY
d2 = list(d2)[:-1] + [min(d2[-1], pd.Timestamp(TODAY))]
dates = pd.DataFrame(
    data=dict(start_date=d1, end_date=d2), columns=("start_date", "end_date")
)
df_list = []
for i in dates.index:
    start = dates.at[i, "start_date"]
    end = dates.at[i, "end_date"]
    tickers = ["TSLA", "MSFT", "AMZN"]
    df = yf.download(tickers, start=start, end=end, interval="1m")["Adj Close"]
    df_list.append(df)
history = pd.concat(df_list)