我有一个包含不同符号的分钟条的df,比如:
timestamp open high low close volume trade_count vwap symbol
0 2021-10-13 08:00:00+00:00 140.20 140.40 140.000 140.40 6084 65 140.205417 AAPL
1 2021-10-13 08:01:00+00:00 140.35 140.40 140.200 140.40 3052 58 140.308182 AAPL
2 2021-10-13 08:02:00+00:00 140.35 140.35 140.350 140.35 632 30 140.320934 AAPL
3 2021-10-13 08:03:00+00:00 140.28 140.30 140.200 140.20 2867 36 140.279473 AAPL
4 2021-10-13 08:04:00+00:00 140.20 140.20 140.200 140.20 435 36 140.199195 AAPL
... ... ... ... ... ... ... ... ... ...
58250 2021-10-27 19:58:00+00:00 209.31 209.33 209.215 209.26 26440 348 209.251852 ZTS
58251 2021-10-27 19:59:00+00:00 209.28 209.59 209.010 209.56 109758 1060 209.384672 ZTS
58252 2021-10-27 20:03:00+00:00 209.58 209.58 209.580 209.58 537786 49 209.580000 ZTS
58253 2021-10-27 20:05:00+00:00 209.58 209.58 209.580 209.58 4170 1 209.580000 ZTS
58254 2021-10-27 20:12:00+00:00 209.58 209.58 209.580 209.58 144 1 209.580000 ZTS
[58255 rows x 9 columns]
我希望能够使用df.groupby
,这样我就可以在每个股票行情的每一天循环。类似于:
timestamp open high low close volume trade_count vwap symbol
0 2021-10-13 08:00:00+00:00 140.20 140.40 140.000 140.40 6084 65 140.205417 AAPL
1 2021-10-13 08:01:00+00:00 140.35 140.40 140.200 140.40 3052 58 140.308182 AAPL
2 2021-10-13 08:02:00+00:00 140.35 140.35 140.350 140.35 632 30 140.320934 AAPL
3 2021-10-13 08:03:00+00:00 140.28 140.30 140.200 140.20 2867 36 140.279473 AAPL
4 2021-10-13 08:04:00+00:00 140.20 140.20 140.200 140.20 435 36 140.199195 AAPL
timestamp open high low close volume trade_count vwap symbol
0 2021-10-14 08:00:00+00:00 140.20 140.40 140.000 140.40 6084 65 140.205417 AAPL
1 2021-10-14 08:01:00+00:00 140.35 140.40 140.200 140.40 3052 58 140.308182 AAPL
2 2021-10-14 08:02:00+00:00 140.35 140.35 140.350 140.35 632 30 140.320934 AAPL
3 2021-10-14 08:03:00+00:00 140.28 140.30 140.200 140.20 2867 36 140.279473 AAPL
4 2021-10-14 08:04:00+00:00 140.20 140.20 140.200 140.20 435 36 140.199195 AAPL
我该怎么做?
有人建议我看看另一个问题:
table = df.groupby(pd.Grouper(key='timestamp', axis=0, freq='D')).sum()
但这需要分钟数据并每天返回:
Name: 2022-04-04 00:00:00+00:00, dtype: float64)
(Timestamp('2022-04-05 00:00:00+0000', tz='UTC', freq='D'), open 0.0
high 0.0
low 0.0
close 0.0
volume 0.0
trade_count 0.0
vwap 0.0
Name: 2022-04-05 00:00:00+00:00, dtype: float64)
(Timestamp('2022-04-06 00:00:00+0000', tz='UTC', freq='D'), open 2000.818300
high 2001.724000
low 2000.563300
close 2001.462900
volume 59717.000000
trade_count 487.000000
vwap 2001.073115
Name: 2022-04-06 00:00:00+00:00, dtype: float64)
我需要获取我的分钟数据,并将这些分钟划分为不同的日子。我不需要像这里建议的那样高档到日常酒吧
https://pandas.pydata.org/docs/user_guide/basics.html#by-值
我认为这个决定取决于在天数加上符号上循环的实际目标。
df = df.sort_values(by=["timestamp", "symbol"])
您正在寻找:
# ensure datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])
# group by symbol and date
out = df.groupby(['symbol', df['timestamp'].dt.date]).sum()
感谢Raibek和Mozway的帮助!正是莫兹韦的回答让我走到了那里:
df['timestamp'] = pd.to_datetime(df['timestamp'])
table = df.groupby(['symbol', df['timestamp'].dt.date]) # remove the sum
for ohlc in table:
print(ohlc)
它使每一天都有自己的df:
(('QCOM', datetime.date(2021, 11, 22)), timestamp open high low close volume trade_count vwap symbol
44960 2021-11-22 09:15:00+00:00 185.11 185.11 185.11 185.11 331 9 185.075559 QCOM
44961 2021-11-22 09:22:00+00:00 185.15 185.15 185.15 185.15 180 6 185.136722 QCOM
44962 2021-11-22 09:27:00+00:00 185.54 185.73 185.54 185.73 631 12 185.669271 QCOM
44963 2021-11-22 09:29:00+00:00 185.65 185.65 185.65 185.65 302 4 185.663510 QCOM
44964 2021-11-22 09:49:00+00:00 185.63 185.63 185.63 185.63 179 6 185.632626 QCOM
... ... ... ... ... ... ... ... ... ...
45499 2021-11-22 23:37:00+00:00 181.75 181.75 181.75 181.75 309 8 181.653786 QCOM
45500 2021-11-22 23:45:00+00:00 181.64 181.64 181.64 181.64 193 2 181.640000 QCOM
45501 2021-11-22 23:52:00+00:00 181.64 181.64 181.64 181.64 187 3 181.644064 QCOM
45502 2021-11-22 23:54:00+00:00 181.53 181.53 181.50 181.50 2000 12 181.511655 QCOM
45503 2021-11-22 23:55:00+00:00 181.50 181.50 181.50 181.50 5100 3 181.500000 QCOM
[544 rows x 9 columns])
(('QCOM', datetime.date(2021, 11, 23)), timestamp open high low close volume trade_count vwap symbol
45504 2021-11-23 00:09:00+00:00 181.50 181.5000 181.50 181.5000 1250 5 181.502240 QCOM
45505 2021-11-23 00:10:00+00:00 181.50 181.5000 181.50 181.5000 1612 5 181.500000 QCOM
45506 2021-11-23 00:15:00+00:00 181.55 181.5500 181.55 181.5500 640 3 181.551563 QCOM
45507 2021-11-23 00:19:00+00:00 181.50 181.5000 181.50 181.5000 253 4 181.500000 QCOM
45508 2021-11-23 00:27:00+00:00 181.55 181.5500 181.55 181.5500 320 3 181.547500 QCOM
45509 2021-11-23 00:31:00+00:00 181.55 181.5500 181.55 181.5500 100 1 181.550000 QCOM
45510 2021-11-23 00:32:00+00:00 181.50 181.5000 181.50 181.5000 800 3 181.500000 QCOM
45511 2021-11-23 00:33:00+00:00 181.40 181.4000 181.40 181.4000 1361 10 181.400786 QCOM
45512 2021-11-23 00:34:00+00:00 181.33 181.3300 181.00 181.0000 5446 47 181.179884 QCOM
45513 2021-11-23 00:35:00+00:00 181.00 181.0000 180.76 180.7600 2674 21 180.889443 QCOM
45514 2021-11-23 00:36:00+00:00 180.77 180.9900 180.77 180.9900 241 4 180.860871 QCOM
45515 2021-11-23 00:38:00+00:00 180.77 181.1899 180.77 181.1899 300 5 180.979033 QCOM
45516 2021-11-23 00:39:00+00:00 180.75 180.7500 180.75 180.7500 238 3 180.751597 QCOM)