我有OHLC数据系列,我想移动到'一'列系列来排序OHLC。
数据:
Time Open High Low Close Volume
0 2022-04-30 13:05:00 38580.46 38610.44 38580.46 38610.44 33.14334
1 2022-04-30 13:06:00 38610.43 38610.44 38600.00 38600.00 10.64336
2 2022-04-30 13:07:00 38600.00 38604.17 38600.00 38604.16 11.41531
3 2022-04-30 13:08:00 38604.16 38607.97 38604.16 38604.16 9.25056
4 2022-04-30 13:09:00 38604.16 38606.81 38604.16 38606.80 6.83944
5 2022-04-30 13:10:00 38606.81 38606.81 38602.00 38602.00 8.55118
6 2022-04-30 13:11:00 38602.01 38602.01 38602.00 38602.01 6.45035
7 2022-04-30 13:12:00 38602.01 38602.01 38600.28 38600.28 5.33661
8 2022-04-30 13:13:00 38600.28 38606.80 38600.28 38606.80 11.11430
9 2022-04-30 13:14:00 38606.80 38606.80 38600.56 38600.57 9.40830
我用for循环,但我认为这在大数据上很慢。
df1 = pd.DataFrame(ohlcv, columns = ['Time', 'Open', 'High', 'Low', 'Close', 'Volume'])
df_tick=pd.DataFrame(columns=['Time', 'Price', 'Volume'])
for i in range(0,len(df1)):
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Open'][i]}])],ignore_index=True)
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['High'][i]}])],ignore_index=True)
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Low'][i]}])],ignore_index=True)
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Close'][i]}])], ignore_index=True)
目标数据如下所示:
Time Price Volume
0 2022-04-30 13:05:00 38580.46 NaN
1 2022-04-30 13:05:00 38610.44 NaN
2 2022-04-30 13:05:00 38580.46 NaN
3 2022-04-30 13:05:00 38610.44 NaN
4 2022-04-30 13:06:00 38610.43 NaN
5 2022-04-30 13:06:00 38610.44 NaN
6 2022-04-30 13:06:00 38600.0 NaN
7 2022-04-30 13:06:00 38600.0 NaN
8 2022-04-30 13:07:00 38600.0 NaN
9 2022-04-30 13:07:00 38604.17 NaN
10 2022-04-30 13:07:00 38600.0 NaN
11 2022-04-30 13:07:00 38604.16 NaN
12 2022-04-30 13:08:00 38604.16 NaN
13 2022-04-30 13:08:00 38607.97 NaN
14 2022-04-30 13:08:00 38604.16 NaN
15 2022-04-30 13:08:00 38604.16 NaN
16 2022-04-30 13:09:00 38604.16 NaN
17 2022-04-30 13:09:00 38606.81 NaN
18 2022-04-30 13:09:00 38604.16 NaN
我尝试了简单的concat,但排序与索引,过滤值也。
df_tick=pd.concat([df1['Open'], df1['High'],df1['Low'],df1['Close']], axis=0)
df_tick=df_tick.sort_index()
如何仅按条目顺序排序索引?
我认为没有必要使用for循环,你可以使用stack
来简化你的代码,加上stack
也保留了OHLC
价格的顺序,所以没有必要在之后排序:
df.set_index(['Time', 'Volume']).stack().droplevel(-1).reset_index(name='Price')
结果
Time Volume Price
0 2022-04-30 13:05:00 33.14334 38580.46
1 2022-04-30 13:05:00 33.14334 38610.44
2 2022-04-30 13:05:00 33.14334 38580.46
3 2022-04-30 13:05:00 33.14334 38610.44
4 2022-04-30 13:06:00 10.64336 38610.43
5 2022-04-30 13:06:00 10.64336 38610.44
6 2022-04-30 13:06:00 10.64336 38600.00
7 2022-04-30 13:06:00 10.64336 38600.00
8 2022-04-30 13:07:00 11.41531 38600.00
9 2022-04-30 13:07:00 11.41531 38604.17
10 2022-04-30 13:07:00 11.41531 38600.00
11 2022-04-30 13:07:00 11.41531 38604.16
12 2022-04-30 13:08:00 9.25056 38604.16
13 2022-04-30 13:08:00 9.25056 38607.97
14 2022-04-30 13:08:00 9.25056 38604.16
15 2022-04-30 13:08:00 9.25056 38604.16
16 2022-04-30 13:09:00 6.83944 38604.16
17 2022-04-30 13:09:00 6.83944 38606.81
18 2022-04-30 13:09:00 6.83944 38604.16
...
38 2022-04-30 13:14:00 9.40830 38600.56
39 2022-04-30 13:14:00 9.40830 38600.57
从@shubhamsharma的解释中得到提示,您不需要转换为分类dtype并保持顺序的一个选项是withpyjanitor
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(index = ['Time', 'Volume'],
values_to = 'Price',
sort_by_appearance=True)
.drop(columns='variable')
)
Time Volume Price
0 2022-04-30 13:05:00 33.14334 38580.46
1 2022-04-30 13:05:00 33.14334 38610.44
2 2022-04-30 13:05:00 33.14334 38580.46
3 2022-04-30 13:05:00 33.14334 38610.44
4 2022-04-30 13:06:00 10.64336 38610.43
5 2022-04-30 13:06:00 10.64336 38610.44
6 2022-04-30 13:06:00 10.64336 38600.00
7 2022-04-30 13:06:00 10.64336 38600.00
8 2022-04-30 13:07:00 11.41531 38600.00
9 2022-04-30 13:07:00 11.41531 38604.17
10 2022-04-30 13:07:00 11.41531 38600.00
11 2022-04-30 13:07:00 11.41531 38604.16
12 2022-04-30 13:08:00 9.25056 38604.16
13 2022-04-30 13:08:00 9.25056 38607.97
14 2022-04-30 13:08:00 9.25056 38604.16
15 2022-04-30 13:08:00 9.25056 38604.16
16 2022-04-30 13:09:00 6.83944 38604.16
17 2022-04-30 13:09:00 6.83944 38606.81
18 2022-04-30 13:09:00 6.83944 38604.16
19 2022-04-30 13:09:00 6.83944 38606.80
20 2022-04-30 13:10:00 8.55118 38606.81
21 2022-04-30 13:10:00 8.55118 38606.81
22 2022-04-30 13:10:00 8.55118 38602.00
23 2022-04-30 13:10:00 8.55118 38602.00
24 2022-04-30 13:11:00 6.45035 38602.01
25 2022-04-30 13:11:00 6.45035 38602.01
26 2022-04-30 13:11:00 6.45035 38602.00
27 2022-04-30 13:11:00 6.45035 38602.01
28 2022-04-30 13:12:00 5.33661 38602.01
29 2022-04-30 13:12:00 5.33661 38602.01
30 2022-04-30 13:12:00 5.33661 38600.28
31 2022-04-30 13:12:00 5.33661 38600.28
32 2022-04-30 13:13:00 11.11430 38600.28
33 2022-04-30 13:13:00 11.11430 38606.80
34 2022-04-30 13:13:00 11.11430 38600.28
35 2022-04-30 13:13:00 11.11430 38606.80
36 2022-04-30 13:14:00 9.40830 38606.80
37 2022-04-30 13:14:00 9.40830 38606.80
38 2022-04-30 13:14:00 9.40830 38600.56
39 2022-04-30 13:14:00 9.40830 38600.57