将4列移动到一列,只排序索引,不排序值



我有OHLC数据系列,我想移动到'一'列系列来排序OHLC。

数据:

Time      Open      High       Low     Close    Volume
0 2022-04-30 13:05:00  38580.46  38610.44  38580.46  38610.44  33.14334
1 2022-04-30 13:06:00  38610.43  38610.44  38600.00  38600.00  10.64336
2 2022-04-30 13:07:00  38600.00  38604.17  38600.00  38604.16  11.41531
3 2022-04-30 13:08:00  38604.16  38607.97  38604.16  38604.16   9.25056
4 2022-04-30 13:09:00  38604.16  38606.81  38604.16  38606.80   6.83944
5 2022-04-30 13:10:00  38606.81  38606.81  38602.00  38602.00   8.55118
6 2022-04-30 13:11:00  38602.01  38602.01  38602.00  38602.01   6.45035
7 2022-04-30 13:12:00  38602.01  38602.01  38600.28  38600.28   5.33661
8 2022-04-30 13:13:00  38600.28  38606.80  38600.28  38606.80  11.11430
9 2022-04-30 13:14:00  38606.80  38606.80  38600.56  38600.57   9.40830

我用for循环,但我认为这在大数据上很慢。

df1 = pd.DataFrame(ohlcv, columns = ['Time', 'Open', 'High', 'Low', 'Close', 'Volume'])
df_tick=pd.DataFrame(columns=['Time', 'Price', 'Volume'])
for i in range(0,len(df1)):
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Open'][i]}])],ignore_index=True)
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['High'][i]}])],ignore_index=True)
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Low'][i]}])],ignore_index=True)
df_tick = pd.concat([df_tick, pd.DataFrame.from_records([{'Time': df1['Time'][i],'Price': df1['Close'][i]}])], ignore_index=True)

目标数据如下所示:

Time     Price Volume
0   2022-04-30 13:05:00  38580.46    NaN
1   2022-04-30 13:05:00  38610.44    NaN
2   2022-04-30 13:05:00  38580.46    NaN
3   2022-04-30 13:05:00  38610.44    NaN
4   2022-04-30 13:06:00  38610.43    NaN
5   2022-04-30 13:06:00  38610.44    NaN
6   2022-04-30 13:06:00   38600.0    NaN
7   2022-04-30 13:06:00   38600.0    NaN
8   2022-04-30 13:07:00   38600.0    NaN
9   2022-04-30 13:07:00  38604.17    NaN
10  2022-04-30 13:07:00   38600.0    NaN
11  2022-04-30 13:07:00  38604.16    NaN
12  2022-04-30 13:08:00  38604.16    NaN
13  2022-04-30 13:08:00  38607.97    NaN
14  2022-04-30 13:08:00  38604.16    NaN
15  2022-04-30 13:08:00  38604.16    NaN
16  2022-04-30 13:09:00  38604.16    NaN
17  2022-04-30 13:09:00  38606.81    NaN
18  2022-04-30 13:09:00  38604.16    NaN

我尝试了简单的concat,但排序与索引,过滤值也。

df_tick=pd.concat([df1['Open'], df1['High'],df1['Low'],df1['Close']], axis=0)
df_tick=df_tick.sort_index()

如何仅按条目顺序排序索引?

我认为没有必要使用for循环,你可以使用stack来简化你的代码,加上stack也保留了OHLC价格的顺序,所以没有必要在之后排序:

df.set_index(['Time', 'Volume']).stack().droplevel(-1).reset_index(name='Price')

结果

Time    Volume     Price
0   2022-04-30 13:05:00  33.14334  38580.46
1   2022-04-30 13:05:00  33.14334  38610.44
2   2022-04-30 13:05:00  33.14334  38580.46
3   2022-04-30 13:05:00  33.14334  38610.44
4   2022-04-30 13:06:00  10.64336  38610.43
5   2022-04-30 13:06:00  10.64336  38610.44
6   2022-04-30 13:06:00  10.64336  38600.00
7   2022-04-30 13:06:00  10.64336  38600.00
8   2022-04-30 13:07:00  11.41531  38600.00
9   2022-04-30 13:07:00  11.41531  38604.17
10  2022-04-30 13:07:00  11.41531  38600.00
11  2022-04-30 13:07:00  11.41531  38604.16
12  2022-04-30 13:08:00   9.25056  38604.16
13  2022-04-30 13:08:00   9.25056  38607.97
14  2022-04-30 13:08:00   9.25056  38604.16
15  2022-04-30 13:08:00   9.25056  38604.16
16  2022-04-30 13:09:00   6.83944  38604.16
17  2022-04-30 13:09:00   6.83944  38606.81
18  2022-04-30 13:09:00   6.83944  38604.16
...
38  2022-04-30 13:14:00   9.40830  38600.56
39  2022-04-30 13:14:00   9.40830  38600.57

从@shubhamsharma的解释中得到提示,您不需要转换为分类dtype并保持顺序的一个选项是withpyjanitor

# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(index = ['Time', 'Volume'],
values_to = 'Price',
sort_by_appearance=True)
.drop(columns='variable')
)
Time    Volume     Price
0   2022-04-30 13:05:00  33.14334  38580.46
1   2022-04-30 13:05:00  33.14334  38610.44
2   2022-04-30 13:05:00  33.14334  38580.46
3   2022-04-30 13:05:00  33.14334  38610.44
4   2022-04-30 13:06:00  10.64336  38610.43
5   2022-04-30 13:06:00  10.64336  38610.44
6   2022-04-30 13:06:00  10.64336  38600.00
7   2022-04-30 13:06:00  10.64336  38600.00
8   2022-04-30 13:07:00  11.41531  38600.00
9   2022-04-30 13:07:00  11.41531  38604.17
10  2022-04-30 13:07:00  11.41531  38600.00
11  2022-04-30 13:07:00  11.41531  38604.16
12  2022-04-30 13:08:00   9.25056  38604.16
13  2022-04-30 13:08:00   9.25056  38607.97
14  2022-04-30 13:08:00   9.25056  38604.16
15  2022-04-30 13:08:00   9.25056  38604.16
16  2022-04-30 13:09:00   6.83944  38604.16
17  2022-04-30 13:09:00   6.83944  38606.81
18  2022-04-30 13:09:00   6.83944  38604.16
19  2022-04-30 13:09:00   6.83944  38606.80
20  2022-04-30 13:10:00   8.55118  38606.81
21  2022-04-30 13:10:00   8.55118  38606.81
22  2022-04-30 13:10:00   8.55118  38602.00
23  2022-04-30 13:10:00   8.55118  38602.00
24  2022-04-30 13:11:00   6.45035  38602.01
25  2022-04-30 13:11:00   6.45035  38602.01
26  2022-04-30 13:11:00   6.45035  38602.00
27  2022-04-30 13:11:00   6.45035  38602.01
28  2022-04-30 13:12:00   5.33661  38602.01
29  2022-04-30 13:12:00   5.33661  38602.01
30  2022-04-30 13:12:00   5.33661  38600.28
31  2022-04-30 13:12:00   5.33661  38600.28
32  2022-04-30 13:13:00  11.11430  38600.28
33  2022-04-30 13:13:00  11.11430  38606.80
34  2022-04-30 13:13:00  11.11430  38600.28
35  2022-04-30 13:13:00  11.11430  38606.80
36  2022-04-30 13:14:00   9.40830  38606.80
37  2022-04-30 13:14:00   9.40830  38606.80
38  2022-04-30 13:14:00   9.40830  38600.56
39  2022-04-30 13:14:00   9.40830  38600.57

相关内容

  • 没有找到相关文章

最新更新