Here is a snippet of the data frame I am working with:
type date time open close change high low 200ema 50ema
0 sixty-min 2007-06-04 09:00:00 1536.28 1534.71 -0.102 0.000 -0.259 NaN 1522.90
1 sixty-min 2007-06-04 10:00:00 1534.87 1534.79 -0.005 0.109 -0.106 NaN 1523.37
2 sixty-min 2007-06-04 11:00:00 1534.88 1536.08 0.078 0.124 -0.023 NaN 1523.87
3 sixty-min 2007-06-04 12:00:00 1536.21 1537.30 0.071 0.118 -0.036 NaN 1524.39
4 sixty-min 2007-06-04 13:00:00 1537.31 1536.23 -0.070 0.011 -0.130 NaN 1524.86
5 sixty-min 2007-06-04 14:00:00 1536.25 1536.91 0.043 0.096 -0.078 NaN 1525.33
6 sixty-min 2007-06-04 15:00:00 1536.53 1539.10 0.167 0.260 0.000 NaN 1525.87
7 sixty-min 2007-06-04 16:00:00 1539.00 1539.18 0.012 0.012 0.000 NaN 1526.39
8 sixty-min 2007-06-05 09:00:00 1539.12 1533.15 -0.389 0.000 -0.456 NaN 1526.66
9 sixty-min 2007-06-05 10:00:00 1533.16 1534.77 0.105 0.160 -0.178 NaN 1526.97
What I want to do is collapse this data frame into one with a single row per date. It would contain the following columns:
['date', '60 9 open', '60 9 close', '60 9 change', '60 9 high', '60 9 low', '60 9 200ema', '60 9 50ema',
'60 10 open', '60 10 close', '60 10 change', '60 10 high', '60 10 low', '60 10 200ema', '60 10 50ema',
'60 11 open', '60 11 close', '60 11 change', '60 11 high', '60 11 low', '60 11 200ema', '60 11 50ema',
'60 12 open', '60 12 close', '60 12 change', '60 12 high', '60 12 low', '60 12 200ema', '60 12 50ema',
'60 13 open', '60 13 close', '60 13 change', '60 13 high', '60 13 low', '60 13 200ema', '60 13 50ema',
'60 14 open', '60 14 close', '60 14 change', '60 14 high', '60 14 low', '60 14 200ema', '60 14 50ema',
'60 15 open', '60 15 close', '60 15 change', '60 15 high', '60 15 low', '60 15 200ema', '60 15 50ema',
'60 16 open', '60 16 close', '60 16 change', '60 16 high', '60 16 low', '60 16 200ema', '60 16 50ema']
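The difference is that each row holds only one date and no type, and each column header is built from the time and the data field of the cell it holds.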
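The header rule can be written down directly. The sketch below generates the requested column list, assuming the hours always run from 9 to 16 and that the `60` prefix stands for the `sixty-min` bar type (both are assumptions read off the sample data, not stated in the question); `fields`, `hours` and `columns` are illustrative names.

```python
# Illustrative names; '60' is assumed to come from the 'sixty-min' type
fields = ['open', 'close', 'change', 'high', 'low', '200ema', '50ema']
hours = range(9, 17)  # assumed trading hours 9..16 inclusive

# Hour-major order: all fields for 9, then all fields for 10, and so on
columns = ['date'] + [f'60 {h} {field}' for h in hours for field in fields]
print(columns[:3], len(columns))
```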
You can first extract the hour as an int and group by date:
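import pandas as pd
df['time'] = pd.to_datetime(df['time'], format='%H:%M:%S').dt.hour  # keep only the hour as an int
df = df.drop(columns='type').groupby('date').agg(list)  # drop 'type'; one list of values per column per date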
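To see what the grouping step produces, here is a small sketch on three rows of the question's data, keeping only `open` and `close` for brevity. It assumes the `type` column is dropped before grouping, so every remaining column becomes one list per date with `time` first.

```python
import pandas as pd

# Three rows of the question's data, trimmed to a few columns
df = pd.DataFrame({
    'type': ['sixty-min'] * 3,
    'date': ['2007-06-04', '2007-06-04', '2007-06-05'],
    'time': ['09:00:00', '10:00:00', '09:00:00'],
    'open': [1536.28, 1534.87, 1539.12],
    'close': [1534.71, 1534.79, 1533.15],
})

# Reduce the time string to its hour (9, 10, ...)
df['time'] = pd.to_datetime(df['time'], format='%H:%M:%S').dt.hour

# Each cell of the result is a list holding that column's values for one date
grouped = df.drop(columns='type').groupby('date').agg(list)
print(grouped)
```

The lists in the `time` column are what later supply the hour part of each new header, while the other lists supply the values.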
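Then, for each date, concatenate (along columns, axis=1) the data frames built from each column. Finally, concatenate (along rows, axis=0) the data frames for all dates: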
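df_out = pd.concat([
    # for each date: one single-row frame per field, placed side by side
    pd.concat([pd.DataFrame([row[col]], index=[index],
                            columns=[f'60 {h} {col}' for h in row['time']])
               for col in row.index[1:]], axis=1)
    # stack the per-date rows; hours missing on a date become NaN
    for index, row in df.iterrows()
])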
Output:
60 9 open 60 10 open 60 11 open 60 12 open 60 13 open 60 14 open 60 15 open ... 60 10 50ema 60 11 50ema 60 12 50ema 60 13 50ema 60 14 50ema 60 15 50ema 60 16 50ema
2007-06-04 1536.28 1534.87 1534.88 1536.21 1537.31 1536.25 1536.53 ... 1523.37 1523.87 1524.39 1524.86 1525.33 1525.87 1526.39
2007-06-05 1539.12 1533.16 NaN NaN NaN NaN NaN ... 1526.97 NaN NaN NaN NaN NaN NaN
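A different technique from the one above, offered only as a sketch: `DataFrame.pivot` can build the one-row-per-date layout directly, and an explicit reordering puts the columns in the hour-major order the question lists (the output above is grouped by field instead). It parses a few rows of the question's printout (row numbers removed); `value_cols`, `hours`, `ordered` and `wide` are illustrative names.

```python
import io
import pandas as pd

# A few rows of the question's data, without the printed row numbers
raw = """type date time open close change high low 200ema 50ema
sixty-min 2007-06-04 09:00:00 1536.28 1534.71 -0.102 0.000 -0.259 NaN 1522.90
sixty-min 2007-06-04 10:00:00 1534.87 1534.79 -0.005 0.109 -0.106 NaN 1523.37
sixty-min 2007-06-04 11:00:00 1534.88 1536.08 0.078 0.124 -0.023 NaN 1523.87
sixty-min 2007-06-05 09:00:00 1539.12 1533.15 -0.389 0.000 -0.456 NaN 1526.66
sixty-min 2007-06-05 10:00:00 1533.16 1534.77 0.105 0.160 -0.178 NaN 1526.97
"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')

# Reduce the time string to its hour as an int
df['time'] = pd.to_datetime(df['time'], format='%H:%M:%S').dt.hour

value_cols = ['open', 'close', 'change', 'high', 'low', '200ema', '50ema']

# One row per date; columns become a (field, hour) MultiIndex, missing hours -> NaN
wide = df.pivot(index='date', columns='time', values=value_cols)

# Reorder to hour-major: all fields for 9, then all fields for 10, ...
hours = sorted(df['time'].unique())
ordered = [(col, h) for h in hours for col in value_cols]
wide = wide.loc[:, ordered]

# Flatten the MultiIndex into labels such as '60 9 open' and expose 'date' as a column
wide.columns = [f'60 {h} {col}' for col, h in ordered]
df_out = wide.reset_index()
print(df_out)
```

Because `pivot` raises on duplicate (date, hour) pairs, this variant also acts as a check that each date has at most one bar per hour.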