如何将DataFrame的几列重新格式化为一行?



这是我使用的数据框架的片段:

type        date       time      open     close    change  high   low     200ema 50ema
0  sixty-min  2007-06-04  09:00:00  1536.28  1534.71  -0.102  0.000 -0.259     NaN  1522.90
1  sixty-min  2007-06-04  10:00:00  1534.87  1534.79  -0.005  0.109 -0.106     NaN  1523.37
2  sixty-min  2007-06-04  11:00:00  1534.88  1536.08   0.078  0.124 -0.023     NaN  1523.87
3  sixty-min  2007-06-04  12:00:00  1536.21  1537.30   0.071  0.118 -0.036     NaN  1524.39
4  sixty-min  2007-06-04  13:00:00  1537.31  1536.23  -0.070  0.011 -0.130     NaN  1524.86
5  sixty-min  2007-06-04  14:00:00  1536.25  1536.91   0.043  0.096 -0.078     NaN  1525.33
6  sixty-min  2007-06-04  15:00:00  1536.53  1539.10   0.167  0.260  0.000     NaN  1525.87
7  sixty-min  2007-06-04  16:00:00  1539.00  1539.18   0.012  0.012  0.000     NaN  1526.39
8  sixty-min  2007-06-05  09:00:00  1539.12  1533.15  -0.389  0.000 -0.456     NaN  1526.66
9  sixty-min  2007-06-05  10:00:00  1533.16  1534.77   0.105  0.160 -0.178     NaN  1526.97

我想做的是将这个数据框编译成只有一行的数据框。它将包含以下列:

[ 'date' '60 9 open,'   '60 9 close,'   '60 9 change,'  '60 9 high',    '60 9 low', '60 9 200ema',  '60 9 50ema', 
'60 10 open',   '60 10 close',  '60 10 change', '60 10 high',   '60 10 low',    '60 10 200ema', '60 10 50ema',
'60 11 open',   '60 11 close',  '60 11 change', '60 11 high',   '60 11 low',    '60 11 200ema', '60 11 50ema',
'60 12 open',   '60 12 close',  '60 12 change', '60 12 high',   '60 12 low',    '60 12 200ema', '60 12 50ema',
'60 13 open',   '60 13 close',  '60 13 change', '60 13 high',   '60 13 low',    '60 13 200ema', '60 13 50ema',
'60 14 open',   '60 14 close',  '60 14 change', '60 14 high',   '60 14 low',    '60 14 200ema', '60 14 50ema',
'60 15 open',   '60 15 close',  '60 15 change', '60 15 high',   '60 15 low',    '60 15 200ema', '60 15 50ema',
'60 16 open',   '60 16 close',  '60 16 change', '60 16 high',   '60 16 low',    '60 16 200ema', '60 16 50ema',]

区别在于行中只有一个日期而没有类型,并且有一个基于每个单元格的数据类型/时间的标题。

您可以首先将小时提取为int并按日期分组:

df['time'] = pd.to_datetime(df['time']).dt.hour
df = df.groupby('date').agg(list)

然后为每个日期连接(沿着column/axis1)从每个列创建的数据框。最后(沿着行/axis0)连接所有日期的数据帧:

df_out = pd.concat([
pd.concat([pd.DataFrame([row[col]], index=[index],
columns=[f'60 {h} {col}' for h in row['time']])
for col in row.index[1:]], axis=1) 
for index, row in df.iterrows()
])

输出:

60 9 open  60 10 open  60 11 open  60 12 open  60 13 open  60 14 open  60 15 open  ...  60 10 50ema  60 11 50ema  60 12 50ema  60 13 50ema  60 14 50ema  60 15 50ema  60 16 50ema
2007-06-04    1536.28     1534.87     1534.88     1536.21     1537.31     1536.25     1536.53  ...      1523.37      1523.87      1524.39      1524.86      1525.33      1525.87      1526.39
2007-06-05    1539.12     1533.16         NaN         NaN         NaN         NaN         NaN  ...      1526.97          NaN          NaN          NaN          NaN          NaN          NaN

最新更新