I have a DataFrame (df) with one row per year:
Jan Feb Mar Apr ... Aug Sep Oct Nov Dec
2021 0.852144 0.406946 0.067136 0.686585 ... 0.839023 0.803384 0.506883 0.829171 0.214810
2022 0.442388 0.296960 0.751213 0.690898 ... 0.701342 0.924836 0.534601 0.601725 0.673403
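I want to reshape it into a single column with every month/year in chronological order (each row's index should be in mmm yyyy format), so that it looks like: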
0
Jan 2021 0.852144
Feb 2021 0.406946
Mar 2021 0.067136
Apr 2021 0.686585
... ...
Aug 2022 0.701342
Sep 2022 0.924836
Oct 2022 0.534601
Nov 2022 0.601725
Dec 2022 0.673403
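I tried:
df = df.unstack().to_frame().T.sort_index(0,1).T
but I can't get the index formatted correctly on every row. I want mmm yyyy as the index of each row; currently the month only appears on every other row: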
0
Jan 2021 0.852144
2022 0.442388
Feb 2021 0.406946
2022 0.296960
Mar 2021 0.067136
2022 0.751213
Apr 2021 0.686585
2022 0.690898
... ...
Aug 2021 0.839023
2022 0.701342
Sep 2021 0.803384
2022 0.924836
Oct 2021 0.506883
2022 0.534601
Nov 2021 0.829171
2022 0.601725
Dec 2021 0.214810
2022 0.673403
If you need to flatten the MultiIndex, use Index.map; the transpose and sort are not needed:
df = df.unstack().to_frame()
df.index = df.index.map(lambda x: f'{x[0]} {x[1]}')
Or use DataFrame.melt:
df = df.melt(ignore_index=False, value_name=0)
df.index = df.pop('variable') + ' ' + df.index.astype(str)
print (df)
0
Jan 2021 0.852144
Jan 2022 0.442388
Feb 2021 0.406946
Feb 2022 0.296960
Mar 2021 0.067136
Mar 2022 0.751213
Apr 2021 0.686585
Apr 2022 0.690898
Aug 2021 0.839023
Aug 2022 0.701342
Sep 2021 0.803384
Sep 2022 0.924836
Oct 2021 0.506883
Oct 2022 0.534601
Nov 2021 0.829171
Nov 2022 0.601725
Dec 2021 0.214810
Dec 2022 0.673403
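As a self-contained check of the flattening step, here is a minimal sketch on a reduced frame. It assumes only the Jan to Mar values from the question; the names `sample` and `flat` are illustrative and not part of the original code.

```python
import pandas as pd

# Reduced sample: first three months from the question, one row per year.
sample = pd.DataFrame(
    {
        "Jan": [0.852144, 0.442388],
        "Feb": [0.406946, 0.296960],
        "Mar": [0.067136, 0.751213],
    },
    index=[2021, 2022],
)

# unstack() yields a Series indexed by (month, year) pairs, column by column.
flat = sample.unstack().to_frame()

# Join each (month, year) tuple into a single "mmm yyyy" label.
flat.index = flat.index.map(lambda x: f"{x[0]} {x[1]}")

print(flat)
```

The rows come out grouped by month (Jan 2021, Jan 2022, Feb 2021, ...), which is why a separate sorting step is still needed for chronological order.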
If you need to sort chronologically, use DataFrame.sort_index with the key parameter:
df = df.sort_index(key=lambda x: pd.to_datetime(x))
print (df)
0
Jan 2021 0.852144
Feb 2021 0.406946
Mar 2021 0.067136
Apr 2021 0.686585
Aug 2021 0.839023
Sep 2021 0.803384
Oct 2021 0.506883
Nov 2021 0.829171
Dec 2021 0.214810
Jan 2022 0.442388
Feb 2022 0.296960
Mar 2022 0.751213
Apr 2022 0.690898
Aug 2022 0.701342
Sep 2022 0.924836
Oct 2022 0.534601
Nov 2022 0.601725
Dec 2022 0.673403
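The sort can also be written with an explicit date format, which avoids relying on format inference. This is a sketch under two assumptions: the labels are always an abbreviated English month name followed by a four-digit year, and the pandas version supports the key argument of sort_index. The names `labelled` and `ordered` are illustrative.

```python
import pandas as pd

# Labels in the month-grouped order produced by the flattening step.
labelled = pd.DataFrame(
    {0: [0.852144, 0.442388, 0.406946, 0.296960]},
    index=["Jan 2021", "Jan 2022", "Feb 2021", "Feb 2022"],
)

# The key function receives the whole Index and returns the values to sort by;
# "%b %Y" parses labels such as "Jan 2021" (month abbreviations are locale dependent).
ordered = labelled.sort_index(key=lambda idx: pd.to_datetime(idx, format="%b %Y"))

print(ordered)
```

The index labels themselves stay as strings; only the sort order is derived from the parsed dates.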
DataFrame.stack gives the same output:
df = df.stack().to_frame()
# levels are swapped here: x[0] is the year, x[1] is the month
df.index = df.index.map(lambda x: f'{x[1]} {x[0]}')
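A minimal sketch of why the stack variant needs no extra sort, assuming the original frame has its rows in year order and its columns in calendar order. It uses a reduced three-month sample from the question, and the names `sample` and `stacked` are illustrative.

```python
import pandas as pd

# Reduced sample: rows in year order, columns in calendar order.
sample = pd.DataFrame(
    {
        "Jan": [0.852144, 0.442388],
        "Feb": [0.406946, 0.296960],
        "Mar": [0.067136, 0.751213],
    },
    index=[2021, 2022],
)

# stack() walks the frame row by row, giving (year, month) pairs.
stacked = sample.stack().to_frame()

# Put the month first when building the "mmm yyyy" label.
stacked.index = stacked.index.map(lambda x: f"{x[1]} {x[0]}")

print(stacked)
```

Because stack goes through each year's months before moving to the next year, the result is already chronological as long as the input frame is ordered that way.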