I have a DataFrame in the following format:
A B C D
2020-11-18 64.0 74.0 34.0 57.0
2020-11-20 NaN 71.0 NaN 58.0
2020-11-23 NaN 11.0 NaN NaN
2020-11-25 69.0 NaN NaN 0.0
2020-11-27 NaN 37.0 19.0 NaN
2020-11-29 63.0 NaN NaN 85.0
2020-12-03 NaN 73.0 NaN 49.0
2020-12-10 NaN NaN 32.0 NaN
2020-12-22 52.0 90.0 33.0 24.0
2020-12-23 NaN 96.0 NaN NaN
2020-12-28 78.0 NaN NaN 68.0
2020-12-29 17.0 70.0 NaN 16.0
2021-01-03 51.0 43.0 NaN 66.0
I want to get a new DataFrame that contains, for each column, the last non-NaN value of each month:
A B C D
2020-11 63.0 37.0 19.0 85.0
2020-12 17.0 70.0 33.0 16.0
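To make the requested rule concrete, here is a minimal sketch (assuming pandas and NumPy) that rebuilds the sample frame and computes the result column by column with Series.last_valid_index. The helper name last_valid_per_month is hypothetical, not a pandas function.

```python
import numpy as np
import pandas as pd

nan = np.nan

# The question's sample data, indexed by date.
df = pd.DataFrame(
    {
        "A": [64, nan, nan, 69, nan, 63, nan, nan, 52, nan, 78, 17, 51],
        "B": [74, 71, 11, nan, 37, nan, 73, nan, 90, 96, nan, 70, 43],
        "C": [34, nan, nan, nan, 19, nan, nan, 32, 33, nan, nan, nan, nan],
        "D": [57, 58, nan, 0, nan, 85, 49, nan, 24, nan, 68, 16, 66],
    },
    index=pd.to_datetime([
        "2020-11-18", "2020-11-20", "2020-11-23", "2020-11-25", "2020-11-27",
        "2020-11-29", "2020-12-03", "2020-12-10", "2020-12-22", "2020-12-23",
        "2020-12-28", "2020-12-29", "2021-01-03",
    ]),
)


def last_valid_per_month(frame):
    """Hypothetical helper: last non-NaN value of each column per calendar month."""
    rows = {}
    for month, group in frame.groupby(frame.index.to_period("M")):
        values = {}
        for col, s in group.items():
            # last_valid_index() returns None when the column is all NaN this month.
            idx = s.last_valid_index()
            values[col] = s.loc[idx] if idx is not None else nan
        rows[month] = values
    return pd.DataFrame.from_dict(rows, orient="index")


print(last_valid_per_month(df))
```

This also produces a 2021-01 row, because the sample data contains 2021-01-03.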
I tried grouping by month and applying a lambda that returns the largest index in each group, like this:
df.loc[df.groupby(df.index.to_period('M')).apply(lambda x: x.index.max())]
This yields:
A B C D
2020-11-29 63.0 NaN NaN 85.0
2020-12-29 17.0 70.0 NaN 16.0
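This returns the values that appear on the last date of each month, not the last non-NaN value. If a column's value on the month's last date is NaN, I get NaN in the result. Instead, I only want NaN when a column has no value at all in that month.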
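A small illustration of why the row-based attempt misses values, using a made-up two-row frame rather than the question's data: selecting each month's latest date pulls in that entire row, NaNs included, while GroupBy.last looks at each column separately.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (not the question's data): column "C" has a value
# early in the month but is NaN on the month's last date.
df = pd.DataFrame(
    {"A": [1.0, 2.0], "C": [5.0, np.nan]},
    index=pd.to_datetime(["2020-11-01", "2020-11-29"]),
)
months = df.index.to_period("M")

# The question's approach: find each month's latest date, then take that whole row.
last_dates = df.groupby(months).apply(lambda x: x.index.max())
row_based = df.loc[last_dates]

# Per-column approach: GroupBy.last skips NaN within each column.
col_based = df.groupby(months).last()

print(row_based)  # C is NaN: the 2020-11-29 row is taken as-is
print(col_based)  # C is 5.0: the last non-NaN value in November
```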
Use GroupBy.last, which takes the last non-NaN value of each column within each group:
df = df.groupby(df.index.to_period('M')).last()
print(df)
A B C D
2020-11 63.0 37.0 19.0 85.0
2020-12 17.0 70.0 33.0 16.0
2021-01 51.0 43.0 NaN 66.0
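As a rough cross-check of what GroupBy.last computes (a sketch on a small made-up frame, not the question's data), the same result can be spelled out by forward-filling within each month and then keeping each month's final row:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame spanning two months.
df = pd.DataFrame(
    {
        "A": [64.0, np.nan, 63.0, np.nan],
        "C": [34.0, 19.0, np.nan, np.nan],
    },
    index=pd.to_datetime(["2020-11-18", "2020-11-27", "2020-11-29", "2020-12-03"]),
)
months = df.index.to_period("M")

# GroupBy.last: per column, the last non-NaN value in each month.
via_last = df.groupby(months).last()

# Same result spelled out: forward-fill inside each month (no leakage across
# months), then keep the final row of each month and relabel it by period.
filled = df.groupby(months).ffill()
via_ffill = filled.groupby(months).tail(1)
via_ffill.index = via_ffill.index.to_period("M")

print(via_last)
print(via_ffill)
```

The forward fill is done per group so that a November value is never carried into December; a month with no values in a column stays NaN, matching the 2021-01 entry for C above.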