How do I return a DataFrame with the last non-NaN value of each column for each month?



I have a DataFrame in the following format:

               A     B     C     D
2020-11-18  64.0  74.0  34.0  57.0
2020-11-20   NaN  71.0   NaN  58.0
2020-11-23   NaN  11.0   NaN   NaN
2020-11-25  69.0   NaN   NaN   0.0
2020-11-27   NaN  37.0  19.0   NaN
2020-11-29  63.0   NaN   NaN  85.0
2020-12-03   NaN  73.0   NaN  49.0
2020-12-10   NaN   NaN  32.0   NaN
2020-12-22  52.0  90.0  33.0  24.0
2020-12-23   NaN  96.0   NaN   NaN
2020-12-28  78.0   NaN   NaN  68.0
2020-12-29  17.0  70.0   NaN  16.0
2021-01-03  51.0  43.0   NaN  66.0

I want to get a new DataFrame containing the last non-NaN value of each column for each month:

               A     B     C     D
2020-11     63.0  37.0  19.0  85.0
2020-12     17.0  70.0  33.0  16.0

I tried grouping by month and applying a lambda that returns the largest index within each group, like this:

df.loc[df.groupby(df.index.to_period('M')).apply(lambda x: x.index.max())]

This yields:

               A     B     C     D
2020-11-29  63.0   NaN   NaN  85.0
2020-12-29  17.0  70.0   NaN  16.0

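This returns the values from the last date present in each month, not the last non-NaN value: if a column is NaN on that last date, the result shows NaN. Instead, I want NaN to appear only when a column has no values at all for that month.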

Use GroupBy.last, which returns the last non-NaN value of each column within each group:

df = df.groupby(df.index.to_period('M')).last()
print(df)
            A     B     C     D
2020-11  63.0  37.0  19.0  85.0
2020-12  17.0  70.0  33.0  16.0
2021-01  51.0  43.0   NaN  66.0
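
To try this end to end, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and compares the two approaches. The variable names (`monthly_last`, `last_rows`) and the use of `tail(1)` to stand in for the "last row of each month" attempt are illustrative assumptions, not part of the original posts.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Sample data copied from the question (13 dated rows, columns A-D).
df = pd.DataFrame(
    {
        "A": [64, nan, nan, 69, nan, 63, nan, nan, 52, nan, 78, 17, 51],
        "B": [74, 71, 11, nan, 37, nan, 73, nan, 90, 96, nan, 70, 43],
        "C": [34, nan, nan, nan, 19, nan, nan, 32, 33, nan, nan, nan, nan],
        "D": [57, 58, nan, 0, nan, 85, 49, nan, 24, nan, 68, 16, 66],
    },
    index=pd.to_datetime(
        [
            "2020-11-18", "2020-11-20", "2020-11-23", "2020-11-25",
            "2020-11-27", "2020-11-29", "2020-12-03", "2020-12-10",
            "2020-12-22", "2020-12-23", "2020-12-28", "2020-12-29",
            "2021-01-03",
        ]
    ),
)

# Group rows by calendar month (a PeriodIndex with monthly frequency).
by_month = df.groupby(df.index.to_period("M"))

# GroupBy.last picks, per column, the last non-NaN entry in each group.
monthly_last = by_month.last()
print(monthly_last)

# For contrast: the last *row* of each month keeps whatever NaNs that row has.
last_rows = by_month.tail(1)
print(last_rows)
```

A column is NaN in `monthly_last` only when it has no values at all in that month, as with column C in 2021-01. If the extra 2021-01 row is not wanted, it can be dropped afterwards, for example by slicing the result.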
