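I have a Pandas DataFrame with a DatetimeIndex, and I want to split it into blocks of consecutive rows separated by NaN rows, dropping the NaN rows themselves. For example: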
Temperature Humidity
2020-01-01 00:00:00+00:00 20 40
2020-01-01 00:01:00+00:00 21 40
2020-01-01 00:02:00+00:00 NaN NaN
2020-01-01 00:03:00+00:00 22 41
2020-01-01 00:04:00+00:00 NaN NaN
2020-01-01 00:05:00+00:00 NaN NaN
2020-01-01 00:06:00+00:00 NaN NaN
2020-01-01 00:07:00+00:00 21 41
2020-01-01 00:08:00+00:00 21 41
2020-01-01 00:09:00+00:00 21 42
The result should be a list of the following three DataFrames:
Temperature Humidity
2020-01-01 00:00:00+00:00 20 40
2020-01-01 00:01:00+00:00 21 40
Temperature Humidity
2020-01-01 00:03:00+00:00 22 41
Temperature Humidity
2020-01-01 00:07:00+00:00 21 41
2020-01-01 00:08:00+00:00 21 41
2020-01-01 00:09:00+00:00 21 42
Any help would be appreciated.
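To reproduce the example, the frame from the question can be rebuilt as below. This is a minimal sketch that assumes the index is a sequence of one-minute steps in UTC, inferred from the timestamps shown; the values are copied from the question.

```python
import numpy as np
import pandas as pd

# Ten one-minute timestamps in UTC (assumed from the question's index).
idx = pd.date_range("2020-01-01 00:00", periods=10, freq="min", tz="UTC")

# Column values copied from the question; NaN marks the gap rows.
df = pd.DataFrame(
    {
        "Temperature": [20, 21, np.nan, 22, np.nan, np.nan, np.nan, 21, 21, 21],
        "Humidity": [40, 40, np.nan, 41, np.nan, np.nan, np.nan, 41, 41, 42],
    },
    index=idx,
)
print(df)
```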
Let's try using cumsum and isnull to create the groupby key:
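# The running NaN count (summed over the columns) is constant within each block of complete rows
d = {x: y for x, y in df.dropna().groupby(df.isnull().cumsum().sum(axis=1))}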
d[0]
Temperature Humidity
2020-01-01 00:00:00+00:00 20.0 40.0
2020-01-01 00:01:00+00:00 21.0 40.0
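A minimal sketch of the same cumsum/isnull idea that returns a list, as the question asks, instead of a dict. It rebuilds the example frame (assuming one-minute UTC timestamps) and aligns the key to the remaining rows explicitly with .loc; the names key, clean and chunks are illustrative only.

```python
import numpy as np
import pandas as pd

# Example frame from the question (assumed: one-minute steps, UTC).
idx = pd.date_range("2020-01-01 00:00", periods=10, freq="min", tz="UTC")
df = pd.DataFrame(
    {
        "Temperature": [20, 21, np.nan, 22, np.nan, np.nan, np.nan, 21, 21, 21],
        "Humidity": [40, 40, np.nan, 41, np.nan, np.nan, np.nan, 41, 41, 42],
    },
    index=idx,
)

# Per-column running count of NaN cells, summed across columns:
# constant within a run of complete rows, larger after every NaN row.
key = df.isnull().cumsum().sum(axis=1)

# Drop the NaN rows, then group what is left by its key value.
clean = df.dropna()
chunks = [group for _, group in clean.groupby(key.loc[clean.index])]

for chunk in chunks:
    print(chunk, end="\n\n")
```

With this data the key values of the kept rows are 0, 2 and 8 (each NaN row adds one per column), so the dict in the answer is keyed by those values: d[0] works, but d[1] would raise a KeyError. Collecting the groups into a list avoids depending on the key values.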
Let's try using cumsum to identify the blocks:
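# Running count of NaN rows (based on Temperature); each NaN row starts a new block
na = df.Temperature.isna().cumsum()
# Keep rows before the first NaN (na == 0) and every repeat of a counter value;
# the first occurrence of each nonzero value is the NaN row itself, so it is dropped
for i, d in df.loc[na.eq(0) | na.duplicated()].groupby(na):
    print(d)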
Output:
Temperature Humidity
2020-01-01 00:00:00+00:00 20.0 40.0
2020-01-01 00:01:00+00:00 21.0 40.0
Temperature Humidity
2020-01-01 00:03:00+00:00 22.0 41.0
Temperature Humidity
2020-01-01 00:07:00+00:00 21.0 41.0
2020-01-01 00:08:00+00:00 21.0 41.0
2020-01-01 00:09:00+00:00 21.0 42.0
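The answer above only checks the Temperature column, so a row where only Humidity is NaN would stay inside a block. A minimal sketch of a variant that treats a row as a gap if any column is NaN, assuming the same example frame as the question; is_gap, block_id and chunks are illustrative names.

```python
import numpy as np
import pandas as pd

# Example frame from the question (assumed: one-minute steps, UTC).
idx = pd.date_range("2020-01-01 00:00", periods=10, freq="min", tz="UTC")
df = pd.DataFrame(
    {
        "Temperature": [20, 21, np.nan, 22, np.nan, np.nan, np.nan, 21, 21, 21],
        "Humidity": [40, 40, np.nan, 41, np.nan, np.nan, np.nan, 41, 41, 42],
    },
    index=idx,
)

# True for rows that contain at least one NaN; these are treated as gaps.
is_gap = df.isna().any(axis=1)

# Every gap row increments the counter, so each run of complete rows
# between two gaps shares one block id.
block_id = is_gap.cumsum()

# Keep only the complete rows and split them by block id.
chunks = [group for _, group in df[~is_gap].groupby(block_id[~is_gap])]

for chunk in chunks:
    print(chunk, end="\n\n")
```

For the example data this gives the same three chunks of 2, 1 and 3 rows. Because the block ids increase over time and groupby sorts its keys by default, the chunks come out in chronological order.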