Finding consecutive counts of NaN in Python



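I have a DataFrame with missing values. I want to find the runs of consecutive missing values and count how many runs fall into each length range. Below are sample data and the result I expect.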

Sample data
Timestamp            X
2018-01-02 00:00:00  6
2018-01-02 00:05:00  6
2018-01-02 00:10:00  4
2018-01-02 00:15:00  nan
2018-01-02 00:20:00  nan
2018-01-02 00:25:00  3
2018-01-02 00:30:00  4
2018-01-02 00:35:00  nan
2018-01-02 00:40:00  nan
2018-01-02 00:45:00  nan
2018-01-02 00:50:00  nan
2018-01-02 00:55:00  nan
2018-01-02 01:00:00  nan
2018-01-02 01:05:00  2
2018-01-02 01:10:00  4
2018-01-02 01:15:00  6
2018-01-02 01:20:00  6
2018-01-02 01:25:00  nan
2018-01-02 01:30:00  nan
2018-01-02 01:35:00  6
2018-01-02 01:40:00  nan
2018-01-02 01:45:00  nan
2018-01-02 01:50:00  6
2018-01-02 01:55:00  6
2018-01-02 02:00:00  nan
2018-01-02 02:05:00  nan
2018-01-02 02:10:00  nan
2018-01-02 02:15:00  3
2018-01-02 02:20:00  4

Expected result

Consecutive missing values range  Cases
0-2                               3
3-5                               1
6 and above                       1

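First use the solution from Identifying consecutive NaN's with pandas to get the length of each NaN run, then filter out the 0 values, bin the lengths with cut, and finally count the values per bin with GroupBy.size: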

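import numpy as np
import pandas as pd
# notna().cumsum() stays constant across a NaN run, so summing isna() per group gives each run's length
s = df.X.isna().groupby(df.X.notna().cumsum()).sum()
s = s[s != 0]  # drop groups that contain no NaN
# bin the run lengths into (0, 2], (2, 5] and (5, inf)
b = pd.cut(s, bins=[0, 2, 5, np.inf], labels=['0-2','3-5','6 and above'])
out = b.groupby(b).size().reset_index(name='Cases')
print(out)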
             X  Cases
0          0-2      3
1          3-5      1
2  6 and above      1
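
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample column from the question and applies the same steps. The names values, group_key, run_lengths and binned are only illustrative and do not come from the original answer, and observed=False is an addition that makes the handling of empty bins explicit on newer pandas versions.

```python
import numpy as np
import pandas as pd

# Rebuild the sample column from the question (29 rows at 5-minute steps).
nan = np.nan
values = [6, 6, 4, nan, nan, 3, 4, nan, nan, nan, nan, nan, nan,
          2, 4, 6, 6, nan, nan, 6, nan, nan, 6, 6, nan, nan, nan, 3, 4]
idx = pd.date_range("2018-01-02 00:00:00", periods=len(values), freq="5min")
df = pd.DataFrame({"X": values}, index=idx)
df.index.name = "Timestamp"

# The key goes up by one at every valid value, so each NaN run shares
# the key of the valid value that precedes it.
group_key = df["X"].notna().cumsum()

# Summing the boolean isna() mask per key counts the NaNs in each run.
run_lengths = df["X"].isna().groupby(group_key).sum()

# Groups without any NaN sum to 0 and are not runs, so drop them.
run_lengths = run_lengths[run_lengths != 0]
print(run_lengths.tolist())  # [2, 6, 2, 2, 3]

# pd.cut uses right-closed bins by default: (0, 2], (2, 5], (5, inf].
binned = pd.cut(run_lengths, bins=[0, 2, 5, np.inf],
                labels=["0-2", "3-5", "6 and above"])

# observed=False keeps bins that happen to be empty in the result.
out = binned.groupby(binned, observed=False).size().reset_index(name="Cases")
print(out)
```

Because the bins are right-closed and zero-length groups are filtered out first, the '0-2' label effectively covers runs of length 1 or 2.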
