Find consecutive values in the rows of a pandas DataFrame based on a condition



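I was looking at this question: "How can I find 5 consecutive rows in a pandas DataFrame where the value of a certain column is at least 0.5", which is similar to what I have in mind. I want to find at least 3 consecutive rows where the value is less than 0.5 (but neither negative nor NaN), and I want to do this across the whole DataFrame rather than for a single column as in the linked question. Here is a sample DataFrame: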

import numpy as np
import pandas as pd

# Month-end dates; newer pandas versions spell this alias "ME"
idx = pd.date_range("2018-01-01", periods=10, freq="M")
df = pd.DataFrame(
    {
        'A': [0, 0.4, 0.5, 0.3, 0, 0, 0, 0, 0, 0],
        'B': [0, 0.6, 0.8, 0, 0.3, 0.3, 0.9, 0.7, 0, 0],
        'C': [0, 0, 0.5, 0.4, 0.4, 0.2, 0, 0, 0, 0],
        'D': [0.4, 0, 0.6, 0.5, 0.7, 0.2, 0, 0.9, 0.8, 0],
        'E': [0.4, 0.3, 0.2, 0.7, 0.7, 0.8, 0, 0, 0, 0],
        'F': [0, 0, 0.6, 0.7, 0.8, 0.3, 0.4, 0.1, 0, 0]
    },
    index=idx
)
# Treat zeros as missing values
df = df.replace({0: np.nan})
df

So columns B and D should be dropped from the output, since they do not satisfy the condition.

I would rather not use for loops or the like, since the real df has 2000 columns, so I tried the following:

def consecutive_values_in_range(s, min, max):
    return s.between(left=min, right=max)

min, max = 0, 0.5
df.apply(lambda col: consecutive_values_in_range(col, min, max), axis=0)
print(df)

But I am not getting what I want, which should look like this:

              A   C   E   F
2018-01-31  NaN NaN 0.4 NaN
2018-02-28  0.4 NaN 0.3 NaN
2018-03-31  0.5 0.5 0.2 0.6
2018-04-30  0.3 0.4 0.7 0.7
2018-05-31  NaN 0.4 0.7 0.8
2018-06-30  NaN 0.2 0.8 0.3
2018-07-31  NaN NaN NaN 0.4
2018-08-31  NaN NaN NaN 0.1
2018-09-30  NaN NaN NaN NaN
2018-10-31  NaN NaN NaN NaN

Any suggestions? Thanks in advance.

lower, upper = 0, 0.5
n = 3
df.loc[:, ((df <= upper) & (df >= lower)).rolling(n).sum().eq(n).any()]
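  • Build an is-between mask over df
  • Take the rolling sum of that mask per column, with a window size of 3
  • Since True == 1 and False == 0, getting a sum of 3 at any point means 3 consecutive Trues, i.e. 3 consecutive values with 0 <= val <= 0.5
  • So check for equality with 3 and see whether it holds anywhere in each column
  • Finally, index the columns with the resulting per-column True/False mask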
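The steps above can be traced on a single column. This is only a minimal sketch using column A from the question's frame; the names `mask`, `run_sum` and `hits` are illustrative and not part of the original answer, and the explicit `astype(int)` is added just to make the True/False to 1/0 conversion visible.

```python
import numpy as np
import pandas as pd

lower, upper, n = 0, 0.5, 3

# Column A from the question, with zeros already replaced by NaN.
a = pd.Series([np.nan, 0.4, 0.5, 0.3] + [np.nan] * 6, name="A")

# Step 1: is-between mask; comparisons against NaN evaluate to False.
mask = (a >= lower) & (a <= upper)

# Step 2: rolling sum over n rows; the first n - 1 entries are NaN
# because the window is not yet full.
run_sum = mask.astype(int).rolling(n).sum()

# Step 3: a sum equal to n means the last n rows were all True.
hits = run_sum.eq(n)

print(pd.DataFrame({"A": a, "mask": mask, "run_sum": run_sum, "hits": hits}))
print(hits.any())  # True, so column A would be kept
```

The only row where `hits` is True is the fourth one, which closes the run 0.4, 0.5, 0.3.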

This gives:

              A    C    E    F
2018-01-31  NaN  NaN  0.4  NaN
2018-02-28  0.4  NaN  0.3  NaN
2018-03-31  0.5  0.5  0.2  0.6
2018-04-30  0.3  0.4  0.7  0.7
2018-05-31  NaN  0.4  0.7  0.8
2018-06-30  NaN  0.2  0.8  0.3
2018-07-31  NaN  NaN  NaN  0.4
2018-08-31  NaN  NaN  NaN  0.1
2018-09-30  NaN  NaN  NaN  NaN
2018-10-31  NaN  NaN  NaN  NaN
