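I'm looking at this question: "How can I find 5 consecutive rows in pandas Dataframe where a value of a certain column is at least 0.5", which is similar to what I have in mind. I'd like to find at least 3 consecutive rows where a value is less than 0.5 (but neither negative nor NaN), while considering the whole DataFrame rather than just one column as in the question linked above. Here is a mock DataFrame: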
import numpy as np
import pandas as pd

# "M" is the month-end alias; pandas 2.2+ prefers "ME"
idx = pd.date_range("2018-01-01", periods=10, freq="M")
df = pd.DataFrame(
    {
        'A': [0, 0.4, 0.5, 0.3, 0, 0, 0, 0, 0, 0],
        'B': [0, 0.6, 0.8, 0, 0.3, 0.3, 0.9, 0.7, 0, 0],
        'C': [0, 0, 0.5, 0.4, 0.4, 0.2, 0, 0, 0, 0],
        'D': [0.4, 0, 0.6, 0.5, 0.7, 0.2, 0, 0.9, 0.8, 0],
        'E': [0.4, 0.3, 0.2, 0.7, 0.7, 0.8, 0, 0, 0, 0],
        'F': [0, 0, 0.6, 0.7, 0.8, 0.3, 0.4, 0.1, 0, 0],
    },
    index=idx,
)
df = df.replace({0: np.nan})  # treat zeros as missing values
df
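So, since columns B and D do not satisfy the condition, they should be removed from the output.
I'd rather not use for loops and the like, since the real df has 2000 columns, so I tried the following: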
def consecutive_values_in_range(s, lower, upper):
    # Boolean mask: True where lower <= value <= upper
    return s.between(left=lower, right=upper)

lower, upper = 0, 0.5
df.apply(lambda col: consecutive_values_in_range(col, lower, upper), axis=0)
print(df)
But I'm not getting what I want, which should look like this:
              A    C    E    F
2018-01-31  NaN  NaN  0.4  NaN
2018-02-28  0.4  NaN  0.3  NaN
2018-03-31  0.5  0.5  0.2  0.6
2018-04-30  0.3  0.4  0.7  0.7
2018-05-31  NaN  0.4  0.7  0.8
2018-06-30  NaN  0.2  0.8  0.3
2018-07-31  NaN  NaN  NaN  0.4
2018-08-31  NaN  NaN  NaN  0.1
2018-09-30  NaN  NaN  NaN  NaN
2018-10-31  NaN  NaN  NaN  NaN
Any suggestions? Thanks in advance.
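A note on why the attempt above seems to do nothing: `Series.between` returns a boolean mask rather than filtering anything, and the result of `df.apply(...)` is never assigned, so `print(df)` shows the original frame. Here is a minimal sketch of that behaviour on a small made-up frame (the names `demo` and `mask` and the values are illustrative only, not taken from the question):

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the values are made up for this sketch.
demo = pd.DataFrame({"A": [0.4, 0.5, 0.3], "B": [0.6, np.nan, 0.2]})

# between() is inclusive on both ends by default and returns booleans;
# NaN compares as False.
mask = demo.apply(lambda col: col.between(0, 0.5))

print(mask)  # boolean frame with the same shape as demo
print(demo)  # unchanged: apply() returned a new object
```

So the mask itself is a useful building block, but it still has to be turned into a per-column "has a run of 3" decision and used to select columns.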
lower, upper = 0, 0.5
n = 3
df.loc[:, ((df <= upper) & (df >= lower)).rolling(n).sum().eq(n).any()]
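- get an is_between mask over df
- take the rolling sum of that mask per column, with a window size of 3
- since True == 1 and False == 0, a sum of 3 at any point means 3 consecutive Trues, i.e. 3 consecutive values with 0 <= val <= 0.5
- so check for equality with 3 and see whether it occurs anywhere in the column
- finally, index the columns with the resulting per-column True/False mask

This gives: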
              A    C    E    F
2018-01-31  NaN  NaN  0.4  NaN
2018-02-28  0.4  NaN  0.3  NaN
2018-03-31  0.5  0.5  0.2  0.6
2018-04-30  0.3  0.4  0.7  0.7
2018-05-31  NaN  0.4  0.7  0.8
2018-06-30  NaN  0.2  0.8  0.3
2018-07-31  NaN  NaN  NaN  0.4
2018-08-31  NaN  NaN  NaN  0.1
2018-09-30  NaN  NaN  NaN  NaN
2018-10-31  NaN  NaN  NaN  NaN
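To make the steps easier to inspect, the one-liner can be split into named intermediates. This is only a sketch of the same idea. The names `in_range`, `run_sum` and `keep` are made up here, the mask is cast to int before rolling purely as a precaution, and the index is built with `pd.offsets.MonthEnd()` instead of the `"M"` alias so it should behave the same across pandas versions:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2018-01-31", periods=10, freq=pd.offsets.MonthEnd())
df = pd.DataFrame(
    {
        "A": [0, 0.4, 0.5, 0.3, 0, 0, 0, 0, 0, 0],
        "B": [0, 0.6, 0.8, 0, 0.3, 0.3, 0.9, 0.7, 0, 0],
        "C": [0, 0, 0.5, 0.4, 0.4, 0.2, 0, 0, 0, 0],
        "D": [0.4, 0, 0.6, 0.5, 0.7, 0.2, 0, 0.9, 0.8, 0],
        "E": [0.4, 0.3, 0.2, 0.7, 0.7, 0.8, 0, 0, 0, 0],
        "F": [0, 0, 0.6, 0.7, 0.8, 0.3, 0.4, 0.1, 0, 0],
    },
    index=idx,
).replace({0: np.nan})

lower, upper, n = 0, 0.5, 3

# Step 1: True where lower <= value <= upper; NaN compares as False.
in_range = (df >= lower) & (df <= upper)

# Step 2: per-column count of in-range values in each window of n rows.
run_sum = in_range.astype(int).rolling(n).sum()

# Step 3: a column qualifies if some window is entirely in range.
keep = run_sum.eq(n).any()

# Step 4: select the qualifying columns.
result = df.loc[:, keep]
print(list(result.columns))  # ['A', 'C', 'E', 'F']
```

Note that both bounds are inclusive here, which is why column A (0.4, 0.5, 0.3) is kept. With a strict `df < upper`, A would no longer qualify.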