Python pandas: how to include rows before a row that meets a specific condition



I have a CSV file that looks something like this:

[sample table with columns duration, measurement, concentration; the original values are not recoverable]

You can build boolean masks and use them for boolean indexing:

# number of baseline rows to keep
n = 2
# cols to keep
cols = ['duration', 'measurement', 'concentration']
# is the concentration greater than 10?
m1 = dF1['concentration'].gt(10)
# is the row one of the n initial concentration 0?
m2 = dF1['concentration'].eq(0).cumsum().le(n)
# if values between 0 and 10 can occur and you do not want those rows
# (the := operator needs Python 3.8+)
# m2 = (m2:=dF1['concentration'].eq(0)) & m2.cumsum().le(n)
# or
# m2 = dF1.index.isin(dF1[dF1['concentration'].eq(0)].head(n).index)
# keep rows where either condition is met
dF2 = dF1.loc[m1|m2, cols]
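
To see what each mask contributes, here is a minimal sketch on made-up data. The values in `dF1` are hypothetical (only rows 0, 1, 4 and 5 are chosen to agree with the output shown in this answer), and the `debug` frame is just an illustration aid, not part of the answer's method.

```python
import pandas as pd

# Hypothetical sample data (assumption): four baseline rows, then two rows above 10
dF1 = pd.DataFrame({
    "duration": [1.20, 1.25, 10.30, 10.50, 10.60, 10.67],
    "measurement": [10.0, 12.0, 100.0, 110.0, 150.0, 156.0],
    "concentration": [0, 0, 0, 0, 20, 30],
})

n = 2
# True where the concentration exceeds the threshold
m1 = dF1["concentration"].gt(10)
# Running count of rows whose concentration is exactly 0
zero_count = dF1["concentration"].eq(0).cumsum()
# True until the (n+1)-th zero row is reached
m2 = zero_count.le(n)

# Side-by-side view of the intermediate values
debug = dF1[["concentration"]].assign(zero_count=zero_count, m1=m1, m2=m2, keep=m1 | m2)
print(debug)

dF2 = dF1.loc[m1 | m2, ["duration", "measurement", "concentration"]]
print(dF2)
```

Because `m2` is based on a running count, it stays True for any non-zero row that appears before the (n+1)-th zero, which is why the commented alternatives exist for data with values between 0 and 10.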

If you only want to keep initial rows that come before the first value above the threshold, change m2 to:

# keep up to n initial rows with concentration=0
# only until the first row above threshold is met
m2 = dF1['concentration'].eq(0).cumsum().le(n) & ~m1.cummax()

Output:

   duration  measurement  concentration
0      1.20         10.0              0
1      1.25         12.0              0
4     10.60        150.0             20
5     10.67        156.0             30
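
The two versions of m2 only differ when a zero-concentration row shows up again after the threshold has been crossed. A small sketch of that case, using a hypothetical `concentration` column invented for illustration:

```python
import pandas as pd

# Hypothetical data (assumption): a zero row reappears after the first value above 10
df = pd.DataFrame({"concentration": [0, 20, 0, 5, 30]})
n = 2

m1 = df["concentration"].gt(10)

# Version 1: True until the (n+1)-th zero, regardless of where the zeros occur
m2_loose = df["concentration"].eq(0).cumsum().le(n)

# Version 2: additionally False from the first above-threshold row onwards
# (m1.cummax() switches to True at that row and stays True)
m2_strict = m2_loose & ~m1.cummax()

kept_loose = df.index[m1 | m2_loose].tolist()
kept_strict = df.index[m1 | m2_strict].tolist()
print(kept_loose)
print(kept_strict)
```

With version 1 every row is kept here, because only two zeros occur in total; with version 2 only row 0 counts as baseline, so rows 2 and 3 are dropped.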

You can filter the two groups of rows separately and concatenate them to get the desired result:

import pandas as pd

n = 100 # number of initial rows with concentration 0 to keep
dF2 = pd.concat([dF1[dF1["concentration"]==0].head(n),dF1[dF1["concentration"]>10]])[["duration","measurement","concentration"]]
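
One thing to be aware of with concatenation: the baseline rows always come first in the result, even if some of them sit between above-threshold rows in the original frame. A minimal sketch, assuming made-up data and that restoring the original order with `sort_index()` is what you want:

```python
import pandas as pd

# Hypothetical data (assumption): zero rows interleaved with rows above 10
dF1 = pd.DataFrame({
    "duration": [1.0, 2.0, 3.0, 4.0],
    "measurement": [10.0, 150.0, 11.0, 160.0],
    "concentration": [0, 20, 0, 30],
})
cols = ["duration", "measurement", "concentration"]
n = 2

baseline = dF1[dF1["concentration"] == 0].head(n)   # first n zero rows
above = dF1[dF1["concentration"] > 10]              # rows above the threshold

combined = pd.concat([baseline, above])[cols]
print(combined.index.tolist())   # baseline rows first, then the rest

# Restore the original row order if it matters
ordered = combined.sort_index()
print(ordered.index.tolist())
```

The two selections are disjoint (a value cannot be both 0 and greater than 10), so no duplicate rows need to be removed.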

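You can simply filter the DataFrame for rows where the concentration is zero, use "head" to take the first n of them (100 here), and combine them with dF2, the rows above the threshold.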

import pandas as pd

n = 100 # change this to the number of baseline rows you want
df_baseline = dF1[dF1["concentration"] == 0][["duration","measurement","concentration"]].head(n)

dF2 = dF1[dF1["concentration"]>10][["duration","measurement","concentration"]]
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_final = pd.concat([df_baseline, dF2])
