I have a CSV file that looks something like this:
duration | ||||||
---|---|---|---|---|---|---|
1.2 | 1.25 | |||||
10.3 | 0 | 11 | ||||
10.5 | 10.6 | 10.67 | 30 | <156>|||
1.75 | 11 | 2|||||
You can use boolean indexing with boolean masks:
# number of baseline rows to keep
n = 2
# cols to keep
cols = ['duration', 'measurement', 'concentration']
# is the concentration greater than 10?
m1 = dF1['concentration'].gt(10)
# is the row one of the n initial concentration 0?
m2 = dF1['concentration'].eq(0).cumsum().le(n)
# if you have values in between 0 and 10 and do not want those
# m2 = (m2:=dF1['concentration'].eq(0)) & m2.cumsum().le(n)
# or
# m2 = dF1.index.isin(dF1[dF1['concentration'].eq(0)].head(n).index)
# keep rows where either condition is met
dF2 = dF1.loc[m1|m2, cols]
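To make the difference between the plain cumsum mask and the stricter commented-out variant concrete, here is a minimal self-contained sketch. The sample values are invented for illustration and are not the asker's data; the names loose and strict are hypothetical and only mirror the two forms of m2 shown above.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only
dF1 = pd.DataFrame({
    "duration": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "measurement": [10.0, 40.0, 12.0, 11.0, 150.0, 156.0],
    "concentration": [0, 5, 0, 0, 20, 30],
})

n = 2  # number of baseline rows to keep
cols = ["duration", "measurement", "concentration"]

# rows whose concentration is above the threshold
m1 = dF1["concentration"].gt(10)

# running count of zero rows; True until the count exceeds n.
# Note that this also keeps non-zero rows that sit before the n-th zero.
is_zero = dF1["concentration"].eq(0)
loose = is_zero.cumsum().le(n)

# stricter variant: the row must itself be a zero row
strict = is_zero & loose

loose_idx = dF1.loc[m1 | loose, cols].index.tolist()
strict_idx = dF1.loc[m1 | strict, cols].index.tolist()

print(loose_idx)   # [0, 1, 2, 4, 5] (row 1, concentration 5, slips in)
print(strict_idx)  # [0, 2, 4, 5]
```

With data that has no values between 0 and the threshold before the n-th zero, both variants select the same rows, so the simpler mask is enough in that case.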
If you only want to keep the initial rows up to the first value above the threshold, change m2 to:
# keep up to n initial rows with concentration=0
# only until the first row above threshold is met
m2 = dF1['concentration'].eq(0).cumsum().le(n) & ~m1.cummax()
Output:
duration measurement concentration
0 1.20 10.0 0
1 1.25 12.0 0
4 10.60 150.0 20
5 10.67 156.0 30
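The ~m1.cummax() term only matters when zero rows reappear after the threshold has already been exceeded. The following is a rough sketch of that case on made-up data (the values and the names without_cutoff and with_cutoff are assumptions for illustration, not part of the original answer).

```python
import pandas as pd

# Hypothetical sample data: a zero row (index 3) appears after
# the concentration has already gone above the threshold (index 2)
dF1 = pd.DataFrame({
    "duration": [1.0, 2.0, 3.0, 4.0, 5.0],
    "measurement": [10.0, 12.0, 150.0, 11.0, 156.0],
    "concentration": [0, 0, 20, 0, 30],
})

n = 3  # allow up to three baseline rows
m1 = dF1["concentration"].gt(10)
is_zero = dF1["concentration"].eq(0)

# up to n zero rows, wherever they occur
without_cutoff = is_zero & is_zero.cumsum().le(n)

# m1.cummax() turns True at the first row above the threshold and
# stays True afterwards; negating it keeps only the rows before that point
with_cutoff = without_cutoff & ~m1.cummax()

idx_without = dF1.index[m1 | without_cutoff].tolist()
idx_with = dF1.index[m1 | with_cutoff].tolist()

print(idx_without)  # [0, 1, 2, 3, 4]
print(idx_with)     # [0, 1, 2, 4] (the late zero row is dropped)
```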
You can filter the records and concatenate them to get the desired result:
import pandas as pd
n = 100 # number of initial rows with concentration 0 required
dF2 = pd.concat([dF1[dF1["concentration"]==0].head(n),dF1[dF1["concentration"]>10]])[["duration","measurement","concentration"]]
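One thing to keep in mind with the concatenation approach is that pd.concat stacks the pieces in the order they are passed, so the baseline rows always come first even if they were not first in the file. A small sketch on invented data (the values and the names baseline, above and dF2_sorted are hypothetical) shows this, along with sort_index() as one possible way to restore the original row order if that matters.

```python
import pandas as pd

# Hypothetical sample data where a high-concentration row comes first
dF1 = pd.DataFrame({
    "duration": [1.0, 2.0, 3.0, 4.0],
    "measurement": [150.0, 10.0, 156.0, 12.0],
    "concentration": [20, 0, 30, 0],
})

n = 1  # number of baseline rows to keep
cols = ["duration", "measurement", "concentration"]

baseline = dF1[dF1["concentration"] == 0].head(n)  # first n zero rows
above = dF1[dF1["concentration"] > 10]             # rows above the threshold

# pieces are stacked in the order given: baseline first, then the rest
dF2 = pd.concat([baseline, above])[cols]
print(dF2.index.tolist())         # [1, 0, 2]

# sorting by the original index restores the file order
dF2_sorted = dF2.sort_index()
print(dF2_sorted.index.tolist())  # [0, 1, 2]
```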
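You can simply filter the DataFrame for rows where the concentration is zero, use "head" to select the first 100 (or first n) rows from the filtered DataFrame, and concatenate them with dF2.
import pandas as pd
n = 100 # you can change this to the number of baseline rows you want.
df_baseline = dF1[dF1["concentration"] == 0][["duration","measurement","concentration"]].head(n)
dF2 = dF1[dF1["concentration"]>10][["duration","measurement","concentration"]]
# DataFrame.append was removed in pandas 2.0, so use pd.concat instead
df_final = pd.concat([df_baseline, dF2])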