I'm looking for a more elegant way to do this without a for loop. I have two DataFrames, df1 and df2, in the following format:
df1:
Position Sample Time Measurement Type Measurement Unique_ID
1 1 1 A 6.8 ID_1
1 1 2 A 7.2 ID_1
1 1 3 A 7.8 ID_1
1 1 4 A 9.2 ID_1
2 2 1 A 7.3 ID_2
2 2 2 A 8.5 ID_2
2 2 3 A 9.3 ID_2
2 2 4 A 9.5 ID_2
3 1 1 B 6.2 ID_3
3 1 2 B 7.0 ID_3
3 1 3 B 8.2 ID_3
3 1 4 B 8.0 ID_3
df2:
Position Sample Unique_ID Threshold
1 1 ID_1 8.3
2 2 ID_2 8.9
3 1 ID_3 7.1
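For each row in df2, I want to check whether the Measurement column in df1 exceeds the threshold. If it does, I want the Time value at which the measurement would cross the threshold, assuming a straight line between consecutive measurements, and I want to add that value to a new column, Cross_threshold.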
import numpy as np

for unique_ID in df2['Unique_ID']:
    y_last = 0
    threshold = float(df2.loc[df2['Unique_ID'] == unique_ID, 'Threshold'].tolist()[0])
    # NOTE: this scans every row of df1, not only the rows for unique_ID
    for x, y in zip(df1['Time'].tolist(), df1['Measurement'].tolist()):
        if y > threshold:
            val = (threshold - y_last)/(y - y_last) + x - 1  # straight line formula
            break
        else:
            y_last = y
            val = np.nan
    df2.loc[df2['Unique_ID'] == unique_ID, 'Cross_threshold'] = val
Result:
Position Sample Unique_ID Threshold Cross_threshold
1 1 ID_1 8.3 3.357143
2 2 ID_2 8.9 3.785714
3 1 ID_3 7.1 1.750000
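I would also like to check whether the measurement crosses the threshold more than once, for example rising above it at time 2 and dropping below it at time 4. In the real data each Unique_ID will have many more measurements, and a threshold that is crossed more than once needs to be handled differently, so I want to add a new column Doublecross_plus that is True or False.
Thanks!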
Here is a different approach. I've included some helper values for verification. I created df1 and df2 using the values above.
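For anyone who wants to run the snippets, here is a minimal sketch of how df1 and df2 could be built from the tables in the question. The column name Measurement_Type is an assumption on my part for the "Measurement Type" header; everything else is copied from the tables.

```python
import pandas as pd

# df1: one row per (Unique_ID, Time) measurement, values copied from the question
df1 = pd.DataFrame({
    'Position': [1] * 4 + [2] * 4 + [3] * 4,
    'Sample': [1] * 4 + [2] * 4 + [1] * 4,
    'Time': [1, 2, 3, 4] * 3,
    'Measurement_Type': ['A'] * 8 + ['B'] * 4,  # assumed column name
    'Measurement': [6.8, 7.2, 7.8, 9.2,
                    7.3, 8.5, 9.3, 9.5,
                    6.2, 7.0, 8.2, 8.0],
    'Unique_ID': ['ID_1'] * 4 + ['ID_2'] * 4 + ['ID_3'] * 4,
})

# df2: one row per Unique_ID with its threshold
df2 = pd.DataFrame({
    'Position': [1, 2, 3],
    'Sample': [1, 2, 1],
    'Unique_ID': ['ID_1', 'ID_2', 'ID_3'],
    'Threshold': [8.3, 8.9, 7.1],
})
```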
# pull the threshold into df1
thresh = df2[['Unique_ID', 'Threshold']].set_index('Unique_ID').squeeze()
df1['thresh'] = df1['Unique_ID'].map(thresh)

# pull previous values onto current row
df1['time_prev'] = df1['Time'].shift(1)
df1['meas_prev'] = df1['Measurement'].shift(1)
df1['id_prev'] = df1['Unique_ID'].shift(1)

# compare current and previous measurements versus threshold...
df1['over_curr'] = df1['Measurement'] > df1['thresh']
df1['over_prev'] = df1['Measurement'].shift(1) > df1['thresh']

# ... and see if we crossed the threshold (^ is XOR) for current ID
df1['cross'] = ((df1['over_curr'] ^ df1['over_prev']) &
                (df1['Unique_ID'] == df1['id_prev']))

# interpolate; must filter afterwards because this is only meaningful when 'cross' is True
df1['Cross_threshold'] = (
    (df1['thresh'] - df1['meas_prev']) / (df1['Measurement'] - df1['meas_prev'])
    * (df1['Time'] - df1['time_prev'])
    + df1['time_prev']
)
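To make the interpolation step easier to check by hand, here is the same straight-line formula as a scalar function. The name crossing_time is just an illustrative helper, not part of pandas.

```python
def crossing_time(t_prev, m_prev, t_curr, m_curr, thresh):
    """Time at which the straight line from (t_prev, m_prev) to
    (t_curr, m_curr) reaches thresh."""
    # fraction of the way from the previous point to the current point
    frac = (thresh - m_prev) / (m_curr - m_prev)
    return t_prev + frac * (t_curr - t_prev)

# ID_2 from the question: 8.5 at time 2, 9.3 at time 3, threshold 8.9.
# The threshold sits halfway between the two measurements, so the
# result is approximately 2.5.
t_id2 = crossing_time(2, 8.5, 3, 9.3, 8.9)
```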
Now filter and print. I transposed the result to avoid line wrapping:
fields = ['Position', 'Sample', 'Unique_ID',
          'Measurement', 'thresh', 'meas_prev',
          'time_prev', 'Cross_threshold', 'Time']
mask = df1['cross']

# filter and export selected fields
print(df1.loc[mask, fields].transpose())
3 6 10
Position 1 2 3
Sample 1 2 1
Unique_ID ID_1 ID_2 ID_3
Measurement 9.2 9.3 8.2
thresh 8.3 8.9 7.1
meas_prev 7.8 8.5 7
time_prev 3 2 2
Cross_threshold 3.35714 2.5 2.08333
Time 4 3 3
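My Cross_threshold values differ from the expected results in the original post. To verify mine, I've shown that:
- the threshold lies between the current and previous measurements; and
- the crossing time lies between the current and previous times.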
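The question also asked for the result in df2 together with a Doublecross_plus flag. Here is one possible sketch of that, not the only way to do it. It uses a per-ID groupby shift instead of comparing id_prev, rebuilds the sample data so it runs on its own, and makes two assumptions: a crossing in either direction counts (as with the XOR approach), and Cross_threshold is the earliest crossing for each ID. The helper names d, summary and n_cross are my own.

```python
import pandas as pd

# Sample data from the question, reduced to the columns this sketch needs
df1 = pd.DataFrame({
    'Unique_ID': ['ID_1'] * 4 + ['ID_2'] * 4 + ['ID_3'] * 4,
    'Time': [1, 2, 3, 4] * 3,
    'Measurement': [6.8, 7.2, 7.8, 9.2,
                    7.3, 8.5, 9.3, 9.5,
                    6.2, 7.0, 8.2, 8.0],
})
df2 = pd.DataFrame({
    'Unique_ID': ['ID_1', 'ID_2', 'ID_3'],
    'Threshold': [8.3, 8.9, 7.1],
})

# attach each ID's threshold and make sure rows are in time order per ID
d = df1.merge(df2[['Unique_ID', 'Threshold']], on='Unique_ID', how='left')
d = d.sort_values(['Unique_ID', 'Time'])

# previous time/measurement within the same ID (NaN on each ID's first row)
g = d.groupby('Unique_ID')
d['time_prev'] = g['Time'].shift(1)
d['meas_prev'] = g['Measurement'].shift(1)

# a crossing happens when current and previous are on opposite sides;
# the first row of each ID has no previous value, so it is excluded
over_curr = d['Measurement'] > d['Threshold']
over_prev = d['meas_prev'] > d['Threshold']
d['cross'] = (over_curr != over_prev) & d['meas_prev'].notna()

# straight-line interpolation between the previous and current points
d['cross_time'] = d['time_prev'] + (
    (d['Threshold'] - d['meas_prev']) / (d['Measurement'] - d['meas_prev'])
    * (d['Time'] - d['time_prev'])
)

# per ID: earliest crossing time and number of crossings
summary = d[d['cross']].groupby('Unique_ID').agg(
    Cross_threshold=('cross_time', 'first'),
    n_cross=('cross_time', 'size'),
)

# write back to df2; IDs that never cross get NaN and False
df2 = df2.join(summary, on='Unique_ID')
df2['Doublecross_plus'] = df2['n_cross'].fillna(0) > 1
df2 = df2.drop(columns='n_cross')
print(df2)
```

With the sample data every ID crosses its threshold exactly once, so Doublecross_plus is False for all three rows. Whether a downward crossing should count toward the flag is a decision for your real data.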