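So, I have two dataframes, both fairly large (df1 ~= 20k rows and df2 ~= 1.5 million rows). I want to check whether the values in df1 fall between df2$low and df2$high, but do it conditionally (to limit the number of checks), and only perform the check when abs(df1$val - df2$val) < 2. If a value in df1 is found within a df2 range, I want to record that in a new column with TRUE/FALSE values.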
df1

weight | low | high
---|---|---
94.99610 | 94.99608 | 94.99613
95.00561 | 95.00558 | 95.00566
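Use a non-equi join with data.table. Convert the first dataset to a data.table (setDT) and create a filter column initialised to logical FALSE. Then do a non-equi join and assign (:=) to filter, so that it changes from FALSE to TRUE only for rows whose low/high range contains a th_weight and where the condition abs(weight - th_weight) < 2 is met.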
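library(data.table)
# convert df1 to a data.table by reference and initialise the flag column
setDT(df1)[, filter := FALSE]
# non-equi join: match df1 rows where low <= th_weight <= high,
# then set filter to whether weight and th_weight differ by less than 2
df1[df2, filter := abs(weight - th_weight) < 2,
    on = .(low <= th_weight, high >= th_weight)]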
-output
> df1
weight low high filter
<num> <num> <num> <lgcl>
1: 94.99610 94.99608 94.99613 TRUE
2: 95.00561 95.00558 95.00566 FALSE
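One caveat, as far as I understand := inside a join: if several df2 rows fall inside the same df1 range, that df1 row is assigned once per match, so the final value reflects the last match rather than "any match". If "any matching th_weight within 2" is what you need, a possible variation (a sketch, not part of the answer above) is to join the other way round with by = .EACHI and aggregate with any(). The names dt1, dt2, res and the helper column th_copy are illustrative only.

```r
library(data.table)

# Sample data from the question, rebuilt as data.tables
dt1 <- data.table(weight = c(94.9961, 95.00561),
                  low    = c(94.99608, 95.00558),
                  high   = c(94.99613, 95.00566))
dt2 <- data.table(index = 1:5,
                  th_weight = c(94.996092, 95.496336, 95.509906,
                                97.473292, 100.51906))

# Hypothetical helper column: a plain copy of th_weight, so j can refer to
# dt2's values without relying on how join columns are exposed in j
dt2[, th_copy := th_weight]

# For each dt1 row (by = .EACHI), take the dt2 rows with
# low <= th_weight <= high and test whether any is within 2 of weight.
# A dt1 row with no match gets th_copy = NA; na.rm = TRUE turns that into FALSE.
res <- dt2[dt1, on = .(th_weight >= low, th_weight <= high),
           .(filter = any(abs(weight - th_copy) < 2, na.rm = TRUE)),
           by = .EACHI]

# by = .EACHI returns one row per dt1 row, in dt1's order
dt1[, filter := res$filter]
dt1
```

On this sample it gives the same TRUE/FALSE as the answer; the two approaches only differ when one range contains several th_weight values.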
data
df1 <- structure(list(weight = c(94.9961, 95.00561), low = c(94.99608,
95.00558), high = c(94.99613, 95.00566)), class = "data.frame", row.names = c(NA,
-2L))
df2 <- structure(list(index = 1:5, th_weight = c(94.996092, 95.496336,
95.509906, 97.473292, 100.51906)), class = "data.frame", row.names = c(NA,
-5L))
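To sanity-check the join on a small sample, a brute-force base R version (my own sketch, not from the answer) can test every df1/df2 pair directly. It does nrow(df1) × nrow(df2) comparisons, so with ~20k × 1.5M rows it is only practical on a subset. check_pairs and filter_check are hypothetical names.

```r
# Sample data from the question
df1 <- data.frame(weight = c(94.9961, 95.00561),
                  low    = c(94.99608, 95.00558),
                  high   = c(94.99613, 95.00566))
df2 <- data.frame(index = 1:5,
                  th_weight = c(94.996092, 95.496336, 95.509906,
                                97.473292, 100.51906))

# Hypothetical helper: for each d1 row, TRUE if any th_weight lies in
# [low, high] and differs from weight by less than 2 (checks every pair)
check_pairs <- function(d1, d2) {
  tw <- d2$th_weight
  vapply(seq_len(nrow(d1)), function(i) {
    any(tw >= d1$low[i] & tw <= d1$high[i] & abs(d1$weight[i] - tw) < 2)
  }, logical(1))
}

df1$filter_check <- check_pairs(df1, df2)
df1
```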