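I am trying to find the best way to find where the value of column Y falls within the ranges defined by column Z, and return the corresponding row for each group.
For example, consider A in column X: its Y value is 25, which lies between 20 and 30, so the corresponding value in `w` is 2.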
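The rule described above can be phrased with base R's `findInterval()`, which returns, for a value, the index of the last breakpoint that is less than or equal to it. A minimal sketch for group A, using the values from the example data below (the variable names are only for illustration):

```r
# Z breakpoints and w values for group A, plus its Y value
z <- c(10, 20, 30, 40)
w <- c(1, 2, 3, 4)
y <- 25

# findInterval() returns i such that z[i] <= y < z[i + 1]; here 20 <= 25 < 30
i <- findInterval(y, z)
i     # 2
w[i]  # 2, the w value on the row whose range contains y
```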
## example dataframe
df <- data.frame(X=c("A","A","A","A","B","B","B","C","C","C"),
Y=c(25,25,25,25,35,35,35,15,15,15),
Z=c(10,20,30,40,10,20,30,10,20,30),
w=c(1,2,3,4,1,2,3,1,2,3))
#desired dataframe
df2 <- data.frame(X=c("A","B","C"),
Y=c(25,35,15),
Z=c(20,30,10),
w=c(2,3,1))
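This question needs an update. Suppose the Y value for D (5) is less than the smallest Z (10); in that case, both Z and w should be 0. How can this be done?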
df <- data.frame(X=c("A","A","A","A","B","B","B","C","C","C","D","D","D"),
Y=c(25,25,25,25,35,35,35,15,15,15,5,5,5),
Z=c(10,20,30,40,10,20,30,10,20,30,10,20,30),
w=c(1,2,3,4,1,2,3,1,2,3,1,2,3))
df2 <- data.frame(X=c("A","B","C","D"),
Y=c(25,35,15,5),
Z=c(20,30,10,0),
w=c(2,3,1,0))
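One way to get the zero row for D is to do the lookup per group and fall back explicitly when `findInterval()` returns 0, which it does when Y is below every Z. A minimal base-R sketch, assuming Y is constant within each X group and Z is sorted ascending; the helper name `pick_row` is hypothetical:

```r
# Extended example data, including group D whose Y (5) is below every Z
df <- data.frame(X = c("A","A","A","A","B","B","B","C","C","C","D","D","D"),
                 Y = c(25,25,25,25,35,35,35,15,15,15,5,5,5),
                 Z = c(10,20,30,40,10,20,30,10,20,30,10,20,30),
                 w = c(1,2,3,4,1,2,3,1,2,3,1,2,3))

# Hypothetical helper: for one group, return the row whose Z range contains Y,
# or Z = 0 and w = 0 when Y is below the smallest Z (findInterval() gives 0).
pick_row <- function(g) {
  i <- findInterval(g$Y[1], g$Z)
  if (i == 0) {
    data.frame(X = g$X[1], Y = g$Y[1], Z = 0, w = 0)
  } else {
    data.frame(X = g$X[1], Y = g$Y[1], Z = g$Z[i], w = g$w[i])
  }
}

# Apply per X group and stack the one-row results
res <- do.call(rbind, lapply(split(df, df$X), pick_row))
rownames(res) <- NULL
res
```

For the last group in B (Y = 35), `findInterval()` returns the index of the last Z, so it naturally reproduces the "no upper bound, take the last row" behaviour.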
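There are two approaches (depending on the logic you want). One is to group by 'X' and keep the row with the minimum absolute difference between 'Z' and 'Y'. Note that for the 'B' element in 'X', the selected row is the last one, since there is no upper bound to tell whether Y is inside that range.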
library(dplyr)
df %>%
group_by(X) %>%
slice(which.min(abs(Z - Y))) %>%
ungroup
with output
# A tibble: 3 × 4
X Y Z w
<chr> <dbl> <dbl> <dbl>
1 A 25 20 2
2 B 35 30 3
3 C 15 10 1
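Note that `which.min(abs(Z - Y))` selects the nearest Z, which is not always the lower bound of the range containing Y. In the example it gives the lower bound because ties (25 is 5 away from both 20 and 30) resolve to the first minimum. A quick base-R check with a hypothetical Y of 29, which is not in the example data:

```r
z <- c(10, 20, 30, 40)
y <- 29  # hypothetical value, not in the example data

which.min(abs(z - y))  # 3: the nearest Z is 30, which is above y
findInterval(y, z)     # 2: the range [20, 30) is the one that contains y
```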
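Another option is to create a `lead` column after grouping by 'X', build a logical vector inside `filter`, and use `if/else` to return the last row when the range has no upper bound, i.e. 'B'.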
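df %>%
  group_by(X) %>%
  # Z1 is the next breakpoint (upper bound); the last row reuses its own Z
  mutate(Z1 = lead(Z, default = last(Z))) %>%
  filter({
    # inclusive lower bound, exclusive upper bound: Z <= Y < Z1
    tmp <- Y >= Z & Y < Z1
    # if no range contains Y (no upper bound, as for 'B'), keep the last row
    if (any(tmp)) tmp else row_number() == n()
  }) %>%
  ungroup %>%
  select(-Z1)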
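With a strict `Y > Z` comparison, a Y that sits exactly on a breakpoint matches no interval and falls through to the last-row fallback, so an inclusive lower bound (`Y >= Z`) is the safer choice. The base-R check below reproduces the filter condition for group A with a hypothetical Y of 20; `c(Z[-1], Z[length(Z)])` is used here as a stand-in for `lead(Z, default = last(Z))`:

```r
Z  <- c(10, 20, 30, 40)
Z1 <- c(Z[-1], Z[length(Z)])  # 20 30 40 40, same as lead(Z, default = last(Z))
Y  <- 20                      # hypothetical value sitting on a breakpoint

strict    <- Y >  Z & Y < Z1
inclusive <- Y >= Z & Y < Z1
which(strict)       # integer(0): nothing matches, so the fallback picks the last row
which(inclusive)    # 2: the row with Z = 20
findInterval(Y, Z)  # 2: findInterval() also treats each range as [Z[i], Z[i + 1])
```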
Perhaps we can try the following with data.table:
> library(data.table)
> setDT(df)[, .SD[findInterval(Y, Z) == seq_along(w)], X]
X Y Z w
1: A 25 20 2
2: B 35 30 3
3: C 15 10 1
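The `findInterval(Y, Z) == seq_along(w)` filter returns no row for a group whose Y is below every Z (for D, `findInterval()` gives 0), so D would be dropped rather than reported with zeros. A sketch that keeps such groups, assuming Y is constant within each group and Z is sorted ascending:

```r
library(data.table)

dt <- data.table(X = c("A","A","A","A","B","B","B","C","C","C","D","D","D"),
                 Y = c(25,25,25,25,35,35,35,15,15,15,5,5,5),
                 Z = c(10,20,30,40,10,20,30,10,20,30,10,20,30),
                 w = c(1,2,3,4,1,2,3,1,2,3,1,2,3))

# Per group: idx is the index of the last Z <= Y, or 0 if Y is below every Z
res <- dt[, {
  idx <- findInterval(Y[1], Z)
  list(Y = Y[1],
       Z = if (idx == 0) 0 else Z[idx],
       w = if (idx == 0) 0 else w[idx])
}, by = X]
res
```

Because `by` keeps groups in order of first appearance, the result lists A, B, C and D, with D getting Z = 0 and w = 0.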