R - Match values in one column against the range of another column in the same data frame and return the corresponding row



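I am trying to find the best way to determine where the value of column Y falls within the range of values in column Z, and to return the corresponding row.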

For example, consider A in column X: its value in Y is 25, which lies between 20 and 30, so the corresponding value in w is 2.

## example dataframe
df <- data.frame(X = c("A","A","A","A","B","B","B","C","C","C"),
                 Y = c(25,25,25,25,35,35,35,15,15,15),
                 Z = c(10,20,30,40,10,20,30,10,20,30),
                 w = c(1,2,3,4,1,2,3,1,2,3))

## desired dataframe
df2 <- data.frame(X = c("A","B","C"),
                  Y = c(25,35,15),
                  Z = c(20,30,10),
                  w = c(2,3,1))

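An amendment to the question: suppose group D has a Y value (5) that is smaller than the minimum Z (10). In that case it should get 0. How can this be done?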

## example dataframe with the extra group D
df <- data.frame(X = c("A","A","A","A","B","B","B","C","C","C","D","D","D"),
                 Y = c(25,25,25,25,35,35,35,15,15,15,5,5,5),
                 Z = c(10,20,30,40,10,20,30,10,20,30,10,20,30),
                 w = c(1,2,3,4,1,2,3,1,2,3,1,2,3))

## desired dataframe (D gets 0 for both Z and w)
df2 <- data.frame(X = c("A","B","C","D"),
                  Y = c(25,35,15,5),
                  Z = c(20,30,10,0),
                  w = c(2,3,1,0))

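There are two approaches, depending on the intended logic. One is to group by 'X' and take the row with the minimum absolute difference between 'Z' and 'Y'. Note that for the 'B' element in 'X', the selected row is the last row, for which we do not know whether Y is within the range.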

library(dplyr)

df %>%
  group_by(X) %>%
  # keep the row whose Z is closest to Y; which.min() returns the first minimum on ties
  slice(which.min(abs(Z - Y))) %>%
  ungroup()

with the output

# A tibble: 3 × 4
X         Y     Z     w
<chr> <dbl> <dbl> <dbl>
1 A        25    20     2
2 B        35    30     3
3 C        15    10     1

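Another option is to create a lead column after grouping by 'X', build a logical vector inside filter, and use if/else to return the last row for cases where the range has no upper bound, i.e. 'B'.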

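df %>%
  group_by(X) %>%
  # Z1 is the next Z in the group (the upper bound); the last row reuses its own Z
  mutate(Z1 = lead(Z, default = last(Z))) %>%
  filter({
    tmp <- Y > Z & Y < Z1
    # if no row brackets Y, fall back to the last row of the group
    if (any(tmp)) tmp else row_number() == n()
  }) %>%
  ungroup() %>%
  select(-Z1)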

Perhaps we can try the following data.table approach:

> library(data.table)
> setDT(df)[, .SD[findInterval(Y, Z) == seq_along(w)], X]
X  Y  Z w
1: A 25 20 2
2: B 35 30 3
3: C 15 10 1
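
As far as I can tell, none of the snippets above returns the 0 row requested in the amended question for group D, where Y (5) is below the smallest Z (10). The following is only a minimal base R sketch of one way to handle that case, not something taken from the answers. It assumes that Z is sorted in ascending order within each group and that Y is constant within a group; `match_interval` is a hypothetical helper name introduced here for illustration.

```r
# Extended example data from the question, including group D (Y below every Z).
df <- data.frame(
  X = c("A","A","A","A","B","B","B","C","C","C","D","D","D"),
  Y = c(25,25,25,25,35,35,35,15,15,15,5,5,5),
  Z = c(10,20,30,40,10,20,30,10,20,30,10,20,30),
  w = c(1,2,3,4,1,2,3,1,2,3,1,2,3)
)

# Hypothetical helper: pick one row per group.
# Assumption: Z is ascending within the group and Y is the same on every row.
match_interval <- function(g) {
  # findInterval() returns the index i with Z[i] <= Y < Z[i + 1],
  # and 0 when Y is smaller than the first Z.
  idx <- findInterval(g$Y[1], g$Z)
  if (idx == 0) {
    # Y is below the whole range: return the 0 row asked for in the question.
    data.frame(X = g$X[1], Y = g$Y[1], Z = 0, w = 0)
  } else {
    g[idx, c("X", "Y", "Z", "w")]
  }
}

# Apply the helper to each group and stack the one-row results.
result <- do.call(rbind, lapply(split(df, df$X), match_interval))
rownames(result) <- NULL
result
```

With the example data, `result` has one row per group, and D gets Z = 0 and w = 0. Note that `findInterval()` treats the lower bound as inclusive, so a Y exactly equal to some Z is matched to that row, which differs slightly from the strict `Y > Z` test in the dplyr version.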
