In R, how to map numeric values to the nearest value in a regularly spaced sequence

Question

I'm looking for an efficient way to do this:

Given a vector x (you may assume the values are sorted):

x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)

and a vector y of regularly spaced values along an interval, e.g. from 0 to 10 in steps of 1:

y <- 0:10

how do I get a vector z in which each value of x has been mapped to the closest value in y:

> z
[1]  0  1  2  6 10 10

Edit: obviously this example is simple, but I want it to work for any regularly spaced vector y, i.e. not only for a step size of 1.
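Several of the approaches discussed here rely on y being regularly spaced. As a minimal sketch, that assumption could be checked up front; the helper name `is_regular` and the tolerance `tol` are hypothetical choices for illustration, not part of the original question.

```r
# Hypothetical helper: TRUE if consecutive differences of y are all equal
# up to a relative tolerance (exact equality is too strict for sequences
# such as seq(-6, 6, by = 0.001), whose steps differ by rounding error).
is_regular <- function(y, tol = 1e-8) {
  stopifnot(length(y) >= 2)
  d <- diff(y)
  all(abs(d - d[1]) <= tol * abs(d[1]))
}

is_regular(0:10)                      # TRUE
is_regular(c(1, 3, 5, 7, 9))          # TRUE
is_regular(c(0, 1, 3))                # FALSE
```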

Benchmark of the proposed solutions

library(microbenchmark)
set.seed(42)
yMin <- -6
stepSize <- 0.001
x <- rnorm(10000)
y <- seq(yMin, 6, by = stepSize)
# Onyambu's first answer.
fn1 <- function(x, y) y[max.col(-abs(outer(x, y, "-")))]
# Onyambu's second answer.
fn2 <- function(x, y) y[findInterval(x, c(-Inf, y+diff(y[1:2]) / 2, Inf))]
# Plonetheus' answer: although it works on my simple example, it does not work,
# e.g., when yMin is negative.
fn3 <- function(x, yMin, stepSize) {
  z <- rep(0, length(x))
  for (i in 1:length(x)) {
    numSteps <- (x[i] - yMin) / stepSize # approximately how many steps do we need
    if (x[i] - floor(numSteps) < ceiling(numSteps) - x[i]) { # check if we need to round up or down
      z[i] <- yMin + floor(numSteps) * stepSize # edited to add yMin
    } else {
      z[i] <- yMin + ceiling(numSteps) * stepSize # edited to add yMin
    }
  }
  return(z)
}
# Thiagogpsm's answer.
fn4 <- function(x, y) sapply(x, function(x_i, y) y[which.min(abs(x_i - y))], y)
microbenchmark(
fn1(x, y),
fn2(x, y),
fn3(x, yMin, stepSize),
fn4(x, y),
times = 3L)
#> Unit: milliseconds
#>                    expr         min          lq        mean      median
#>               fn1(x, y) 5546.804339 5598.159531 6759.516597 5649.514724
#>               fn2(x, y)    1.252469    1.705517    3.695469    2.158564
#>  fn3(x, yMin, stepSize)    3.176284    3.190868   11.372397    3.205453
#>               fn4(x, y)  888.288538 1843.955232 3489.842765 2799.621925
#>           uq         max neval cld
#>  7365.872725 9082.230727     3   b
#>     4.916968    7.675373     3  a 
#>    15.470453   27.735453     3  a 
#>  4790.619879 6781.617833     3  ab
### Verdict
The second solution `fn2` in my benchmark test above, i.e., Onyambu's second answer (based on `findInterval`) is the fastest but the solution (`fn3`) proposed by Plonetheus is a close second.

One way could be:

y[max.col(-abs(outer(x, y, "-")))]
[1]  0  1  2  6 10 10

For example:

x1 <- c(0.01, 2.4, 1.3, 4.1, 6.2)
y1 <- c(1, 3, 5, 7, 9)

Result:

y1[max.col(-abs(outer(x1, y1, "-")))]
[1] 1 3 1 5 7

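That is, among the values of y1, 0.01 is closest to 1, 2.4 is closest to 3, 1.3 is closest to 1, 4.1 is closest to 5, and 6.2 is closest to 7, as expected.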
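To make the one-liner easier to follow, here is a minimal sketch that splits it into steps. It builds the full length(x) by length(y) distance matrix, which is why this approach gets expensive for long vectors. Note that `max.col` breaks ties at random by default; passing `ties.method = "first"` is a choice made here for reproducibility, not something the original answer specified.

```r
x1 <- c(0.01, 2.4, 1.3, 4.1, 6.2)
y1 <- c(1, 3, 5, 7, 9)

# d[i, j] is the absolute distance between x1[i] and y1[j]
d <- abs(outer(x1, y1, "-"))
dim(d)                                  # 5 5

# The column with the largest negated distance is the nearest y1 value;
# ties.method = "first" picks the lower value when x1[i] is exactly halfway.
idx <- max.col(-d, ties.method = "first")
y1[idx]                                 # 1 3 1 5 7
```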

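If y is sorted, you can use the function findInterval (x itself need not be sorted).

Since the steps are equal, we can put the breaks halfway between consecutive values of y: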

y[findInterval(x, c(-Inf, y+diff(y[1:2]) / 2, Inf))]
[1]  0  1  2  6 10 10
y1[findInterval(x1, c(-Inf, y1+diff(y1[1:2])/2, Inf))]
[1] 1 3 1 5 7
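One caveat with the expression above: a value of x more than half a step above max(y) falls into the last interval, whose index is one past the end of y, so the result is NA. A minimal sketch that clamps such values to the ends of the grid instead might look like this; the function name `snap_to_grid` is hypothetical, and it assumes y is sorted and equally spaced.

```r
# Hypothetical wrapper around findInterval that clamps out-of-range values.
snap_to_grid <- function(x, y) {
  stopifnot(length(y) >= 2, !is.unsorted(y))
  half <- diff(y[1:2]) / 2
  # Break points halfway between consecutive grid values (length(y) - 1 of them)
  breaks <- y[-length(y)] + half
  # findInterval returns 0..(length(y) - 1); adding 1 gives a valid index into y
  y[findInterval(x, breaks) + 1L]
}

snap_to_grid(c(0.2, 0.8, 2.3, 5.8, 9.9, 10, 12), 0:10)
# 0 1 2 6 10 10 10
```

Values exactly on a break point fall into the upper interval, so exact halves round up.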

One way is to create a function that returns z_i for each x_i, and apply it to the vector:

map_to_closest <- function(x_i, y) {
y[which.min(abs(x_i - y))]
}
sapply(x, map_to_closest, y)
[1]  0  1  2  6 10 10
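As a small variation on the same idea, the sketch below uses `vapply` so that the result type is fixed in advance, and separates the index lookup from the final subsetting. `which.min` returns the first minimum, so a value of x exactly halfway between two grid values maps to the lower one. The name `nearest_index` is hypothetical.

```r
x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
y <- 0:10

# Hypothetical helper: index of the grid value nearest to a single x_i
nearest_index <- function(x_i, y) which.min(abs(x_i - y))

# vapply guarantees one integer index per element of x
idx <- vapply(x, nearest_index, integer(1), y = y)
y[idx]                                  # 0 1 2 6 10 10
```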

If you know the minimum value of y and how large each step is, then I believe you can solve it in O(N) time as follows:

getZ <- function(x, yMin, stepSize) {
  z <- numeric(length(x))
  for (i in seq_along(x)) {
    numSteps <- (x[i] - yMin) / stepSize # fractional number of steps from yMin to x[i]
    # compare the distances to the step below and the step above, both in step units
    if (numSteps - floor(numSteps) < ceiling(numSteps) - numSteps) {
      z[i] <- yMin + floor(numSteps) * stepSize
    } else {
      z[i] <- yMin + ceiling(numSteps) * stepSize
    }
  }
  return(z)
}

Using these values, for example:

x <- c(0.2, 0.8, 2.3, 5.8, 9.9, 10)
yMin <- 0
stepSize <- 0.3
print(getZ(x, yMin, stepSize))

we get the nearest multiples of 0.3:

[1] 0.3 0.9 2.4 5.7 9.9 9.9
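The same arithmetic can also be written without a loop. The following is a minimal sketch, not the answerer's code: the name `snap_regular` and the optional `yMax` argument (assumed to lie on the grid) are hypothetical. It uses `floor(q + 0.5)` so that exact halves round up, because R's `round()` sends exact halves to the even neighbour.

```r
# Hypothetical vectorized variant: round each x to the nearest grid value
# yMin + k * stepSize, then clamp to [yMin, yMax].
snap_regular <- function(x, yMin, stepSize, yMax = Inf) {
  k <- floor((x - yMin) / stepSize + 0.5)  # nearest step index, halves round up
  z <- yMin + k * stepSize
  pmin(pmax(z, yMin), yMax)                # keep results inside the grid
}

snap_regular(c(0.2, 0.8, 2.3, 5.8, 9.9, 10), yMin = 0, stepSize = 1)
# 0 1 2 6 10 10
snap_regular(c(-5.8, 0.26, 7), yMin = -6, stepSize = 0.5, yMax = 6)
# -6.0 0.5 6.0
```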
