R: How can a loop help center data on the maximum value?



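I'm trying to write some code that goes through my data and centers the values on the maximum of each column.

Here is an example of what I want to do:

Input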

|  A   |   B  |       
|:-----| ----:|
|   4  |   3  |
|   5  |   2  |
|   9  |   4  |
|   8  |   7  |
|   4  |   5  |
|   3  |   3  | 
|   2  |   1  |

Output

|  A   |   B  |       
|:-----|-----:|
|   4  |   2  |
|   5  |   4  |
|   9  |   7  |
|   8  |   5  |
|   4  |   3  |

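Following a suggestion, I'm now using the code below, which works well for the example above. But it still fails on my real data. The error I get is:

Error in (the_max - 4):(the_max + 4) : argument of length 0

Any help? I'm completely stuck.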

df_cent <- apply(df, 2, function(x) {
  the_max <- which.max(x == max(x))
  return(x[(the_max - 4):(the_max + 4)])
})

Real data

dput(raw[1:20,1:5])
structure(list(Y0 = c(3145.126, 3178.701, 3224.385, 3304.599, 3427.954, 3564.216, 3663.065, 3607.685, 3416.442, 3213.872, 3082.273, 2967.31, 2914.054, 2902.385, 2879.799, 2863.839, 2845.718, 2833.797, 2811.662, 2778.558), Y1 = c(2678.572, 2647.732, 2624.185, 2617.655, 2589.248, 2559.836, 2520.349, 2484.969, 2469.404, 2472.38, 2486.179, 2495.08, 2505.582, 2524.076, 2526.301, 2536.212, 2514.524, 2470.91, 2425.193, 2407.115), Y2 = c(2782.993, 2801.221, 2849.327, 2887.829, 2862.908, 2882.687, 2926.137, 2910.612, 2928.439, 2942.857, 2949.042, 3007.03, 3025.96, 3028.522, 3019.542, 3006.743, 3020.229, 3023.875, 2985.96, 2944.298), Y3 = c(2451.421, 2454.053, 2448.346, 2430.966, 2425.783, 2429.053, 2416.686, 2393.618, 2378.365, 2356.911, 2371.982, 2381.778, 2385.626, 2378.868, 2363.729, 2352.621, 2349.481, 2374.857, 2374.877, 2354.132), Y4 = c(2350.779, 2361.946, 2354.645, 2339.802, 2257.112, 2230.763, 2235.095, 2212.157, 2200.369, 2199.146, 2162.409, 2147.56, 2118.352, 2111.032, 2122.665, 2111.456, 2082.912, 2071.944, 2075.322, 2068.664)), row.names = c(NA, 20L), class = "data.frame")

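An example for a single vector:

x = sample(c(4,5,9,8,4))
x.ordered = sort(x)                               # ascending order
middle = floor(length(x)/2)                       # size of the lower half
below = x.ordered[1:middle]                       # lower half, ascending
above = rev(x.ordered[(middle+1):length(x)])      # upper half, descending (maximum first)
new.x = c(below, above)                           # values rise to the maximum, then fall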

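To take the n values on either side of the maximum:

n = 2            # example window size
above[1:n]       # first n values of the descending half, starting at the maximum
tail(below, n)   # last n values of the ascending half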

The idea is to find where the middle position is, then use the sorted vector together with its reverse (the function rev()).
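The steps above can be wrapped in a small function. This is only a sketch: the name `peak_order` is made up for illustration, and note that it rearranges the sorted values around the largest one rather than keeping the original row order of the data.

```r
# Hypothetical helper (the name is not from the answer above): sort a vector,
# then put the smaller half in ascending order followed by the larger half
# in descending order, so the maximum sits in the middle.
peak_order <- function(x) {
  x.ordered <- sort(x)                              # ascending; sort() drops NAs by default
  middle <- floor(length(x.ordered) / 2)            # size of the lower half
  below <- x.ordered[seq_len(middle)]               # smaller half, ascending
  above <- rev(x.ordered[(middle + 1):length(x.ordered)])  # larger half, descending
  c(below, above)
}

peak_order(c(4, 5, 9, 8, 4))
# 4 4 9 8 5
```

Because the input is sorted first, the result does not depend on the order produced by `sample()`.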

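As mentioned in the comments, you can simplify this. Use apply on each column, find the max, and keep the values within +/- a specified width of it (in this example, n_width is 2). The result is a matrix, but it can be converted if needed.

Note: if a column contains missing values (NA), make sure to include na.rm = TRUE when determining the max.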

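n_width <- 2
apply(df, 2, function(x) {
  # Position of the column maximum, ignoring any NA values
  the_max <- which.max(x == max(x, na.rm = TRUE))
  if (the_max < n_width + 1) {
    # Fewer than n_width values before the maximum: pad the front with NA
    return(c(rep(NA, n_width - the_max + 1),
             x[1:(the_max + n_width)]))
  } else {
    return(x[(the_max - n_width):(the_max + n_width)])
  }
})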

Output

     A B
[1,] 4 2
[2,] 5 4
[3,] 9 7
[4,] 8 5
[5,] 4 3
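The note about na.rm can be illustrated on a small made-up vector. This is a sketch of one plausible cause of the "argument of length 0" error, assuming the real data contains an NA somewhere in a column; the 20 rows shown in the question do not contain any, so this has not been confirmed against the full data.

```r
x <- c(4, 5, NA, 9, 8)                  # toy vector with one missing value

max(x)                                  # NA: a single NA makes the maximum NA
idx_bad <- which.max(x == max(x))       # every comparison is NA, so nothing is found
length(idx_bad)                         # 0, i.e. integer(0)

# Building a range from a zero-length index is what raises the error
failed <- inherits(try((idx_bad - 4):(idx_bad + 4), silent = TRUE), "try-error")
failed                                  # TRUE

idx_ok <- which.max(x == max(x, na.rm = TRUE))
idx_ok                                  # 4
```

`which.max(x)` on its own also skips missing values, so it gives the same position here without the comparison step.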

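Edit: for the second data set, NAs are needed as padding whenever there are fewer than n_width observations before the maximum. For example:

n_width <- 5

Output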

            Y0       Y1       Y2       Y3       Y4
 [1,] 3178.701       NA 2928.439       NA       NA
 [2,] 3224.385       NA 2942.857       NA       NA
 [3,] 3304.599       NA 2949.042       NA       NA
 [4,] 3427.954       NA 3007.030       NA       NA
 [5,] 3564.216       NA 3025.960 2451.421 2350.779
 [6,] 3663.065 2678.572 3028.522 2454.053 2361.946
 [7,] 3607.685 2647.732 3019.542 2448.346 2354.645
 [8,] 3416.442 2624.185 3006.743 2430.966 2339.802
 [9,] 3213.872 2617.655 3020.229 2425.783 2257.112
[10,] 3082.273 2589.248 3023.875 2429.053 2230.763
[11,] 3082.273 2589.248 3023.875 2429.053 2230.763
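The padding logic and the conversion from a matrix mentioned earlier can also be written as one small helper. This is a sketch rather than the approach used above: `center_on_max` is a hypothetical name, it assumes every column has at least one non-missing value, and it uses `lapply` with `as.data.frame` so the result is a data frame directly.

```r
# Hypothetical helper: return the n_width values on each side of the maximum,
# padding with NA where the window runs past either end of the vector.
center_on_max <- function(x, n_width) {
  the_max <- which.max(x)                        # first maximum; NAs are skipped
  idx <- (the_max - n_width):(the_max + n_width) # window of positions around it
  idx[idx < 1 | idx > length(x)] <- NA           # out-of-range positions become NA
  x[idx]                                         # indexing with NA yields NA
}

df <- data.frame(A = c(4, 5, 9, 8, 4, 3, 2), B = c(3, 2, 4, 7, 5, 3, 1))

# lapply() works column by column; as.data.frame() keeps the column names
df_cent <- as.data.frame(lapply(df, center_on_max, n_width = 2))
df_cent
```

Every column comes back with exactly 2 * n_width + 1 rows, which is what lets the columns be bound together again.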

Data

# First data set
df <- structure(list(A = c(4, 5, 9, 8, 4, 3, 2), B = c(3, 2, 4, 7, 
5, 3, 1)), class = "data.frame", row.names = c(NA, -7L))
# Second data set
df <- structure(list(Y0 = c(3145.126, 3178.701, 3224.385, 3304.599, 
3427.954, 3564.216, 3663.065, 3607.685, 3416.442, 3213.872, 3082.273, 
2967.31, 2914.054, 2902.385, 2879.799, 2863.839, 2845.718, 2833.797, 
2811.662, 2778.558), Y1 = c(2678.572, 2647.732, 2624.185, 2617.655, 
2589.248, 2559.836, 2520.349, 2484.969, 2469.404, 2472.38, 2486.179, 
2495.08, 2505.582, 2524.076, 2526.301, 2536.212, 2514.524, 2470.91, 
2425.193, 2407.115), Y2 = c(2782.993, 2801.221, 2849.327, 2887.829, 
2862.908, 2882.687, 2926.137, 2910.612, 2928.439, 2942.857, 2949.042, 
3007.03, 3025.96, 3028.522, 3019.542, 3006.743, 3020.229, 3023.875, 
2985.96, 2944.298), Y3 = c(2451.421, 2454.053, 2448.346, 2430.966, 
2425.783, 2429.053, 2416.686, 2393.618, 2378.365, 2356.911, 2371.982, 
2381.778, 2385.626, 2378.868, 2363.729, 2352.621, 2349.481, 2374.857, 
2374.877, 2354.132), Y4 = c(2350.779, 2361.946, 2354.645, 2339.802, 
2257.112, 2230.763, 2235.095, 2212.157, 2200.369, 2199.146, 2162.409, 
2147.56, 2118.352, 2111.032, 2122.665, 2111.456, 2082.912, 2071.944, 
2075.322, 2068.664)), row.names = c(NA, 20L), class = "data.frame")
