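I am trying to write some code that goes through my data and centers the values on the maximum of each column.
Here is an example of what I want to do:
Input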
| A | B |
|:-----| ----:|
| 4 | 3 |
| 5 | 2 |
| 9 | 4 |
| 8 | 7 |
| 4 | 5 |
| 3 | 3 |
| 2 | 1 |
Output
| A | B |
|:-----|-----:|
| 4 | 2 |
| 5 | 4 |
| 9 | 7 |
| 8 | 5 |
| 4 | 3 |
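As suggested, I am now using the code below, which works well for the example above. However, it still fails on my real data. The error I get is:
Error in (the_max - 4):(the_max + 4) : argument of length 0
Any help? I am completely stuck.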
df_cent<-apply(df, 2, function(x) {
the_max<-which.max(x == max(x))
return(x[(the_max-4):(the_max+4)])
})
Real data
dput(raw[1:20,1:5]) structure(list(Y0 = c(3145.126, 3178.701, 3224.385, 3304.599, 3427.954, 3564.216, 3663.065, 3607.685, 3416.442, 3213.872, 3082.273, 2967.31, 2914.054, 2902.385, 2879.799, 2863.839, 2845.718, 2833.797, 2811.662, 2778.558), Y1 = c(2678.572, 2647.732, 2624.185, 2617.655, 2589.248, 2559.836, 2520.349, 2484.969, 2469.404, 2472.38, 2486.179, 2495.08, 2505.582, 2524.076, 2526.301, 2536.212, 2514.524, 2470.91, 2425.193, 2407.115), Y2 = c(2782.993, 2801.221, 2849.327, 2887.829, 2862.908, 2882.687, 2926.137, 2910.612, 2928.439, 2942.857, 2949.042, 3007.03, 3025.96, 3028.522, 3019.542, 3006.743, 3020.229, 3023.875, 2985.96, 2944.298), Y3 = c(2451.421, 2454.053, 2448.346, 2430.966, 2425.783, 2429.053, 2416.686, 2393.618, 2378.365, 2356.911, 2371.982, 2381.778, 2385.626, 2378.868, 2363.729, 2352.621, 2349.481, 2374.857, 2374.877, 2354.132), Y4 = c(2350.779, 2361.946, 2354.645, 2339.802, 2257.112, 2230.763, 2235.095, 2212.157, 2200.369, 2199.146, 2162.409, 2147.56, 2118.352, 2111.032, 2122.665, 2111.456, 2082.912, 2071.944, 2075.322, 2068.664)), row.names = c(NA, 20L), class = "data.frame")
An example with a vector:
x = sample(c(4,5,9,8,4))
x.ordered = sort(x)
middle = floor(length(x)/2)
below = x.ordered[1:middle]
above = rev(x.ordered[(middle+1):length(x)])
new.x = c(x.ordered[1:middle], rev(x.ordered[(middle+1):length(x)]))
To take the n values above or below the maximum:
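above[1:n]     # the first n values of the descending half (starts at the maximum)
tail(below, n) # the last n values of the ascending half (those closest to the maximum)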
The idea is to find where the middle position is, and then use the ordered vector and its reverse (function `rev`).
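The idea above could be wrapped in a small function, roughly as follows. This is only a sketch: the name `peak_by_rank` and the parameter `n` are hypothetical and not part of the original answer, and it assumes `x` has at least two elements and no `NA` values. Note that this approach arranges values by rank, so the original row order is not preserved.

```r
# Hypothetical helper: place the maximum in the middle by sorting,
# then keep n values on the ascending side and n on the descending side.
peak_by_rank <- function(x, n) {
  x.ordered <- sort(x)
  middle <- floor(length(x) / 2)
  below <- x.ordered[1:middle]                     # ascending lower half
  above <- rev(x.ordered[(middle + 1):length(x)])  # descending upper half, starts at the max
  c(tail(below, n), head(above, n + 1))
}

result <- peak_by_rank(c(4, 5, 9, 8, 4), n = 1)
print(result)  # 4 9 8
```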
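As mentioned in the comments, you can simplify this. Use `apply` on each column, determine the `max`, and keep the values within +/- a specified width of it in that column (in this example, `n_width` is 2). The result is a matrix, but it can be converted if needed.
Note: if a column contains missing values (`NA`), make sure to include `na.rm = TRUE` when determining the `max` value.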
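One plausible cause of the "argument of length 0" error, assuming the full data has an `NA` somewhere in a column (the 20-row sample shown does not), is sketched below. The vector `x` and the names `idx_bad`, `idx_ok`, and `msg` are made up for illustration.

```r
# Made-up column containing a missing value
x <- c(4, 5, NA, 9, 8)

# Without na.rm, max(x) is NA, so x == max(x) is all NA
# and which.max() returns a zero-length result
idx_bad <- which.max(x == max(x))
print(length(idx_bad))  # 0

# Using a zero-length value with `:` raises an error, captured here as text
msg <- tryCatch((idx_bad - 4):(idx_bad + 4),
                error = function(e) conditionMessage(e))
print(msg)

# With na.rm = TRUE the position of the maximum is found
idx_ok <- which.max(x == max(x, na.rm = TRUE))
print(idx_ok)  # 4
```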
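n_width <- 2
apply(df, 2, function(x) {
  # Position of the first occurrence of the column maximum
  the_max <- which.max(x == max(x, na.rm = TRUE))
  if (the_max < n_width + 1) {
    # Fewer than n_width values before the maximum: pad the start with NA
    return(c(rep(NA, n_width - the_max + 1),
             x[1:(the_max + n_width)]))
  } else {
    return(x[(the_max - n_width):(the_max + n_width)])
  }
})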
Output
A B
[1,] 4 2
[2,] 5 4
[3,] 9 7
[4,] 8 5
[5,] 4 3
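If a data frame is preferred over the matrix, `as.data.frame` is one way to convert it. The sketch below repeats a simplified version of the `apply` call on the first data set (no padding branch, since both maxima are far enough from the start); the names `centered` and `centered_df` are made up for illustration.

```r
# First data set from this post
df <- data.frame(A = c(4, 5, 9, 8, 4, 3, 2),
                 B = c(3, 2, 4, 7, 5, 3, 1))
n_width <- 2

# Window of +/- n_width around each column maximum (returns a matrix)
centered <- apply(df, 2, function(x) {
  the_max <- which.max(x == max(x, na.rm = TRUE))
  x[(the_max - n_width):(the_max + n_width)]
})

# Convert the matrix to a data frame; column names A and B are kept
centered_df <- as.data.frame(centered)
print(centered_df)
```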
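Edit: For the second data set, if there are fewer than `n_width` observations before the maximum, the result needs to be padded with `NA`s. For example: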
n_width <- 5
Output
Y0 Y1 Y2 Y3 Y4
[1,] 3178.701 NA 2928.439 NA NA
[2,] 3224.385 NA 2942.857 NA NA
[3,] 3304.599 NA 2949.042 NA NA
[4,] 3427.954 NA 3007.030 NA NA
[5,] 3564.216 NA 3025.960 2451.421 2350.779
[6,] 3663.065 2678.572 3028.522 2454.053 2361.946
[7,] 3607.685 2647.732 3019.542 2448.346 2354.645
[8,] 3416.442 2624.185 3006.743 2430.966 2339.802
[9,] 3213.872 2617.655 3020.229 2425.783 2257.112
[10,] 3082.273 2589.248 3023.875 2429.053 2230.763
[11,] 2967.310 2559.836 2985.960 2416.686 2235.095
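The padding logic can also be written by building the index window first and blanking out positions that fall outside the column, which pads both the start and the end explicitly. This is an alternative sketch, not the code that produced the output above; the function name `center_on_max` is hypothetical.

```r
# Hypothetical helper: window of +/- n_width around the first maximum,
# with NA wherever the window runs past either end of the vector.
center_on_max <- function(x, n_width) {
  the_max <- which.max(x)  # first maximum; NA values are ignored
  if (length(the_max) == 0) {
    # Column is entirely NA: nothing to center on
    return(rep(NA_real_, 2 * n_width + 1))
  }
  idx <- (the_max - n_width):(the_max + n_width)
  idx[idx < 1 | idx > length(x)] <- NA  # positions outside the vector
  x[idx]
}

print(center_on_max(c(9, 4, 5, 8), 2))  # NA NA 9 4 5
print(center_on_max(c(4, 5, 9), 2))     # 4 5 9 NA NA
```

It could then be applied per column with something like `sapply(df, center_on_max, n_width = 5)`, which also returns a matrix.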
Data
# First data set
df <- structure(list(A = c(4, 5, 9, 8, 4, 3, 2), B = c(3, 2, 4, 7,
5, 3, 1)), class = "data.frame", row.names = c(NA, -7L))
# Second data set
df <- structure(list(Y0 = c(3145.126, 3178.701, 3224.385, 3304.599,
3427.954, 3564.216, 3663.065, 3607.685, 3416.442, 3213.872, 3082.273,
2967.31, 2914.054, 2902.385, 2879.799, 2863.839, 2845.718, 2833.797,
2811.662, 2778.558), Y1 = c(2678.572, 2647.732, 2624.185, 2617.655,
2589.248, 2559.836, 2520.349, 2484.969, 2469.404, 2472.38, 2486.179,
2495.08, 2505.582, 2524.076, 2526.301, 2536.212, 2514.524, 2470.91,
2425.193, 2407.115), Y2 = c(2782.993, 2801.221, 2849.327, 2887.829,
2862.908, 2882.687, 2926.137, 2910.612, 2928.439, 2942.857, 2949.042,
3007.03, 3025.96, 3028.522, 3019.542, 3006.743, 3020.229, 3023.875,
2985.96, 2944.298), Y3 = c(2451.421, 2454.053, 2448.346, 2430.966,
2425.783, 2429.053, 2416.686, 2393.618, 2378.365, 2356.911, 2371.982,
2381.778, 2385.626, 2378.868, 2363.729, 2352.621, 2349.481, 2374.857,
2374.877, 2354.132), Y4 = c(2350.779, 2361.946, 2354.645, 2339.802,
2257.112, 2230.763, 2235.095, 2212.157, 2200.369, 2199.146, 2162.409,
2147.56, 2118.352, 2111.032, 2122.665, 2111.456, 2082.912, 2071.944,
2075.322, 2068.664)), row.names = c(NA, 20L), class = "data.frame")