Sort a data.frame on a categorical column, but alternating according to a pattern, in R



I have two possible kinds of dataset:

test6 <- data.frame(S=c("B","Z","B","Z","B","Z","B","B","B","Z","Z","Z"),w=c(1,1.2,1.3,2,0.9,0.95,1,1.5,1,1.1,0.8,1.3))
test5 <- data.frame(S=c("B","Z","B","Z","B","Z","B","B","Z","Z"),w=c(1,1.2,1.3,2,0.9,0.95,1,1.5,1.1,0.8))

I want to order them to get the following final result. For test6:

   S    w
1  B 1.00
3  B 1.30
5  B 0.90
2  Z 1.20
4  Z 2.00
6  Z 0.95
7  B 1.00
8  B 1.50
9  B 1.00
10 Z 1.10
11 Z 0.80
12 Z 1.30

For test5:

   S    w
1  B 1.00
3  B 1.30
5  B 0.90
2  Z 1.20
4  Z 2.00
7  B 1.00
8  B 1.50
6  Z 0.95
9  Z 1.10
10 Z 0.80

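So, in the case of test6, I want an alternating order: first 3 B, then 3 Z, then 3 B, then 3 Z, and so on. In the case of test5, the blocks are 3 B, 2 Z, 2 B, 3 Z. I found one way to do it: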

library(groupdata2)
fold(test6, k = 2, cat_col = "S", method = "n_dist")

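It creates groups that I can then sort to get this result, but it only works in the test6 case, where each group contains the same number of each S class. Does anyone have a better, simpler idea? Thanks in advance!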

(Partial answer.)

For alternating in groups of 3, it isn't too hard:

ind <- ave(rep(1, nrow(test6)), test6$S, FUN = function(z) (seq_along(z)-1) %/% 3)
ind
#  [1] 0 0 0 0 0 0 1 1 1 1 1 1
test6[order(ind, test6$S),]
#    S    w
# 1  B 1.00
# 3  B 1.30
# 5  B 0.90
# 2  Z 1.20
# 4  Z 2.00
# 6  Z 0.95
# 7  B 1.00
# 8  B 1.50
# 9  B 1.00
# 10 Z 1.10
# 11 Z 0.80
# 12 Z 1.30
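The same ave() trick can be wrapped so the block length is a parameter instead of a hard-coded 3. This is a minimal sketch: `alternate_blocks` and its `n` argument are hypothetical names, not part of the answer or any package. It relies on `order()` leaving any remaining ties in their original row order.

```r
# Hypothetical helper (name and n argument are illustrative, not from the answer):
# alternate blocks of n rows of each level of `col`, using the same ave() trick.
alternate_blocks <- function(df, col, n = 3) {
  # Block number within each level: 0 for that level's first n rows, 1 for the next n, ...
  blk <- ave(seq_len(nrow(df)), df[[col]],
             FUN = function(z) (seq_along(z) - 1) %/% n)
  # Sort by block, then by level; order() keeps remaining ties in row order
  df[order(blk, df[[col]]), ]
}

test6 <- data.frame(S = c("B","Z","B","Z","B","Z","B","B","B","Z","Z","Z"),
                    w = c(1,1.2,1.3,2,0.9,0.95,1,1.5,1,1.1,0.8,1.3))
alternate_blocks(test6, "S", n = 3)  # same row order as test6[order(ind, test6$S), ]
```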

For test5, the same approach comes close, but the 3/2 groupings come out in a different order:

ind <- ave(rep(1, nrow(test5)), test5$S, FUN = function(z) (seq_along(z)-1) %/% 3)
ind
#  [1] 0 0 0 0 0 0 1 1 1 1
test5[order(ind, test5$S),]
#    S    w
# 1  B 1.00
# 3  B 1.30
# 5  B 0.90
# 2  Z 1.20
# 4  Z 2.00
# 6  Z 0.95
# 7  B 1.00
# 8  B 1.50
# 9  Z 1.10
# 10 Z 0.80
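To get the test5 layout as well, one option is to mimic the asker's `fold(k = 2)` grouping deterministically instead of calling groupdata2. The sketch below assumes exactly two levels, `nrow(df)` divisible by `k`, and that the first level ("B") is split as evenly as possible with larger blocks first while the second level fills each fold up to `nrow(df) / k` rows. These rules are an inference from the two desired outputs, not something stated in the question, and `fold_order` is a hypothetical name.

```r
# Hypothetical helper; assumes exactly two levels in `col`, nrow(df) divisible by k,
# the first level split as evenly as possible (larger blocks first) and the second
# level filling each fold up to nrow(df) / k rows.
fold_order <- function(df, col, k = 2) {
  s  <- as.character(df[[col]])
  lv <- sort(unique(s))                      # e.g. c("B", "Z")
  stopifnot(length(lv) == 2, nrow(df) %% k == 0)
  fold_size <- nrow(df) / k
  n1 <- sum(s == lv[1])
  # Even split of the first level, e.g. 5 rows with k = 2 -> c(3, 2)
  sizes1 <- n1 %/% k + (seq_len(k) <= n1 %% k)
  # Second level tops each fold up to fold_size, e.g. c(5 - 3, 5 - 2) = c(2, 3)
  sizes2 <- fold_size - sizes1
  stopifnot(all(sizes2 >= 0), sum(sizes2) == sum(s == lv[2]))
  # Fold number per row; within each level, rows are assigned in original order
  fold <- integer(nrow(df))
  fold[s == lv[1]] <- rep(seq_len(k), sizes1)
  fold[s == lv[2]] <- rep(seq_len(k), sizes2)
  # Sort by fold, then by level (order() keeps remaining ties in row order)
  df[order(fold, s), ]
}

test5 <- data.frame(S = c("B","Z","B","Z","B","Z","B","B","Z","Z"),
                    w = c(1,1.2,1.3,2,0.9,0.95,1,1.5,1.1,0.8))
test6 <- data.frame(S = c("B","Z","B","Z","B","Z","B","B","B","Z","Z","Z"),
                    w = c(1,1.2,1.3,2,0.9,0.95,1,1.5,1,1.1,0.8,1.3))
fold_order(test5, "S")  # rows 1, 3, 5, 2, 4, 7, 8, 6, 9, 10
fold_order(test6, "S")  # rows 1, 3, 5, 2, 4, 6, 7, 8, 9, 10, 11, 12
```

Under these rules, 6 B and 6 Z give blocks of 3/3 for both levels, while 5 B and 5 Z give 3/2 for B and 2/3 for Z, which matches both desired outputs.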

You can combine cumsum and rep to get numbers that can be used in order:

i <- test6$S == "B"
x <- integer(length(i))
x[i] <- cumsum(rep(c(2,0,0), length.out=sum(i))) - 1
x[!i] <- cumsum(rep(c(2,0,0), length.out=sum(!i)))
test6[order(x),]
#   S    w
#1  B 1.00
#3  B 1.30
#5  B 0.90
#2  Z 1.20
#4  Z 2.00
#6  Z 0.95
#7  B 1.00
#8  B 1.50
#9  B 1.00
#10 Z 1.10
#11 Z 0.80
#12 Z 1.30
i <- test5$S == "B"
x <- integer(length(i))
x[i] <- cumsum(rep(c(2,0,0,2,0), length.out=sum(i))) - 1
x[!i] <- cumsum(rep(c(2,0,2,0,0), length.out=sum(!i)))
test5[order(x),]
#   S    w
#1  B 1.00
#3  B 1.30
#5  B 0.90
#2  Z 1.20
#4  Z 2.00
#7  B 1.00
#8  B 1.50
#6  Z 0.95
#9  Z 1.10
#10 Z 0.80
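The cumsum/rep key above gives the B blocks the odd numbers 1, 3 and so on, and the Z blocks the even numbers 2, 4 and so on, so `order()` interleaves them. A sketch of the same key built from explicit block sizes rather than increment patterns such as `c(2,0,0,2,0)`; `block_key` is a hypothetical helper, and it assumes the two levels are "B" and "Z".

```r
# Hypothetical helper: build the same kind of order() key as the cumsum()/rep()
# answer, but from explicit block sizes. Assumes the levels are "B" and "Z".
block_key <- function(s, b_sizes, z_sizes) {
  i <- s == "B"
  stopifnot(sum(b_sizes) == sum(i), sum(z_sizes) == sum(!i))
  x <- integer(length(s))
  x[i]  <- 2L * rep(seq_along(b_sizes), b_sizes) - 1L   # B blocks: 1, 3, 5, ...
  x[!i] <- 2L * rep(seq_along(z_sizes), z_sizes)        # Z blocks: 2, 4, 6, ...
  x
}

test5 <- data.frame(S = c("B","Z","B","Z","B","Z","B","B","Z","Z"),
                    w = c(1,1.2,1.3,2,0.9,0.95,1,1.5,1.1,0.8))
test6 <- data.frame(S = c("B","Z","B","Z","B","Z","B","B","B","Z","Z","Z"),
                    w = c(1,1.2,1.3,2,0.9,0.95,1,1.5,1,1.1,0.8,1.3))
test6[order(block_key(test6$S, c(3, 3), c(3, 3))), ]
test5[order(block_key(test5$S, c(3, 2), c(2, 3))), ]
```

Writing the sizes as `c(3, 2)` and `c(2, 3)` states the intended block pattern directly, and the `stopifnot()` check catches sizes that do not add up to the class counts.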
