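I'm trying to build a data frame made up entirely of 1s and 0s. It should be generated randomly, except that each column needs to sum to a specified value.
If this were just a single data frame I would know how to do it, but it needs to go into a function where it will be run as an iterative process, up to 1000 times.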
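An efficient approach is to shuffle, for each column, a vector containing the appropriate number of 1s and 0s. You can define the following function to generate a matrix with a specified number of rows and a specified number of 1s in each column: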
build.mat <- function(nrow, csums) {
  # for each target sum x: build (nrow - x) zeros and x ones, then shuffle them
  sapply(csums, function(x) sample(rep(c(0, 1), c(nrow - x, x))))
}
set.seed(144)
build.mat(5, 0:5)
#      [,1] [,2] [,3] [,4] [,5] [,6]
# [1,]    0    0    0    0    1    1
# [2,]    0    0    0    1    0    1
# [3,]    0    0    0    0    1    1
# [4,]    0    1    1    1    1    1
# [5,]    0    0    1    1    1    1
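As a quick sanity check (a sketch added for illustration, not part of the original answer), `colSums` can confirm that every column hits its target, and `as.data.frame` converts the result if a data frame really is what you need. The function is repeated so the snippet runs on its own; `csums`, `m`, and `df` are just illustrative names.

```r
# Same function as above, repeated so this snippet is self-contained
build.mat <- function(nrow, csums) {
  sapply(csums, function(x) sample(rep(c(0, 1), c(nrow - x, x))))
}

csums <- 0:5                 # illustrative target column sums
m <- build.mat(5, csums)

# Every column should add up to its requested total
stopifnot(all(colSums(m) == csums))

# Convert to a data frame if that is the structure you need
df <- as.data.frame(m)
str(df)
```

The row order within each column is random, so the exact matrix depends on the seed and on your R version, but the column sums should always match.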
To build a list of matrices, you can use lapply over the column sums for each matrix:
cslist <- list(1:3, c(4, 2))
set.seed(144)
lapply(cslist, build.mat, nrow=5)
# [[1]]
#      [,1] [,2] [,3]
# [1,]    0    1    1
# [2,]    0    0    0
# [3,]    0    0    0
# [4,]    0    1    1
# [5,]    1    0    1
#
# [[2]]
#      [,1] [,2]
# [1,]    0    0
# [2,]    1    0
# [3,]    1    1
# [4,]    1    0
# [5,]    1    1
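Since the question mentions repeating this up to 1000 times, here is one possible way to do that with `replicate`. This is only a sketch: the number of iterations and the column sums in `csums` are assumed values for illustration.

```r
# Same function as above, repeated so this snippet is self-contained
build.mat <- function(nrow, csums) {
  sapply(csums, function(x) sample(rep(c(0, 1), c(nrow - x, x))))
}

n_iter <- 1000          # assumed number of iterations, taken from the question
csums  <- c(4, 2, 3)    # hypothetical target column sums

# simplify = FALSE keeps the result as a list with one matrix per iteration
mats <- replicate(n_iter, build.mat(5, csums), simplify = FALSE)

# Check that every generated matrix respects the column sums
ok <- vapply(mats, function(m) all(colSums(m) == csums), logical(1))
stopifnot(all(ok))
```

Each element of `mats` is an independent random draw, so you can loop over the list or process it with `lapply` inside your own function.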
If there are many more 0s than 1s, or vice versa, @akrun's approach may be faster:
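build_01_mat <- function(n, n1s) {
  nc <- length(n1s)
  # TRUE when 1s are the minority overall: start from all 0s and place the 1s
  zerofirst <- sum(n1s) < n * nc / 2
  # number of cells to flip in each column
  tochange <- if (zerofirst) n1s else n - n1s
  mat <- matrix(if (zerofirst) 0L else 1L, n, nc)
  # flip randomly chosen rows in each column using (row, column) matrix indexing
  mat[cbind(
    unlist(c(sapply((1:nc)[tochange > 0], function(col) sample(1:n, tochange[col])))),
    rep(1:nc, tochange)
  )] <- if (zerofirst) 1L else 0L
  mat
}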
set.seed(1)
build_01_mat(5,c(1,3,0))
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    1    1    0
# [3,]    0    1    0
# [4,]    0    1    0
# [5,]    0    0    0
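To see that this version handles both cases, the sketch below (added for illustration, with made-up inputs) runs it once on a mostly-zero target and once on a mostly-one target and checks the column sums. The function is repeated so the snippet is self-contained.

```r
# Same function as above, repeated so this snippet is self-contained
build_01_mat <- function(n, n1s) {
  nc <- length(n1s)
  zerofirst <- sum(n1s) < n * nc / 2
  tochange <- if (zerofirst) n1s else n - n1s
  mat <- matrix(if (zerofirst) 0L else 1L, n, nc)
  mat[cbind(
    unlist(c(sapply((1:nc)[tochange > 0], function(col) sample(1:n, tochange[col])))),
    rep(1:nc, tochange)
  )] <- if (zerofirst) 1L else 0L
  mat
}

sparse_sums <- c(1, 2, 0)    # hypothetical input: mostly zeros, starts from all 0s
dense_sums  <- c(9, 8, 10)   # hypothetical input: mostly ones, starts from all 1s

sparse <- build_01_mat(10, sparse_sums)
dense  <- build_01_mat(10, dense_sums)

# Both branches should reproduce the requested column sums
stopifnot(all(colSums(sparse) == sparse_sums))
stopifnot(all(colSums(dense) == dense_sums))
```

Only the minority value is sampled and written, which is why this tends to help when the matrix is very lopsided.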
Some benchmarks:
require(rbenchmark)
# similar numbers of zeros and ones
benchmark(
  permute = build.mat(1e7, 1e7/2),
  replace = build_01_mat(1e7, 1e7/2), replications = 10)[1:5]
#      test replications elapsed relative user.self
# 1 permute           10    7.68    1.126      6.59
# 2 replace           10    6.82    1.000      6.27
# many more zeros than ones
benchmark(
  permute = build.mat(1e6, rep(10, 20)),
  replace = build_01_mat(1e6, rep(10, 20)), replications = 10)[1:5]
#      test replications elapsed relative user.self
# 1 permute           10   10.28    3.779      8.51
# 2 replace           10    2.72    1.000      2.23
# many more ones than zeros
benchmark(
  permute = build.mat(1e6, 1e6 - rep(10, 20)),
  replace = build_01_mat(1e6, 1e6 - rep(10, 20)), replications = 10)[1:5]
#      test replications elapsed relative user.self
# 1 permute           10   10.94    4.341      9.28
# 2 replace           10    2.52    1.000      2.09