Randomly assigning integers within groups in R, without replacement

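I am running a study with two experiments: experiment_1 and experiment_2. Each experiment has 5 different treatments (i.e. 1, 2, 3, 4, 5). We are trying to randomly assign treatments within groups.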

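We would like to do this by sampling without replacement, iteratively within each group. The aim is to get a sample that is as balanced as possible across treatments (e.g. we don't want to end up with 4 subjects in group 1 assigned to treatment 2 and nobody assigned to treatment 1). So if a group has 23 subjects, we would split the respondents into 4 subgroups of 5 and 1 subgroup of 3. We would then sample randomly without replacement within the first subgroup of 5, so that each person gets a different treatment, do the same in the second, third and fourth subgroups of 5, and finally sample randomly without replacement in the last subgroup of 3. That guarantees that every treatment is assigned to at least 4 subjects, and that 3 of the treatments are assigned to 5 subjects. We would like to do this for all groups in the study and for both experiments. The final output would look something like this…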

         group   experiment_1   experiment_2
    [1,]     1           5             3
    [2,]     1           3             2
    [3,]     1           4             4
    [4,]     1           1             5
    [5,]     1           2             1
    [6,]     1           2             3
    [7,]     1           4             1
    [8,]     1           3             2
    [9,]     2           5             5
   [10,]     2           1             4
   [11,]     2           3             4
   [12,]     2           1             5
   [13,]     2           2             1
      .      .           .             .
      .      .           .             .
      .      .           .             .

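I know how to use the sample function, but I'm not sure how to sample without replacement within each group so that the output follows the procedure above.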

I think we just need to shuffle the sample IDs; see this example:

set.seed(124)
#prepare groups and samples(shuffled)
df <- data.frame(group=sort(rep(1:3,9)),
                  sampleID=sample(1:27,27))
#treatments repeated nrow of df
df$ex1 <- rep(c(1,2,3,4,5),ceiling(nrow(df)/5))[1:nrow(df)]
df$ex2 <- rep(c(2,3,4,5,1),ceiling(nrow(df)/5))[1:nrow(df)]
df <- df[ order(df$group,df$sampleID),]
#check treatment distribution
with(df,table(group,ex1))
#       ex1
# group 1 2 3 4 5
#     1 2 2 2 2 1
#     2 2 2 2 1 2
#     3 2 2 1 2 2
with(df,table(group,ex2))
#       ex2
# group 1 2 3 4 5
#     1 1 2 2 2 2
#     2 2 2 2 2 1
#     3 2 2 2 1 2

How about this function:

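f <- function(n, m) {
  x <- c(rep(1:m, n %/% m), sample(1:m, n %% m))
  x[sample.int(n)]  # shuffle by index; sample(x, n) would read a length-one x as 1:x
}

"n" is the group size and "m" is the number of treatments. Each treatment must occur at least "n %/% m" times in the group. The treatment numbers of the remaining "n %% m" group members are assigned arbitrarily, without repetition. The vector "c(rep(1:m, n %/% m), sample(1:m, n %% m))" holds these treatment numbers. Finally, the vector is shuffled into a random order.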

> f(8,5)
[1] 5 3 1 5 4 2 2 1
> f(8,5)
[1] 4 5 3 4 2 2 1 1
> f(8,5)
[1] 4 2 1 5 3 5 2 3
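
To make the arithmetic behind this function concrete, here is a small sketch for the group of 23 subjects and 5 treatments mentioned in the question. The variable names (base_reps, leftover, pool) and the seed are illustrative choices, not part of the answer's code.

```r
n <- 23  # group size from the question
m <- 5   # number of treatments

base_reps <- n %/% m  # 4 complete rounds of all treatments
leftover  <- n %% m   # 3 subjects remain after the complete rounds

set.seed(1)  # arbitrary seed, only so the run is repeatable
# every treatment base_reps times, plus `leftover` distinct extra treatments
pool   <- c(rep(1:m, base_reps), sample(1:m, leftover))
counts <- tabulate(pool, nbins = m)

length(pool)  # 23
sort(counts)  # 4 4 5 5 5
```

Whatever the seed, two treatments end up with 4 subjects and three with 5, which is the guarantee the question asks for; only the identity of the three "extra" treatments is random.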

Here is a function that builds a data frame using the function above:

Plan <- function( groupSizes, numExp=2, numTreatment=5 )
{
  numGroups <- length(groupSizes)
  df <- data.frame( group = rep(1:numGroups,groupSizes) )
  for ( e in 1:numExp )
  {
    df <- cbind(df,unlist(lapply(groupSizes,function(n){f(n,numTreatment)})))
    colnames(df)[e+1] <- sprintf("Exp_%i", e)
  }
  return(df)
}

Example:

> P <- Plan(c(8,23,13,19))
> P
   group Exp_1 Exp_2
1      1     4     1
2      1     1     4
3      1     2     2
4      1     2     1
5      1     3     5
6      1     5     5
7      1     1     2
8      1     3     3
9      2     5     1
10     2     2     1
11     2     5     2
12     2     1     2
13     2     2     1
14     2     1     4
15     2     3     5
16     2     5     3
17     2     2     4
18     2     5     4
19     2     2     5
20     2     1     1
21     2     4     2
22     2     3     3
23     2     4     3
24     2     2     5
25     2     3     3
26     2     5     2
27     2     1     5
28     2     3     4
29     2     4     4
30     2     4     2
31     2     4     3
32     3     2     5
33     3     5     3
34     3     5     1
35     3     5     1
36     3     2     5
37     3     4     4
38     3     1     4
39     3     3     2
40     3     3     2
41     3     3     3
42     3     1     1
43     3     4     2
44     3     4     4
45     4     5     1
46     4     3     1
47     4     1     2
48     4     1     5
49     4     3     3
50     4     3     1
51     4     4     5
52     4     2     4
53     4     5     3
54     4     2     1
55     4     4     2
56     4     2     5
57     4     4     4
58     4     5     3
59     4     5     4
60     4     1     2
61     4     2     5
62     4     3     2
63     4     4     4

Checking the distribution:

> with(P,table(group,Exp_1))
     Exp_1
group 1 2 3 4 5
    1 2 2 2 1 1
    2 4 5 4 5 5
    3 2 2 3 3 3
    4 3 4 4 4 4
> with(P,table(group,Exp_2))
     Exp_2
group 1 2 3 4 5
    1 2 2 1 1 2
    2 4 5 5 5 4
    3 3 3 2 3 2
    4 4 4 3 4 4
>
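
Instead of reading the tables by eye, the balance can also be summarised as one number per group. The sketch below uses a hypothetical helper, imbalance (not part of the answer), that returns the gap between the most-used and least-used treatment in each group; 0 or 1 means the assignment is as balanced as the group size allows. The data are the first 8 rows (group 1) of the example plan printed above.

```r
# Hypothetical helper: per-group gap between the most- and least-used treatment.
imbalance <- function(group, treatment, m = 5) {
  tapply(treatment, group, function(t) diff(range(tabulate(t, nbins = m))))
}

# Group 1 of the example plan above (rows 1 to 8)
group <- rep(1, 8)
exp_1 <- c(4, 1, 2, 2, 3, 5, 1, 3)
exp_2 <- c(1, 4, 2, 1, 5, 5, 2, 3)

imbalance(group, exp_1)  # 1 for group 1
imbalance(group, exp_2)  # 1 for group 1
```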

The design of efficient experiments is a science in itself, and there are R packages that deal with it:

https://cran.r-project.org/web/views/ExperimentalDesign.html

I'm afraid your approach does not make optimal use of your resources, however you create the samples…

But this might help:

n <- 23
group <- sort(rep(1:5, ceiling(n/5)))[1:n]  
exp1 <- rep(NA, length(group))
for(i in 1:max(group)) {
    exp1[which(group == i)] <- sample(1:5)[1:sum(group == i)]
}
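
The loop above handles a single group of 23 subjects. A minimal sketch of how the same block-wise idea might be applied to every group at once is shown below, assuming the data already carry a group vector. The helper name assign_blocked, the group sizes and the seed are assumptions made for illustration.

```r
# Hypothetical helper: blocked assignment of m treatments within each group.
assign_blocked <- function(group, m = 5) {
  ave(seq_along(group), group, FUN = function(idx) {
    n <- length(idx)
    n_blocks <- ceiling(n / m)
    # one fresh permutation of 1..m per block, truncated to the group size
    perms <- unlist(lapply(seq_len(n_blocks), function(b) sample.int(m)))
    perms[seq_len(n)]
  })
}

set.seed(42)  # arbitrary seed, only so the run is repeatable
group <- rep(1:3, times = c(23, 8, 13))
plan <- data.frame(group = group,
                   experiment_1 = assign_blocked(group),
                   experiment_2 = assign_blocked(group))

# treatment counts per group for the first experiment
table(plan$group, plan$experiment_1)
```

Each call draws independently, so the two experiments are randomised separately. Within a group, every complete block of 5 consecutive subjects receives each treatment exactly once, and the last, shorter block receives distinct treatments.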

I'm not entirely sure this satisfies all of your constraints, but you could use the randomizr package:

library(randomizr)
experiment_1 <- complete_ra(N = 23, num_arms = 5)
experiment_2 <- block_ra(experiment_1, num_arms = 5)
table(experiment_1)
table(experiment_2)
table(experiment_1, experiment_2)

This produces the following output:

> table(experiment_1)
experiment_1
T1 T2 T3 T4 T5 
 4  5  5  4  5 
> table(experiment_2)
experiment_2
T1 T2 T3 T4 T5 
 6  3  6  4  4 
> table(experiment_1, experiment_2)
            experiment_2
experiment_1 T1 T2 T3 T4 T5
          T1  2  0  1  1  0
          T2  1  1  1  1  1
          T3  1  1  1  1  1
          T4  1  0  2  0  1
          T5  1  1  1  1  1
