在R回路中重复取样而不更换

  • 本文关键字:回路 r loops resampling pmap
  • 更新时间 :
  • 英文 :


在R中工作,使用熟悉的彩色球袋类比。我想从一袋球中不更换样品。在这个袋子里,有许多不同颜色的球。每种颜色c在袋子k中表示随机次数(例如,k_blue = 3, k_red = 5, k_green = 2,…)。对于每个c,我想取一个不替换尺寸k的样本,每个球都从循环末端的袋子中取出。

I've attempt here:

library(tidyverse)
# Generate data
data <- tibble(Colour = paste0("c", 1:1000),
k = sample(x = c(1:10), size = 1000, replace = T))
# Fill the 'bag' with balls of colour 'C', 'k' times
bag <- unlist(map2(.x = data$Colour,
.y = data$k,
.f = ~rep(x = .x, times = .y)))

data.2 <- data %>% 
mutate(Grouped_Colours = map2(.x = Colour, .y = k,
.f = ~{
# Take sample of size k without repeatedly sampling the
# same colour (unique() / replace = F) and without
# including each colour in its own group (bag[bag != .x])
.samp <- sample(unique(bag[bag != .x]), size = .y, replace = F)
### EXCLUDE ALL PREVIOUSLY SAMPLED BALLS (ALSO
### EXCLUDING THOSE FROM PREVIOUS LOOPS)
bag <- bag[-match(.samp, bag)]
# print out the sample and mutate it into the
# new column 'Grouped_Colours'
.samp
})
)

我用大写标示了我认为代码会导致不希望的结果的地方。最后,我需要对整个袋子进行采样(即,袋子应该是空的,最后)。

目前的问题是,在最终数据集中,球被采样的次数不正确,每种颜色应该被采样恰好k次。

谢谢你,如果有什么不清楚的,请告诉我。

编辑2022-06-03

我已经尝试在if语句中包装此采样过程,以将初始采样与所有后续提取分开。现在循环不能找到之前采样的球从袋子中排除。

data.2 <- data %>% 
mutate(Grouped_Colours = pmap(.l = list(..1 = Colour, ..2 = k, ..3 = seq_along(Colour)),
# On the first iteration, sample from the global variable 'bag'
.f = ~{ if (..3  == 1) {
# Take sample of size k without repeatedly sampling the
# same colour (unique() / replace = F) and without
# including each colour in its own group (bag[bag != .x])
.samp <- sample(unique(bag[bag != ..1]), size = ..2, replace = F)

# On every subsequent iteration, sample from the function 
# environment variable 'bag' and overwrite the contents
} else {
### EXCLUDE ALL PREVIOUSLY SAMPLED BALLS (ALSO
### EXCLUDING THOSE FROM PREVIOUS LOOPS)
bag <- bag[-match(.samp, bag)]
.samp <- sample(unique(bag[bag != ..1]), size = ..2, replace = F)
}
})
)

也许我误解了,但这似乎是一个非常简单的问题。

set.seed(1)   # for reproducibility
bag <- rep(c('R', 'B', 'G'), sample(10, 3, replace=TRUE))
bag
##  [1] "R" "R" "R" "R" "R" "R" "R" "R" "R" "B" "B" "B" "B" "G" "G" "G" "G" "G" "G" "G"
sample(bag, replace=FALSE)   # random sample of all balls, without replacement
##  [1] "R" "R" "B" "G" "G" "G" "R" "G" "R" "B" "B" "R" "R" "G" "B" "G" "R" "R" "G" "R"

第一行创建了三种颜色的包,每种颜色随机出现一次(1:10)。第二行样品袋不更换,直到它"空"。

最新更新