在R中工作,使用熟悉的彩色球袋类比。我想从一袋球中不更换样品。在这个袋子里,有许多不同颜色的球。每种颜色c在袋子k中表示随机次数(例如,k_blue = 3, k_red = 5, k_green = 2,…)。对于每个c,我想取一个不替换尺寸k的样本,每个球都从循环末端的袋子中取出。
I've attempt here:
library(tidyverse)
# Generate data
data <- tibble(Colour = paste0("c", 1:1000),
k = sample(x = c(1:10), size = 1000, replace = T))
# Fill the 'bag' with balls of colour 'C', 'k' times
bag <- unlist(map2(.x = data$Colour,
.y = data$k,
.f = ~rep(x = .x, times = .y)))
data.2 <- data %>%
mutate(Grouped_Colours = map2(.x = Colour, .y = k,
.f = ~{
# Take sample of size k without repeatedly sampling the
# same colour (unique() / replace = F) and without
# including each colour in its own group (bag[bag != .x])
.samp <- sample(unique(bag[bag != .x]), size = .y, replace = F)
### EXCLUDE ALL PREVIOUSLY SAMPLED BALLS (ALSO
### EXCLUDING THOSE FROM PREVIOUS LOOPS)
bag <- bag[-match(.samp, bag)]
# print out the sample and mutate it into the
# new column 'Grouped_Colours'
.samp
})
)
我用大写标示了我认为代码会导致不希望的结果的地方。最后,我需要对整个袋子进行采样(即,袋子应该是空的,最后)。
目前的问题是,在最终数据集中,球被采样的次数不正确,每种颜色应该被采样恰好k次。
谢谢你,如果有什么不清楚的,请告诉我。
编辑2022-06-03
我已经尝试在if
语句中包装此采样过程,以将初始采样与所有后续提取分开。现在循环不能找到之前采样的球从袋子中排除。
data.2 <- data %>%
mutate(Grouped_Colours = pmap(.l = list(..1 = Colour, ..2 = k, ..3 = seq_along(Colour)),
# On the first iteration, sample from the global variable 'bag'
.f = ~{ if (..3 == 1) {
# Take sample of size k without repeatedly sampling the
# same colour (unique() / replace = F) and without
# including each colour in its own group (bag[bag != .x])
.samp <- sample(unique(bag[bag != ..1]), size = ..2, replace = F)
# On every subsequent iteration, sample from the function
# environment variable 'bag' and overwrite the contents
} else {
### EXCLUDE ALL PREVIOUSLY SAMPLED BALLS (ALSO
### EXCLUDING THOSE FROM PREVIOUS LOOPS)
bag <- bag[-match(.samp, bag)]
.samp <- sample(unique(bag[bag != ..1]), size = ..2, replace = F)
}
})
)
也许我误解了,但这似乎是一个非常简单的问题。
set.seed(1) # for reproducibility
bag <- rep(c('R', 'B', 'G'), sample(10, 3, replace=TRUE))
bag
## [1] "R" "R" "R" "R" "R" "R" "R" "R" "R" "B" "B" "B" "B" "G" "G" "G" "G" "G" "G" "G"
sample(bag, replace=FALSE) # random sample of all balls, without replacement
## [1] "R" "R" "B" "G" "G" "G" "R" "G" "R" "B" "B" "R" "R" "G" "B" "G" "R" "R" "G" "R"
第一行创建了三种颜色的包,每种颜色随机出现一次(1:10)。第二行样品袋不更换,直到它"空"。