How to generate random data in R based on certain criteria

I want to generate 300 random data points based on the following conditions:

Class   value
0   1-8
1   9-11
2   12-14
3   15-16
4   17-20

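Logic: when Class = 0, I want a random value between 1 and 8; when Class = 1, I want a random value between 9 and 11; and so on.

This would give me something like the following hypothetical table: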


Class  Value
0   7
0   4
1   10
1   9
1   11
.   .
.   .

I would like a mix of equal and unequal counts across the classes.

You could do it like this:

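# Draw a random class label (0-4) for each of the 300 rows
df <- data.frame(Class = sample(0:4, 300, TRUE))
# Look up each row's range by class (Class + 1 because R lists are 1-indexed)
# and draw one value from it
df$Value <- sapply(list(1:8, 9:11, 12:14, 15:16, 17:20)[df$Class + 1],
                   sample, size = 1)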

This gives you a data frame with 300 rows and a value in the appropriate range for each class:

head(df)
#>   Class Value
#> 1     0     3
#> 2     1    10
#> 3     4    19
#> 4     2    12
#> 5     4    19
#> 6     1    10

Created on 2022-12-30 with reprex v2.0.2
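One thing to watch if the ranges ever change: `sample(x, 1)` treats a single number `x` as the range `1:x`, so a class whose range holds only one value would produce wrong results. The sketch below is one possible way to guard against that and to check the output. The helper name `sample_one` and the seed are assumptions made for illustration and are not part of the answer above.

```r
# Hypothetical helper: draw one element of x, even when x has length 1.
# (base::sample(5, 1) samples from 1:5, not from the single value 5.)
sample_one <- function(x) x[sample.int(length(x), 1)]

ranges <- list(1:8, 9:11, 12:14, 15:16, 17:20)

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(Class = sample(0:4, 300, TRUE))
df$Value <- vapply(ranges[df$Class + 1], sample_one, integer(1))

# Check that every value lies inside the range of its class
in_range <- mapply(function(v, r) v %in% r, df$Value, ranges[df$Class + 1])
stopifnot(all(in_range))

# Class counts are random, so they will usually be unequal
table(df$Class)
```

`vapply` is used instead of `sapply` here only so that the result is guaranteed to be an integer vector.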

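To give the code some extra flexibility, so that different probabilities can be used in the sampling and as few values as possible are hard-coded: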

# load data.table
library(data.table)
# this is the original data
a = structure(list(Class = 0:4, value = c("1-8", "9-11", "12-14", 
"15-16", "17-20")), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"))
# this is to replace "-" by ":", we will use that in a second
a[, value := gsub("-", ":", value, fixed = TRUE)]
# this is a vector of EQUAL probabilities
probs = rep(1/a[, uniqueN(Class)], a[, uniqueN(Class)])
# This is a vector of UNEQUAL Probabilities. If wanted, it should be 
# uncommented and adjusted manually
# probs = c(0.05, 0.1, 0.2, 0.4, 0.25)
# This is the number of Class samples wanted
numberOfSamples = 300
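# This is the workhorse: sample rows of a, then evaluate each "from:to"
# string as an integer sequence and draw one value from it
a[sample(.N, numberOfSamples, TRUE, prob = probs), ][,
  smpl := apply(.SD,
                1,
                function(x) sample(eval(parse(text = x)), 1)),
  .SDcols = "value"][,
  .(Class, smpl)]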

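What are the advantages of this code?

If you change your classes or value ranges, the only thing you need to change is the original data frame (which I called a).
If you want to sample with non-uniform probabilities, you can set them and the code still runs.
If you want a smaller or larger sample, you don't have to edit the code; you only change the value of one variable.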
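The same three properties can also be had in base R without parsing text. The following is a minimal sketch, assuming the ranges are stored as numeric bounds rather than as "1-8" strings. The column names `lo` and `hi` and the function `draw_classes` are hypothetical and do not come from the answer above.

```r
# Hypothetical lookup table: one row per class with numeric bounds
ranges <- data.frame(Class = 0:4,
                     lo = c(1, 9, 12, 15, 17),
                     hi = c(8, 11, 14, 16, 20))

draw_classes <- function(ranges, n, probs = NULL) {
  # Pick n rows of the lookup table; probs = NULL means equal probabilities
  idx <- sample.int(nrow(ranges), n, replace = TRUE, prob = probs)
  lo <- ranges$lo[idx]
  hi <- ranges$hi[idx]
  # Draw one integer uniformly from lo..hi for each sampled row
  value <- lo + floor(runif(n) * (hi - lo + 1))
  data.frame(Class = ranges$Class[idx], Value = value)
}

set.seed(42)  # arbitrary seed, only for reproducibility
res <- draw_classes(ranges, 300, probs = c(0.05, 0.1, 0.2, 0.4, 0.25))
head(res)
```

Because the bounds are plain numbers, no `eval(parse(...))` is needed, and a range containing a single value works without special handling.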
