Why is `sam <- sample(1:1000, 50); y <- x[sam]` not the same as `y <- x[sample(1:1000, 50)]`?

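This caused me a lot of trouble. I wasted a lot of time trying to work out where two pieces of code diverged, and the error seems to come down to one small difference: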

My code:

Xm <- x[sample(1:1000, 50)]
Ym <- y[sample(1:1000, 50)]

The code I compared it with (which gives the correct result):

extract <- sample(1:1000, 50)
Xm <- x[extract]
Ym <- y[extract]

In my head these are the same thing, and I can't see what is different. I hope someone can help. Thanks in advance!

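The first code will run if the vector being sampled from and the sample size are no larger than the number of elements in x and y. More precisely, two separate things matter: the sample size must not exceed the length of the vector being sampled from (sample() draws without replacement by default, so a larger size is an error), and every sampled index should be at most length(x) and length(y) (an index beyond the end of a vector returns NA rather than an error).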

x <- rnorm(1500)
y <- rnorm(1500)
# will work
smplVEC <- 1:1000  # sample from this vector
n.smpl <- 50       # sample size
length(smplVEC) <= length(x)
#[1] TRUE
length(smplVEC) <= length(y)
#[1] TRUE
n.smpl <= length(x)
#[1] TRUE
n.smpl <= length(y)
#[1] TRUE
# no error returned
x[sample(smplVEC, n.smpl)] # x[sample(1:1000, 50)]
y[sample(smplVEC, n.smpl)] # y[sample(1:1000, 50)]

# will not work
smplVEC <- 1:2000  # sample from this vector
n.smpl <- 2500     # sample size
length(smplVEC) <= length(x)
#[1] FALSE
length(smplVEC) <= length(y)
#[1] FALSE
n.smpl <= length(x)
#[1] FALSE
n.smpl <= length(y)
#[1] FALSE
# error returned: sample() cannot draw 2500 values from 2000 without replacement
x[sample(smplVEC, n.smpl)] # x[sample(1:2000, 2500)]
y[sample(smplVEC, n.smpl)] # y[sample(1:2000, 2500)]
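
To separate the two failure modes, here is a small sketch. The names `res`, `idx`, and `big` are made up for illustration. It catches the error from an oversized draw, shows that out-of-range indices give NA instead of an error, and shows that `replace = TRUE` lifts the size limit.

```r
x <- rnorm(1500)

# 1) Sample size larger than the population, without replacement: an error.
#    tryCatch() turns the error into its message so the script keeps running.
res <- tryCatch(sample(1:2000, 2500),
                error = function(e) conditionMessage(e))
print(res)

# 2) Indices beyond length(x) do not raise an error; they return NA.
idx <- c(1, 1500, 1501, 2000)
is.na(x[idx])
#[1] FALSE FALSE  TRUE  TRUE

# 3) With replace = TRUE the sample size may exceed the population size.
big <- sample(1:2000, 2500, replace = TRUE)
length(big)
#[1] 2500
```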

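Because in the first example you call sample twice, so each call draws a new set of indices and x and y are subset at different positions.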

Consider this example:

x <- 1:10
y <- 11:20
x[sample(1:10, 2)]
#[1] 10  9 #10th and 9th value got subsetted
y[sample(1:10, 2)]
#[1] 13 14 #3rd and 4th value got subsetted

Whereas in the second example, you call sample only once and use the result for both subsets.

inds <- sample(1:10, 2)
x[inds]
#[1] 10  7 #10th and 7th value got subsetted
y[inds]
#[1] 20 17 #Same 10th and 7th value got subsetted
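
One way to see why this matters for the original code is to make y a known function of x, so that a broken pairing is easy to detect. This is only an illustrative sketch. The data (`y <- x + 1000L`) and the names `Xm2` and `Ym2` are assumptions, not part of the question.

```r
x <- 1:1000
y <- x + 1000L           # y[i] is always x[i] + 1000

# One call to sample(): both vectors are subset at the same positions.
extract <- sample(1:1000, 50)
Xm <- x[extract]
Ym <- y[extract]
all(Ym - Xm == 1000L)    # pairs preserved
#[1] TRUE

# Two calls to sample(): the positions differ, so the pairing is
# almost certainly lost (the result depends on the random draw).
Xm2 <- x[sample(1:1000, 50)]
Ym2 <- y[sample(1:1000, 50)]
all(Ym2 - Xm2 == 1000L)
```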

If we want the same output from two separate calls, set the same seed before each one, which should give the same indices:

x <- 1:10
y <- 11:20
set.seed(24)
x[sample(1:10, 2)]
#[1] 7 3   #7th and 3rd element
set.seed(24)
y[sample(1:10, 2)]
#[1] 17 13 #7th and 3rd element

However, it is better to create an object holding the indices, as in the OP's second code block (`extract`).
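
If x and y are paired observations, another option in the same spirit is to keep them in one data frame and sample row numbers once. This is a sketch under that assumption. The names `df`, `rows`, and `sub` are hypothetical.

```r
x <- rnorm(1000)
y <- rnorm(1000)

# Keep the paired values together so they cannot drift apart.
df <- data.frame(x = x, y = y)

# Draw the row indices once and reuse them for the whole row.
rows <- sample(nrow(df), 50)
sub <- df[rows, ]

identical(sub$x, x[rows])
#[1] TRUE
identical(sub$y, y[rows])
#[1] TRUE
```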
