在数据框df
中,我构造了一个函数f
,它计算x.sample和y.sample之间的相关性。然后,我试图运行999个随机化,计算per
中每个随机化的预期相关性。我不确定这里的lapply是否写得正确,如果它实际上是在per
函数中。如果我计算的是per
,而不是4个sp
上的任何其他函数,那么验证这一点的简单方法是什么?
set.seed(111)
library(truncnorm)
x <- rtruncnorm(n = 288,a = 0,b = 10,mean = 5,sd = 2)
v <- rtruncnorm(n = 288,a = 0,b = 10,mean = 5,sd = 2)
y <- ((v/x^2) - (1/x))
sp <- rep(c("A","B","C","D"), each = 72)
df <- data.frame(v,x,y,sp)
library(data.table)
setDT(df)
# function to estimate model coefficients
f <- function(x,v) {x.sample <- sample(x, length(x), replace=T)
y.sample <- (v/x.sample^2) - (1/x.sample)
per <- cor(y.sample, x.sample)}
set.seed(1234)
# 999 models for each species
result = rbindlist(
lapply(1:999, (i) df[,.(est = f(x,v)), sp][, i:=i])
)
我对data.table
不是很熟悉,所以lapply()
调用中的符号看起来很外国,但在玩弄它之后,我认为它正在做你想做的。
检查的一种方法是用不同的方式编码,并直观地比较结果:
library(dplyr)
out_list <- list()
for(i in 1:999){
df2 <- df %>%
group_by(sp) %>%
mutate(est = f(x, v)) %>%
select(sp, est) %>%
distinct()
out_list[[i]] <- df2
}
out_df <- bind_rows(out_list, .id = "id")
library(ggplot2)
p1 <- ggplot() +
geom_histogram(data = out_df, mapping = aes(x = est, color = sp, fill = sp), show.legend = FALSE) +
facet_wrap(~sp)
p2 <- ggplot() +
geom_histogram(data = result, mapping = aes(x = est, color = sp, fill = sp), show.legend = FALSE) +
facet_wrap(~sp)
gridExtra::grid.arrange(p1, p2, ncol = 2)