First, I create a couple of example data frames:
df = data.frame("sample1" = runif(10), "sample2" = runif(10), "sample3" = runif(10), "sample4" = runif(10))
traits = data.frame("var1" = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) = colnames(df)
If I build the formula as a text string, I can plug it straight into lm() via as.formula():
> row = t(df[1,])
> ModString = "row ~ traits$var1"
> Mod = lm(as.formula(ModString))
> Mod
Call:
lm(formula = as.formula(ModString))
Coefficients:
(Intercept) traits$var1group2
0.7799 0.1788
However, if I try the same thing with parLapply, I get an error indicating that the "traits" argument isn't working as expected:
> num_cores <- detectCores() - 1
> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits){
+ row = df[i,]
+ ModString = "vector ~ traits$factor1"
+ Mod = lm(ModString)
+ return(Mod)
+ }, df = df, traits = traits)
Error in checkForRemoteErrors(val) :
9 nodes produced errors; first error: object 'traits' not found
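Oddly, though, the "traits" argument does make it into the parLapply call, so this looks like an issue with how lm() works. I can pass "traits" in and return it just fine: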
> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits){
+ row = df[i,]
+ traits2 = traits
+ ModString = "vector ~ traits$factor1"
+ return(list(traits2, row, ModString))
+ }, df = df, traits = traits)
> results
[[1]]
[[1]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[1]][[2]]
sample1 sample2 sample3 sample4
1 0.6941108 0.8656177 0.9807334 0.936609
[[1]][[3]]
[1] "vector ~ traits$factor1"
[[2]]
[[2]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[2]][[2]]
sample1 sample2 sample3 sample4
2 0.1007983 0.5599374 0.0208095 0.8082196
[[2]][[3]]
[1] "vector ~ traits$factor1"
[[3]]
[[3]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[3]][[2]]
sample1 sample2 sample3 sample4
3 0.9633059 0.7564143 0.913617 0.4179525
[[3]][[3]]
[1] "vector ~ traits$factor1"
[[4]]
[[4]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[4]][[2]]
sample1 sample2 sample3 sample4
4 0.06625104 0.390351 0.511572 0.8386714
[[4]][[3]]
[1] "vector ~ traits$factor1"
[[5]]
[[5]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[5]][[2]]
sample1 sample2 sample3 sample4
5 0.6135228 0.4926991 0.08513074 0.105647
[[5]][[3]]
[1] "vector ~ traits$factor1"
[[6]]
[[6]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[6]][[2]]
sample1 sample2 sample3 sample4
6 0.7121677 0.6554129 0.6409468 0.4906039
[[6]][[3]]
[1] "vector ~ traits$factor1"
[[7]]
[[7]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[7]][[2]]
sample1 sample2 sample3 sample4
7 0.4651641 0.546514 0.4039608 0.1758802
[[7]][[3]]
[1] "vector ~ traits$factor1"
[[8]]
[[8]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[8]][[2]]
sample1 sample2 sample3 sample4
8 0.5121237 0.4950444 0.9662431 0.6851582
[[8]][[3]]
[1] "vector ~ traits$factor1"
[[9]]
[[9]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[9]][[2]]
sample1 sample2 sample3 sample4
9 0.2486208 0.135422 0.2128657 0.7332921
[[9]][[3]]
[1] "vector ~ traits$factor1"
[[10]]
[[10]][[1]]
var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[10]][[2]]
sample1 sample2 sample3 sample4
10 0.06203028 0.7916495 0.3528376 0.2259685
[[10]][[3]]
[1] "vector ~ traits$factor1"
What embarrassingly trivial detail am I missing here?
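One plausible explanation for why the string version fails only on the workers: as.formula() attaches the environment it is called from to the formula, and lm() looks up the formula's variables there. When lm() is handed a bare string instead, the conversion to a formula likely happens deeper inside the model-frame machinery, in an environment that leads back to the worker's global environment, where traits does not exist. The minimal sketch below shows the working half of that behaviour: a formula built with as.formula() inside a function keeps that function's local variables visible to lm(). The names fit_local, y_local and g_local are made up for illustration.

```r
# Build a formula from a string inside a function. as.formula() attaches the
# calling function's environment, so lm() can find the local variables.
fit_local <- function() {
  y_local <- c(1, 2, 3, 5)
  g_local <- factor(c("a", "a", "b", "b"))
  f <- as.formula("y_local ~ g_local")
  list(formula = f, fit = lm(f))
}

res <- fit_local()

# The formula's environment still holds the function's local variables
exists("g_local", envir = environment(res$formula), inherits = FALSE)

# With two observations per group the fit recovers the group means:
# intercept = mean(c(1, 2)) = 1.5, slope = mean(c(3, 5)) - 1.5 = 2.5
coef(res$fit)
```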
I would do it like this; note the completely different way the data is organized:
library(dplyr)
library(tidyr)
library(tibble)
library(parallel)
#You seem to have rows of data that should be columns,
# this puts things in a form more suitable for work in R
df_new <- df %>%
  mutate(row = 1:n()) %>%
  gather(key = sample, value = val, sample1:sample4) %>%
  arrange(row, sample)
#Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits, "sample")
#Now we can put it all in *one* data frame
df_new <- left_join(df_new,
                    traits_new,
                    by = "sample")
#...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new, df_new$row)
#Wrapper for lm with the only formula we need
fit_lm <- function(x){
  lm(val ~ var1, data = x)
}
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, df_new_split, fit_lm)
#Shut down the worker processes when done
stopCluster(cl)
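For comparison, here is a rough base-R version of the same reshape, join, split and fit pipeline, without dplyr, tidyr or tibble. It is a sketch, not the answerer's code: it uses lapply() instead of parLapply() so it runs without a cluster, and the names long, fits and coefs are illustrative. Because each split has two observations per group, the intercept is the group1 mean and the slope is the group2 mean minus the group1 mean, which makes the results easy to sanity-check.

```r
set.seed(42)
df <- data.frame(sample1 = runif(10), sample2 = runif(10),
                 sample3 = runif(10), sample4 = runif(10))
traits <- data.frame(var1 = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) <- colnames(df)

# Long format: one row per (original row, sample) pair.
# unlist() walks the data frame column by column, so sample varies slowest.
long <- data.frame(
  row    = rep(seq_len(nrow(df)), times = ncol(df)),
  sample = rep(colnames(df), each = nrow(df)),
  val    = unlist(df, use.names = FALSE)
)

# "Join" the group labels by looking them up via the rownames of traits
long$var1 <- traits[long$sample, "var1"]

# One lm() fit per original row; swap lapply for parLapply(cl, ...) to parallelize
fits <- lapply(split(long, long$row), function(x) lm(val ~ var1, data = x))

# Collect the coefficients into a 10 x 2 matrix (one row per original row)
coefs <- t(sapply(fits, coef))
head(coefs)
```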
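I feel really silly, but I'm going to leave this question up because it's a good example of how easy it is to get confused when copy-pasting and editing multiple versions of the same code. I didn't use as.formula consistently in my parLapply call, and I also forgot to change the variable name vector to row in the formula string and to transpose the row with t().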
So, the following works just fine:
require(parallel)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, seq(1:10), function(i, df, traits){
  row = t(df[i,])
  # Single quotes outside, so the inner "var1" doesn't end the string
  ModString = 'row ~ traits[,"var1"]'
  Mod = lm(as.formula(ModString))
  return(Mod)
}, df = df, traits = traits)
stopCluster(cl)
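Since the root cause here was drift between copy-pasted versions, one cheap habit (a suggestion, not part of the original answer) is to give the worker function a name and dry-run it serially with lapply() first, where errors surface with a normal traceback, before handing the identical function to parLapply(). The name fit_row is hypothetical; its body mirrors the working function above.

```r
library(parallel)

set.seed(42)
df <- data.frame(sample1 = runif(10), sample2 = runif(10),
                 sample3 = runif(10), sample4 = runif(10))
traits <- data.frame(var1 = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) <- colnames(df)

# Hypothetical named worker with the same body as the anonymous function above
fit_row <- function(i, df, traits) {
  row <- t(df[i, ])                      # 4 x 1 column of values for row i
  ModString <- 'row ~ traits[,"var1"]'   # single quotes around the whole string
  lm(as.formula(ModString))              # formula environment = this function's frame
}

# 1. Serial dry run: any error shows up here with a normal traceback
serial <- lapply(1:10, fit_row, df = df, traits = traits)

# 2. The same function, unchanged, on a cluster (commented out in this sketch)
# cl <- makeCluster(2)
# results <- parLapply(cl, 1:10, fit_row, df = df, traits = traits)
# stopCluster(cl)
```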