lm(formula) in R behaves differently inside parLapply



First, I create a pair of example data frames:

df = data.frame("sample1" = runif(10), "sample2" = runif(10), "sample3" = runif(10), "sample4" = runif(10))
traits = data.frame("var1" = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) = colnames(df)

If I build the formula as a text string, I can convert it with as.formula() and pass it straight to lm():

> row = t(df[1,])
> ModString = "row ~ traits$var1"
> Mod = lm(as.formula(ModString))
> Mod
Call:
lm(formula = as.formula(ModString))
Coefficients:
      (Intercept)  traits$var1group2  
           0.7799             0.1788  

However, if I try to do the same thing inside parLapply, I get an error indicating that the "traits" argument isn't working as expected:

> num_cores <- detectCores() - 1
> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits){
+     row = df[i,]
+     ModString = "vector ~ traits$factor1"
+     Mod = lm(ModString)
+     return(Mod)
+ }, df = df, traits = traits)
Error in checkForRemoteErrors(val) : 
  9 nodes produced errors; first error: object 'traits' not found

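Oddly, though, the "traits" argument does reach the function I pass to parLapply, so this looks like an issue with how lm() works. I can pass "traits" in and return it just fine: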

> cl <- makeCluster(num_cores)
> results <- parLapply(cl = cl, seq(1:10), function(i, df, traits){
+     row = df[i,]
+     traits2 = traits
+     ModString = "vector ~ traits$factor1"
+     return(list(traits2, row, ModString))
+ }, df = df, traits = traits)
> results
[[1]]
[[1]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[1]][[2]]
    sample1   sample2   sample3  sample4
1 0.6941108 0.8656177 0.9807334 0.936609
[[1]][[3]]
[1] "vector ~ traits$factor1"

[[2]]
[[2]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[2]][[2]]
    sample1   sample2   sample3   sample4
2 0.1007983 0.5599374 0.0208095 0.8082196
[[2]][[3]]
[1] "vector ~ traits$factor1"

[[3]]
[[3]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[3]][[2]]
    sample1   sample2  sample3   sample4
3 0.9633059 0.7564143 0.913617 0.4179525
[[3]][[3]]
[1] "vector ~ traits$factor1"

[[4]]
[[4]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[4]][[2]]
     sample1  sample2  sample3   sample4
4 0.06625104 0.390351 0.511572 0.8386714
[[4]][[3]]
[1] "vector ~ traits$factor1"

[[5]]
[[5]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[5]][[2]]
    sample1   sample2    sample3  sample4
5 0.6135228 0.4926991 0.08513074 0.105647
[[5]][[3]]
[1] "vector ~ traits$factor1"

[[6]]
[[6]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[6]][[2]]
    sample1   sample2   sample3   sample4
6 0.7121677 0.6554129 0.6409468 0.4906039
[[6]][[3]]
[1] "vector ~ traits$factor1"

[[7]]
[[7]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[7]][[2]]
    sample1  sample2   sample3   sample4
7 0.4651641 0.546514 0.4039608 0.1758802
[[7]][[3]]
[1] "vector ~ traits$factor1"

[[8]]
[[8]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[8]][[2]]
    sample1   sample2   sample3   sample4
8 0.5121237 0.4950444 0.9662431 0.6851582
[[8]][[3]]
[1] "vector ~ traits$factor1"

[[9]]
[[9]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[9]][[2]]
    sample1  sample2   sample3   sample4
9 0.2486208 0.135422 0.2128657 0.7332921
[[9]][[3]]
[1] "vector ~ traits$factor1"

[[10]]
[[10]][[1]]
          var1
sample1 group1
sample2 group1
sample3 group2
sample4 group2
[[10]][[2]]
      sample1   sample2   sample3   sample4
10 0.06203028 0.7916495 0.3528376 0.2259685
[[10]][[3]]
[1] "vector ~ traits$factor1"

What embarrassingly trivial detail am I missing here?

I would do it like this; note the completely different organization of the data:

library(dplyr)
library(tidyr)
library(tibble)
library(parallel)
#You seem to have rows of data that should be columns,
# this puts things in a form more suitable for work in R
df_new <- df %>% 
    mutate(row = 1:n()) %>% 
    gather(key = sample,value = val,sample1:sample4) %>% 
    arrange(row,sample)
#Data in rownames is not terribly useful
traits_new <- rownames_to_column(traits,"sample")
#Now we can put it all in *one* data frame
df_new <- left_join(df_new,
                    traits_new,
                    by = "sample")
#...and split it into a list representing each of the df's you
# want a lm() fit on
df_new_split <- split(df_new,df_new$row)
#Wrapper for lm with the only formula we need
fit_lm <- function(x){
    lm(val ~ var1,data = x)
}
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl,df_new_split,fit_lm)
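
For what it's worth, the same idea (one small data frame per fit, with the variables supplied through the `data` argument) can be sketched without the reshaping packages. This is only a minimal sketch: the names `row_frames` and `fit_lm`, the seed, and the fixed worker count of 2 are assumptions made for illustration, not part of the original code.

```r
library(parallel)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(sample1 = runif(10), sample2 = runif(10),
                 sample3 = runif(10), sample4 = runif(10))
traits <- data.frame(var1 = c(rep("group1", 2), rep("group2", 2)))
rownames(traits) <- colnames(df)

# One small data frame per row of df: response in `val`, grouping in `var1`.
# Indexing traits by colnames(df) keeps samples and groups aligned.
row_frames <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(val = unlist(df[i, ]), var1 = traits[colnames(df), "var1"])
})

# Everything the formula needs comes from `data`, so nothing has to be
# looked up in the worker's global environment.
fit_lm <- function(x) lm(val ~ var1, data = x)

cl <- makeCluster(2)  # assumption: 2 workers; adjust to your machine
results <- tryCatch(
  parLapply(cl, row_frames, fit_lm),
  finally = stopCluster(cl)  # always release the workers
)
```

Because each list element carries its own `val` and `var1`, the worker function needs no extra arguments, and `stopCluster()` runs even if one of the fits fails.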
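
Well, I feel really silly, but I'm going to leave this question up because it's a good example of how easy it is to get confused when copy-pasting and editing multiple versions of code. I wasn't using as.formula consistently inside my parLapply, and I also forgot to change the variable name "vector" to "row" and transpose it.

So, the following works just dandy: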

require(parallel)
num_cores <- detectCores() - 1
cl <- makeCluster(num_cores)
results <- parLapply(cl = cl, seq(1:10), function(i, df, traits){
    row = t(df[i,])
    ModString = 'row ~ traits[,"var1"]'  # single quotes outside so the inner "var1" parses
    Mod = lm(as.formula(ModString))
    return(Mod)
}, df = df, traits = traits)
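
As far as I can tell, the reason as.formula() matters is that a formula built from a string remembers the environment in which the conversion happened. Called inside the worker function, as.formula() records that function's frame, so lm() can find the local variables. A raw string is only converted deep inside lm(), where those locals are out of reach. Here is a small cluster-free sketch of that behaviour; the names `make_in_function`, `string_in_function`, `local_y` and `local_x` are made up for illustration.

```r
# Case 1: convert the string where the variables live
make_in_function <- function() {
  local_y <- c(1, 2, 3, 4)   # exists only inside this call
  local_x <- c(0, 0, 1, 1)
  here <- environment()      # this function's frame
  f <- as.formula("local_y ~ local_x")
  list(same_env = identical(environment(f), here),  # formula points at this frame
       fit = lm(f))                                 # so lm() finds local_y and local_x
}
res <- make_in_function()

# Case 2: hand lm() the raw string and let it convert internally
string_in_function <- function() {
  local_y <- c(1, 2, 3, 4)
  local_x <- c(0, 0, 1, 1)
  # The conversion now happens inside lm()'s internals, so the lookup
  # starts there and never reaches this function's frame
  tryCatch(lm("local_y ~ local_x"),
           error = function(e) conditionMessage(e))
}
msg <- string_in_function()  # an error message string, not a fitted model
```

At the top level the raw string happens to work, because the lookup eventually falls through to the global environment where the variables live. On a cluster worker, or inside any function, the variables are local, which matches the "object 'traits' not found" error in the question.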
