Kruskal-Wallis test in R gives an error: Error in model.frame.default: variable lengths differ



I am trying to run the Kruskal-Wallis test on multiple columns of a sample data frame (df) in R, but I get the following error:

Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups),  : 
variable lengths differ (found for 'as.factor(Groups)') 

Here is my sample data frame (df):

Groups  Gene1   Gene2   Gene3   Gene4   Gene5   Gene6   Gene7   Gene8   Gene9   Gene10
Group1  120.67  69.33   1.24    2.31    0.39    6.57    2.49    383.84  415.23  NA
Group1  157     110.67  0.4     0.84    0.28    2.62    2.11    245.42  325.23  NA
Group1  113.5   66.75   1.07    4.53    0.33    2.37    2.35    421.25  352.03  73.51
Group1  131     79.67   1.13    5.03    0.72    3.36    2.24    305.32  432.81  71.11
Group1  120     79.67   0.91    3.84    0.74    3.77    1.92    298.91  382.43  66.49
Group2  125.67  83.67   2.07    1.73    0.38    3.89    2.09    233.81  377.21  72.1
Group2  103.33  68.67   1.01    4.89    0.3     4.5     1.75    231.5   381.73  53
Group2  121.33  74.67   0.54    2.39    3.95    3.7     2.46    310.66  355.97  143.61
Group2  136     83.67   1.6     1.75    0.32    5.17    2.36    410.21  389.62  170.34
Group2  143.67  71.33   0.56    1.22    0.26    4.48    2.62    294.01  491.57  96.72
Group2  134.67  69.67   0.85    1.77    0.45    3.58    2.44    236.61  441.32  69.06
Group2  158.33  98.33   0.87    3.69    0.51    2.53    2.6     257.66  396.96  41.94
Group2  147.33  88.33   NA      NA      NA      NA      NA      NA      NA      NA
Group2  95.67   59      1.39    0.56    0.31    2.49    2.09    395.38  420.28  64.83
Group3  135     82      13.31   24.05   1.21    3.83    2.83    313.71  327.84  66.8
Group3  124.67  78      1.12    2       0.71    3.77    2.42    334.36  358.9   131.35
Group3  152     98.33   1.11    1.54    0.35    2.11    2.21    297.68  433.48  117.18
Group3  135.33  73.67   0.13    2.99    0.3     2.4     1.86    296.82  415.13  112.97
Group3  135.33  87      0.91    3.73    0.65    2.92    1.85    335.31  412.16  103.18
Group4  124.67  77.67   0.28    0.81    0.49    2.62    1.96    251.49  468.19  80.27
Group4  125.67  72.33   1.01    1.82    0.35    3.65    1.62    335.18  264.74  145.15
Group4  169     105     0.6     3.12    0.29    3.9     2.22    311.01  459.85  82.89
Group4  123.67  76.33   0.65    1.78    0.47    2.77    1.57    253.56  283.38  59.07
Group5  132.67  76.33   2.94    17.01   0.27    3.99    2.55    354.78  493.02  145.36
Group5  NA      NA      1.34    1.42    0.4     4.21    2.02    243.26  345.2   43.91
Group5  144.33  75      NA      NA      0.55    3.26    2.85    312.16  419.86  55.71
Group5  136.25  78.25   NA      1.32    0.65    3.63    1.52    267.13  256.18  53.49
Group5  123.67  69.33   1.81    1.52    0.67    3.89    2       303.89  346.57  112.16
Group5  116.67  66.33   0.7     1.68    0.27    3.55    2.16    284.96  407.04  102.97
Group5  136.67  76      2.68    4.3     0.33    7.36    2.26    237.28  423.29  88.65
Group6  122     63.33   0.87    4.2     0.17    3.92    2.11    159.04  300.24  60.13
Group6  130.67  82.67   0.8     1.85    1       5.26    2.46    388.61  558.51  66.76
Group6  136.33  70.33   0.54    2.26    0.35    NA      NA      388.81  551.69  113.39
Group6  127.33  73      1.32    2.19    0.99    4.42    2.59    378.57  501.12  85.56
Group7  186.67  89.67   0.79    1.77    0.53    5.22    2.73    269.87  490.25  77.74
Group7  203     93      5.63    22.08   0.82    6.97    2.92    341.87  611.33  92.7
Group7  127     72.67   0.55    1.07    0.38    3.2     1.69    310.9   410.19  65.62
Group7  142     79.67   1.61    1.35    3.24    3.73    2.08    304.52  495.79  60.15

Here is my code:

kw.tests <- lapply(
  data[, -1],
  function(x) {
    kruskal.test(as.numeric(x) ~ as.factor(Groups), data = data_test, na.action = na.omit)
  }
)
Error in model.frame.default(formula = as.numeric(x) ~ as.factor(Groups),  : 
variable lengths differ (found for 'as.factor(Groups)') 

The code works perfectly when I run each gene individually, for example for Gene1:

kruskal.test(Gene1 ~ as.factor(Groups), data = data_test, na.action=na.omit)
Kruskal-Wallis rank sum test
data:  Gene1 by as.factor(Groups)
Kruskal-Wallis chi-squared = 5.6607, df = 6, p-value = 0.4622

However, when I use lapply or even a for loop, I get this error. I have googled the error several times, but none of the following explanations helped me.

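  1. I read that this may be caused by NAs in the file. However, I cannot avoid NAs, because my real data frame is much larger than this one. Besides, even with NAs, the test runs perfectly for each gene individually, without lapply or a loop.
  2. The length of the "Groups" variable is the same as the length of all the other variables, so that is not the problem either.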

I am posting a snippet of my data here:

> dput(data_test)
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L), .Label = c("Group1", 
"Group2", "Group3", "Group4", "Group5", "Group6", "Group7"), class = "factor"), 
Gene1 = c(120.67, 157, 113.5, 131, 120, 125.67, 103.33, 121.33, 
136, 143.67, 134.67, 158.33, 147.33, 95.67, 135, 124.67, 
152, 135.33, 135.33, 124.67, 125.67, 169, 123.67, 132.67, 
NA, 144.33, 136.25, 123.67, 116.67, 136.67, 122, 130.67, 
136.33, 127.33, 186.67, 203, 127, 142), Gene2 = c(69.33, 
110.67, 66.75, 79.67, 79.67, 83.67, 68.67, 74.67, 83.67, 
71.33, 69.67, 98.33, 88.33, 59, 82, 78, 98.33, 73.67, 87, 
77.67, 72.33, 105, 76.33, 76.33, NA, 75, 78.25, 69.33, 66.33, 
76, 63.33, 82.67, 70.33, 73, 89.67, 93, 72.67, 79.67), Gene3 = c(1.24, 
0.4, 1.07, 1.13, 0.91, 2.07, 1.01, 0.54, 1.6, 0.56, 0.85, 
0.87, NA, 1.39, 13.31, 1.12, 1.11, 0.13, 0.91, 0.28, 1.01, 
0.6, 0.65, 2.94, 1.34, NA, NA, 1.81, 0.7, 2.68, 0.87, 0.8, 
0.54, 1.32, 0.79, 5.63, 0.55, 1.61), Gene4 = c(2.31, 0.84, 
4.53, 5.03, 3.84, 1.73, 4.89, 2.39, 1.75, 1.22, 1.77, 3.69, 
NA, 0.56, 24.05, 2, 1.54, 2.99, 3.73, 0.81, 1.82, 3.12, 1.78, 
17.01, 1.42, NA, 1.32, 1.52, 1.68, 4.3, 4.2, 1.85, 2.26, 
2.19, 1.77, 22.08, 1.07, 1.35), Gene5 = c(0.39, 0.28, 0.33, 
0.72, 0.74, 0.38, 0.3, 3.95, 0.32, 0.26, 0.45, 0.51, NA, 
0.31, 1.21, 0.71, 0.35, 0.3, 0.65, 0.49, 0.35, 0.29, 0.47, 
0.27, 0.4, 0.55, 0.65, 0.67, 0.27, 0.33, 0.17, 1, 0.35, 0.99, 
0.53, 0.82, 0.38, 3.24), Gene6 = c(6.57, 2.62, 2.37, 3.36, 
3.77, 3.89, 4.5, 3.7, 5.17, 4.48, 3.58, 2.53, NA, 2.49, 3.83, 
3.77, 2.11, 2.4, 2.92, 2.62, 3.65, 3.9, 2.77, 3.99, 4.21, 
3.26, 3.63, 3.89, 3.55, 7.36, 3.92, 5.26, NA, 4.42, 5.22, 
6.97, 3.2, 3.73), Gene7 = c(2.49, 2.11, 2.35, 2.24, 1.92, 
2.09, 1.75, 2.46, 2.36, 2.62, 2.44, 2.6, NA, 2.09, 2.83, 
2.42, 2.21, 1.86, 1.85, 1.96, 1.62, 2.22, 1.57, 2.55, 2.02, 
2.85, 1.52, 2, 2.16, 2.26, 2.11, 2.46, NA, 2.59, 2.73, 2.92, 
1.69, 2.08), Gene8 = c(383.84, 245.42, 421.25, 305.32, 298.91, 
233.81, 231.5, 310.66, 410.21, 294.01, 236.61, 257.66, NA, 
395.38, 313.71, 334.36, 297.68, 296.82, 335.31, 251.49, 335.18, 
311.01, 253.56, 354.78, 243.26, 312.16, 267.13, 303.89, 284.96, 
237.28, 159.04, 388.61, 388.81, 378.57, 269.87, 341.87, 310.9, 
304.52), Gene9 = c(415.23, 325.23, 352.03, 432.81, 382.43, 
377.21, 381.73, 355.97, 389.62, 491.57, 441.32, 396.96, NA, 
420.28, 327.84, 358.9, 433.48, 415.13, 412.16, 468.19, 264.74, 
459.85, 283.38, 493.02, 345.2, 419.86, 256.18, 346.57, 407.04, 
423.29, 300.24, 558.51, 551.69, 501.12, 490.25, 611.33, 410.19, 
495.79), Gene10 = c(NA, NA, 73.51, 71.11, 66.49, 72.1, 53, 
143.61, 170.34, 96.72, 69.06, 41.94, NA, 64.83, 66.8, 131.35, 
117.18, 112.97, 103.18, 80.27, 145.15, 82.89, 59.07, 145.36, 
43.91, 55.71, 53.49, 112.16, 102.97, 88.65, 60.13, 66.76, 
113.39, 85.56, 77.74, 92.7, 65.62, 60.15)), class = "data.frame", row.names = c(NA, 
-38L))

Any further help is appreciated. Thank you.

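You are using the wrong dataset name in your lapply/apply call: you iterate over the columns of data, while the formula looks up Groups in data_test.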
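To illustrate why mixing two data frames can trigger this message, here is a minimal sketch on hypothetical toy data (full_df and small_df are made-up names, not the asker's objects). It assumes the object being iterated over has a different number of rows than the one passed as data =, which is a guess about the asker's session rather than something shown in the question.

```r
# Hypothetical toy data, not the asker's data set.
set.seed(1)
full_df <- data.frame(
  Groups = rep(c("A", "B", "C"), each = 4),
  Gene1  = rnorm(12)
)
small_df <- full_df[1:9, ]   # stands in for a shorter snippet of the data

# Iterate over full_df but point `data =` at small_df:
# x has 12 values, while Groups is found in small_df and has only 9.
res <- lapply(full_df[, -1, drop = FALSE], function(x) {
  tryCatch(
    kruskal.test(as.numeric(x) ~ as.factor(Groups), data = small_df),
    error = function(e) conditionMessage(e)   # keep the message instead of stopping
  )
})

print(res$Gene1)                                        # the captured error message
print(c(x = nrow(full_df), Groups = nrow(small_df)))    # the two lengths that disagree
```

Comparing nrow() of the object you loop over with nrow() of the object given as data = is a quick way to check for this kind of mismatch.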

apply(data_test[, -1], 2, function(x) { kruskal.test(as.numeric(x) ~ as.factor(data_test$Groups)) })

works for me.
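The same idea can be kept in the lapply form from the question, as long as the columns and the grouping variable come from the same data frame. The sketch below uses a small made-up data frame (toy, with hypothetical values) rather than the asker's data, and also collects the p-values into a named vector. With R's default na.action option (na.omit), rows with a missing response are dropped for that gene only.

```r
# Hypothetical toy data with a few NAs, not the asker's data set.
set.seed(42)
toy <- data.frame(
  Groups = factor(rep(c("Group1", "Group2", "Group3"), each = 5)),
  Gene1  = rnorm(15, mean = 130, sd = 15),
  Gene2  = rnorm(15, mean = 80, sd = 10)
)
toy$Gene2[c(2, 9)] <- NA

# Loop over the gene columns of the SAME data frame that supplies Groups.
kw.tests <- lapply(toy[, -1], function(x) {
  kruskal.test(x ~ Groups, data = toy)
})

# Pull one p-value per gene out of the list of test results.
p.values <- vapply(kw.tests, function(res) res$p.value, numeric(1))
print(p.values)
```

Because lapply returns a named list, kw.tests$Gene1 gives the full test result for a single gene.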
