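I have a dataset of mesh opening measurements together with the tool used to take each measurement. I want to run a one-way ANOVA on the data. Here is my code: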
df<-structure(list(MeasurementTool = c("Wedge", "Wedge", "Wedge",
"Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge",
"Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge", "Wedge",
"Wedge", "Wedge", "Wedge", "Weighted Wedge", "Weighted Wedge",
"Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge",
"Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge",
"Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge",
"Weighted Wedge", "Weighted Wedge", "Weighted Wedge", "Weighted Wedge",
"Weighted Wedge", "Weighted Wedge", "ICES Gauge", "ICES Gauge",
"ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge",
"ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge",
"ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge", "ICES Gauge",
"ICES Gauge", "ICES Gauge", "ICES Gauge"),
MeshOpening = c(157L, 155L, 160L, 160L, 161L, 160L, 158L, 161L, 162L, 162L, 160L, 163L,
158L, 160L, 161L, 165L, 164L, 158L, 164L, 163L, 159L, 158L, 165L,
164L, 159L, 160L, 158L, 159L, 160L, 163L, 159L, 160L, 158L, 158L,
158L, 162L, 160L, 159L, 159L, 159L, 159L, 159L, 159L, 155L, 156L,
156L, 158L, 160L, 156L, 155L, 160L, 160L, 157L, 159L, 158L, 155L,
158L, 157L, 156L, 158L)), row.names = c(NA, -60L), class = "data.frame")
df$`MeasurementTool`<- as.factor(df$`MeasurementTool`)
group_by(df, 'MeasurementTool') %>% summarise(count = n(), mean = mean('MeshOpening', na.rm = TRUE), sd = sd('MeshOpening', na.rm = TRUE))
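It gives me these warning messages:
Warning messages:
1: In mean.default("MeshOpening", na.rm = TRUE) : argument is not numeric or logical: returning NA
2: In var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) : NAs introduced by coercion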
You are being tripped up by the way dplyr::summarise works. It expects an R name (also known as a symbol), that is, the letters with no quotes around them:
group_by(df, 'MeasurementTool') %>% summarise(count = n(), mean = mean(MeshOpening, na.rm = TRUE), sd = sd(MeshOpening, na.rm = TRUE))
# A tibble: 1 × 4
`"MeasurementTool"` count mean sd
<chr> <int> <dbl> <dbl>
1 MeasurementTool 60 159. 2.48
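Before the tidyverse came along, we often referred to columns by character-valued names, as you did, but many people seem to like treating column names as first-class objects, and that is now the norm in the tidyverse.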
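If the column names really do arrive as character strings, dplyr's `.data` pronoun is one way to keep using them inside `summarise`. The following is a minimal sketch rather than the only approach; the variables `group_col` and `value_col` are hypothetical names introduced for illustration, and the data frame is just the first three readings per tool from the question.

```r
library(dplyr)

# Small subset of the question's data (first three readings per tool)
df <- data.frame(
  MeasurementTool = rep(c("Wedge", "Weighted Wedge", "ICES Gauge"), each = 3),
  MeshOpening = c(157L, 155L, 160L, 159L, 158L, 165L, 159L, 159L, 159L)
)

# Hypothetical variables holding the column names as strings
group_col <- "MeasurementTool"
value_col <- "MeshOpening"

# .data[[name]] looks the string up as a column of the current data frame
res <- df %>%
  group_by(.data[[group_col]]) %>%
  summarise(count = n(),
            mean = mean(.data[[value_col]], na.rm = TRUE),
            sd = sd(.data[[value_col]], na.rm = TRUE))

print(res)
```

Here the string is looked up as a column instead of being passed to `mean` as a literal character value, which is what triggered the warnings.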
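Better still is to fix not only the cause of the warnings but also the grouping, which the quoted 'MeasurementTool' in group_by collapsed into a single group, so that you get what you really wanted: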
group_by(df, MeasurementTool) %>% summarise(count = n(),
mean = mean(MeshOpening, na.rm = TRUE),
sd = sd(MeshOpening, na.rm = TRUE))
# A tibble: 3 × 4
MeasurementTool count mean sd
<fct> <int> <dbl> <dbl>
1 ICES Gauge 20 158. 1.73
2 Wedge 20 161. 2.56
3 Weighted Wedge 20 160. 2.06
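With the per-tool summaries in place, the one-way ANOVA mentioned in the question could be sketched with base R's `aov`. This is a minimal sketch under the usual ANOVA assumptions (roughly normal residuals, similar group variances), which are not checked here. The data frame is rebuilt compactly with `rep()` from the values in the question, and the object name `fit` is an arbitrary choice.

```r
# Rebuild the question's data: 20 readings for each of the three tools
df <- data.frame(
  MeasurementTool = factor(rep(c("Wedge", "Weighted Wedge", "ICES Gauge"),
                               each = 20)),
  MeshOpening = c(
    157L, 155L, 160L, 160L, 161L, 160L, 158L, 161L, 162L, 162L,
    160L, 163L, 158L, 160L, 161L, 165L, 164L, 158L, 164L, 163L,
    159L, 158L, 165L, 164L, 159L, 160L, 158L, 159L, 160L, 163L,
    159L, 160L, 158L, 158L, 158L, 162L, 160L, 159L, 159L, 159L,
    159L, 159L, 159L, 155L, 156L, 156L, 158L, 160L, 156L, 155L,
    160L, 160L, 157L, 159L, 158L, 155L, 158L, 157L, 156L, 158L
  )
)

# One-way ANOVA: does mean mesh opening differ between tools?
fit <- aov(MeshOpening ~ MeasurementTool, data = df)
print(summary(fit))

# Optional follow-up: pairwise comparisons between tools
print(TukeyHSD(fit))
```

Note that the formula interface also takes bare column names, not quoted strings.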
In my view, group_by should throw an error, or at least a warning, when the value of its second argument cannot be interpreted as matching a column name.
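Until then, a small wrapper can enforce that check yourself. The helper below, `group_by_name`, is hypothetical (it is not part of dplyr) and is only a sketch of the idea: it accepts the column name as a string and stops with an error if no such column exists. The data frame is again a small subset of the question's data.

```r
library(dplyr)

# Hypothetical helper, not a dplyr function: group by a column given as a
# string, failing loudly when the column is absent
group_by_name <- function(data, col) {
  stopifnot(is.character(col), length(col) == 1)
  if (!col %in% names(data)) {
    stop("No column named '", col, "' in data", call. = FALSE)
  }
  group_by(data, across(all_of(col)))
}

# Small subset of the question's data (first three readings per tool)
df <- data.frame(
  MeasurementTool = rep(c("Wedge", "Weighted Wedge", "ICES Gauge"), each = 3),
  MeshOpening = c(157L, 155L, 160L, 159L, 158L, 165L, 159L, 159L, 159L)
)

ok <- group_by_name(df, "MeasurementTool")

# A misspelled name now raises an error instead of silently forming one group
err <- tryCatch(group_by_name(df, "Measurement Tool"), error = function(e) e)
print(conditionMessage(err))
```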