假设我正在使用R:中的iris
数据集
data(iris)
summary(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. : 4.300 Min. : 2.000 Min. : 1.000 Min. : 0.100
1st Qu.: 5.100 1st Qu.: 2.800 1st Qu.: 1.600 1st Qu.: 0.300
Median : 5.800 Median : 3.000 Median : 4.350 Median : 1.300
Mean : 5.843 Mean : 3.057 Mean : 3.758 Mean : 1.199
3rd Qu.: 6.400 3rd Qu.: 3.300 3rd Qu.: 5.100 3rd Qu.: 1.800
Max. : 7.900 Max. : 4.400 Max. : 6.900 Max. : 2.500
Species
setosa : 50
versicolor: 50
virginica : 50
我想进行一个线性回归,其中Petal.Length
是因变量,Sepal.Length
是自变量。在R中,我如何一次对每个Species
类别进行回归,得到每个测试的P、R²和F值?
使用by
。
by(iris, iris$Species, (x) summary(lm(Petal.Length ~ Sepal.Length, x)))
# iris$Species: setosa
#
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.40856 -0.08027 -0.00856 0.11708 0.46512
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.80305 0.34388 2.335 0.0238 *
# Sepal.Length 0.13163 0.06853 1.921 0.0607 .
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.1691 on 48 degrees of freedom
# Multiple R-squared: 0.07138, Adjusted R-squared: 0.05204
# F-statistic: 3.69 on 1 and 48 DF, p-value: 0.0607
#
# ---------------------------------------------------------
# iris$Species: versicolor
#
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.68611 -0.22827 -0.04123 0.19458 0.79607
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.18512 0.51421 0.360 0.72
# Sepal.Length 0.68647 0.08631 7.954 2.59e-10 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.3118 on 48 degrees of freedom
# Multiple R-squared: 0.5686, Adjusted R-squared: 0.5596
# F-statistic: 63.26 on 1 and 48 DF, p-value: 2.586e-10
#
# ---------------------------------------------------------
# iris$Species: virginica
#
# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = x)
#
# Residuals:
# Min 1Q Median 3Q Max
# -0.68603 -0.21104 0.06399 0.18901 0.66402
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.61047 0.41711 1.464 0.15
# Sepal.Length 0.75008 0.06303 11.901 6.3e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.2805 on 48 degrees of freedom
# Multiple R-squared: 0.7469, Adjusted R-squared: 0.7416
# F-statistic: 141.6 on 1 and 48 DF, p-value: 6.298e-16
编辑
为了详细阐述我的评论,我们可以很容易地提取所需的值
by(iris, iris$Species, (x) lm(Petal.Length ~ Sepal.Length, x)) |>
lapply((x) {
with(summary(x), c(r2=r.squared, f=fstatistic,
p=do.call(pf, c(as.list(unname(fstatistic)), lower.tail=FALSE))))
}) |> do.call(what=rbind)
# r2 f.value f.numdf f.dendf p
# setosa 0.07138289 3.689765 1 48 6.069778e-02
# versicolor 0.56858983 63.263024 1 48 2.586190e-10
# virginica 0.74688439 141.636664 1 48 6.297786e-16
如果您想提取这些值,我们可以使用
library (dplyr)
df <- iris
list_res <- df %>%
base::split (., df$Species, drop = FALSE) %>%
lapply (., function (x) {
fit <- lm(Petal.Length ~ Sepal.Length, data = x) %>%
summary ()
r <- fit$r.squared
coeffs <- fit$coefficients %>%
as_tibble ()
f <- fit$fstatistic[[1]]
list_res <- list (r, coeffs, f)
names (list_res) <- c("R-Squared", "Coefficients", "F-Value")
return (list_res)
})
它为每个回归模型返回一个包含三个对象的列表,其中包括所需的值。我把系数表留在这里,因为知道你的p值属于哪个自变量总是很好的。例如,如果您希望单独提取这些p值,我们可以使用coeffs <- fit$coefficients [,4] %>% as.list ()
。