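I am trying to write a function that groups flexibly by a variable number of arguments and fits a linear model to each subset. The output should be a table in which each row shows the grouping variables and the corresponding lm results as returned by broom::glance. I can't figure out how to structure the output. Code that reproduces the error is below: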
library(dplyr)
library(broom)
test_fcn <- function(var1, ...) {
  x <- unlist(list(...))
  mtcars %>%
    group_by(across(all_of(c('gear', x)))) %>%
    mutate(mod = list(lm(hp ~ !!sym(var1), data = .))) %>%
    summarize(broom::glance(mod))
}
test_fcn('qsec', 'cyl', 'carb')
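I am pushing the limits of my R/dplyr comfort zone by mixing static and dynamic variable arguments, so I have left both in here in case that is a contributing factor. Thanks for your input!
You are almost there.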
library(purrr)  # map()
library(tidyr)  # unnest()
test_fcn <- function(var1, ...) {
  x <- unlist(list(...))
  mtcars %>%
    group_by(across(all_of(c('gear', x)))) %>%
    summarise(
      mod = list(lm(hp ~ !!sym(var1), data = .)),
      mod = map(mod, broom::glance),
      .groups = "drop")
}
test_fcn('qsec', 'cyl', 'carb') %>% unnest(mod)
## A tibble: 12 × 15
# gear cyl carb r.squared adj.r.sq…¹ sigma stati…² p.value df logLik AIC BIC devia…³ df.re…⁴
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
# 1 3 4 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 2 3 6 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 3 3 8 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 4 3 8 3 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 5 3 8 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 6 4 4 1 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 7 4 4 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 8 4 6 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
# 9 5 4 2 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#10 5 6 6 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#11 5 8 4 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
#12 5 8 8 0.502 0.485 49.2 30.2 5.77e-6 1 -169. 344. 348. 72633. 30
## … with 1 more variable: nobs <int>, and abbreviated variable names ¹adj.r.squared, ²statistic,
## ³deviance, ⁴df.residual
## ℹ Use `colnames()` to see all variable names
Because the lm fit objects are stored in a list, you need purrr::map to loop over the entries.
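To illustrate why the loop is needed, here is a minimal sketch outside the function, assuming purrr and broom are installed. The names `fits` and `summaries` are hypothetical and only serve the illustration.

```r
library(purrr)
library(broom)

# Two lm fits stored in a plain list (hypothetical example models)
fits <- list(
  qsec = lm(hp ~ qsec, data = mtcars),
  wt   = lm(hp ~ wt, data = mtcars)
)

# glance() summarises one model at a time, so map() applies it to each entry
summaries <- map(fits, broom::glance)

# Each entry is a one-row tibble of model-level statistics
summaries$qsec
```

The same pattern applies when the list is a list column inside a data frame.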
You may want to move the unnest call into test_fcn. A slightly more compact version is:
test_fcn <- function(var1, ...) {
  x <- unlist(list(...))
  mtcars %>%
    group_by(across(all_of(c('gear', x)))) %>%
    summarise(
      mod = map(list(lm(hp ~ !!sym(var1), data = .)), broom::glance),
      .groups = "drop") %>%
    unnest(mod)
}
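Update
I did not realize the grouping was being ignored until you commented: `data = .` hands lm the whole piped data frame rather than the current group, which is why every row above shows the same fit. Here is a nest/unnest type of solution.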
test_fcn <- function(var1, ...) {
  # unlist() so that all_of() receives a character vector, not a list
  x <- unlist(list(...))
  mtcars %>%
    group_by(across(all_of(c('gear', x)))) %>%
    nest() %>%
    ungroup() %>%
    mutate(mod = map(
      data,
      ~ lm(hp ~ !!sym(var1), data = .x) %>% broom::glance())) %>%
    unnest(mod)
}
test_fcn('qsec', 'cyl', 'carb')
## A tibble: 12 × 16
# cyl gear carb data r.squared adj.r.s…¹ sigma statis…² p.value df logLik
# <dbl> <dbl> <dbl> <list> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 6 4 4 <tibble> 0.911 0.867 2.74e+ 0 20.5 0.0454 1 -8.32
# 2 4 4 1 <tibble> 0.525 0.287 1.15e+ 1 2.21 0.276 1 -14.1
# 3 6 3 1 <tibble> 1 NaN NaN NaN NaN 1 Inf
# 4 8 3 2 <tibble> 0.0262 -0.461 1.74e+ 1 0.0538 0.838 1 -15.7
# 5 8 3 4 <tibble> 0.869 0.825 7.48e+ 0 19.9 0.0210 1 -15.9
# 6 4 4 2 <tibble> 0.0721 -0.392 3.18e+ 1 0.155 0.732 1 -18.1
# 7 8 3 3 <tibble> 0.538 0.0769 2.63e-14 1.17 0.475 1 91.2
# 8 4 3 1 <tibble> 0 0 NaN NA NA NA Inf
# 9 4 5 2 <tibble> 1 NaN NaN NaN NaN 1 Inf
#10 8 5 4 <tibble> 0 0 NaN NA NA NA Inf
#11 6 5 6 <tibble> 0 0 NaN NA NA NA Inf
#12 8 5 8 <tibble> 0 0 NaN NA NA NA Inf
## … with 5 more variables: AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>,
## nobs <int>, and abbreviated variable names ¹adj.r.squared, ²statistic
## ℹ Use `colnames()` to see all variable names
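Explanation: tidyr::nest nests the data in a list column (named data by default). We can then loop over the data entries, fit the model, and extract the model summaries with broom::glance into a new column mod; unnest(mod) then expands those summaries into ordinary columns. If you don't need it, you can drop the data column with select(-data).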
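To see what the nesting step produces on its own, here is a small sketch, assuming dplyr, tidyr and purrr are installed. It groups by gear and cyl only to keep the result short; `nested` and `counts` are hypothetical names.

```r
library(dplyr)
library(tidyr)

# One row per gear/cyl combination; `data` is a list column of tibbles
nested <- mtcars %>%
  group_by(gear, cyl) %>%
  nest() %>%
  ungroup()

# Count the observations available to each per-group model,
# then drop the list column once it is no longer needed
counts <- nested %>%
  mutate(n_obs = purrr::map_int(data, nrow)) %>%
  select(-data)

counts
```

Checking the per-group row counts this way is a quick check on whether each subset has enough data for a fit.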
PS. The example produces some warnings from the groups that contain only one observation (which leads to NA values in the model summaries).
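One possible way to avoid fitting models on such tiny groups is to filter them out before nesting. The sketch below is not part of the original answer: the function name `test_fcn_min_n` and the `min_n` parameter are assumptions introduced for illustration, and other warnings (for example from groups where hp is constant) may still appear.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# min_n is a hypothetical threshold: groups with fewer rows are dropped
test_fcn_min_n <- function(var1, ..., min_n = 3) {
  x <- unlist(list(...))
  mtcars %>%
    group_by(across(all_of(c("gear", x)))) %>%
    filter(n() >= min_n) %>%   # keep only groups with enough observations
    nest() %>%
    ungroup() %>%
    mutate(mod = map(
      data,
      ~ broom::glance(lm(hp ~ !!sym(var1), data = .x)))) %>%
    select(-data) %>%          # drop the nested data once models are summarised
    unnest(mod)
}

res <- test_fcn_min_n("qsec", "cyl", "carb")
res
```

Whether a cut-off of 3 is sensible depends on the analysis; it is only a placeholder here.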