r语言 - Using mutate(across(...)) with purrr::map



我很难弄清楚如何将purrr::map()mutate(across(...))一起使用。

我想做一个线性模型,并得出由单个列预测的多个列的斜率的估计值。

以下是我尝试的一个示例数据集:

mtcars %>%
mutate(across(-mpg), 
map(.x, lst(slope = ~lm(.x ~ mpg, data = .x) %>% 
tidy() %>% 
filter(term != "(Intercept") %>% 
pull(estimate)
)))

我正在寻找的输出将是每个非mpg列的新列,名称后附加_slope,即cyl_slope

在我的实际数据中,我也将按另一个变量进行分组,以备不时之需,因为我需要每个预测变量的每组斜率。我在一个标准的变异中工作,一次做一个变量,如下所示:

df %>% 
group_by(unitid) %>% 
nest() %>% 
mutate(tuition_and_fees_as_pct_total_rev_slope = map_dbl(data, ~lm(tuition_and_fees_as_pct_total_rev ~ year, data = .x) %>%
tidy() %>%
filter(term == "year") %>%
pull(estimate)
))

因此:

  1. 我认为我的问题是如何将预测的列名传递到lm
  2. 我不知道该解决方案是否需要嵌套,所以如果在mtcars示例中考虑嵌套,我将不胜感激

如果我们想在自变量为"mpg"的所有其他列上执行lm,一个选项是循环遍历"mtcars"(除了"mpg’(的列名,使用reformulate创建公式,应用lm,转换为tidy格式,filter输出"Intercept",select输出"estimate"列

library(dplyr)
library(tidyr)
library(broom)
map_dfc(setdiff(names(mtcars), 'mpg'), ~ 
lm(reformulate('mpg', response = .x), data = mtcars) %>%
tidy %>% 
filter(term != "(Intercept)") %>%
select(estimate))

-输出

# A tibble: 1 x 10
#   estimate...1 estimate...2 estimate...3 estimate...4 estimate...5 estimate...6 estimate...7 estimate...8 estimate...9 estimate...10
#      <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>         <dbl>
#1       -0.253        -17.4        -8.83       0.0604       -0.141        0.124       0.0555       0.0497       0.0588        -0.148

或者使用matrix作为依赖可以更容易地完成

library(stringr)
lm(as.matrix(mtcars[setdiff(names(mtcars), "mpg")]) ~ mpg, 
data = mtcars) %>% 
tidy %>% 
filter(term != "(Intercept)") %>%
select(response, estimate) %>%
mutate(response = str_c(response, '_slope'))

-输出

# A tibble: 10 x 2
#   response   estimate
#   <chr>         <dbl>
# 1 cyl_slope   -0.253 
# 2 disp_slope -17.4   
# 3 hp_slope    -8.83  
# 4 drat_slope   0.0604
# 5 wt_slope    -0.141 
# 6 qsec_slope   0.124 
# 7 vs_slope     0.0555
# 8 am_slope     0.0497
# 9 gear_slope   0.0588
#10 carb_slope  -0.148 

或者另一个选项是带有acrosssummarise

mtcars %>%
summarise(across(-mpg, ~ list(lm(reformulate('mpg', 
response = cur_column())) %>%
tidy %>%
filter(term != "(Intercept)") %>%
pull(estimate)), .names = "{.col}_slope")) %>%
unnest(everything())
# A tibble: 1 x 10
#  cyl_slope disp_slope hp_slope drat_slope wt_slope qsec_slope vs_slope am_slope gear_slope carb_slope
#      <dbl>      <dbl>    <dbl>      <dbl>    <dbl>      <dbl>    <dbl>    <dbl>      <dbl>      <dbl>
#1    -0.253      -17.4    -8.83     0.0604   -0.141      0.124   0.0555   0.0497     0.0588     -0.148

一个选项可以是:

map_dfr(.x = names(select(mtcars, -c(mpg, vs))),
~ mtcars %>%
group_by(vs) %>%
nest() %>%
mutate(variable = .x,
estimate = map_dbl(data, function(y) lm(!!sym(.x) ~ mpg, data = y) %>% 
tidy() %>%
filter(term != "(Intercept)") %>%
pull(estimate))) %>%
select(-data))
vs variable estimate
<dbl> <chr>       <dbl>
1     0 cyl       -0.242 
2     1 cyl       -0.116 
3     0 disp     -22.5   
4     1 disp      -8.01  
5     0 hp       -10.1   
6     1 hp        -3.26  
7     0 drat       0.0748
8     1 drat       0.0529
9     0 wt        -0.192 
10     1 wt        -0.113 
11     0 qsec      -0.0357
12     1 qsec      -0.0432
13     0 am         0.0742
14     1 am         0.0710
15     0 gear       0.114 
16     1 gear       0.0492
17     0 carb      -0.0883
18     1 carb      -0.0790

最新更新