R - Nesting by column prefix



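Ordinary nesting groups by rows. My case is different: I want to create a nested tibble grouped by column prefix (the part before the first '_'), keeping the original column names inside the nested tibbles. My current approach works, but it looks overly complicated.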

library(tidyverse)
tibble(a_1=1:3, a_2=2:4, b_1=3:5) %>% 
print() %>%
#  A tibble: 3 x 3
#     a_1   a_2   b_1
#   <int> <int> <int>
# 1     1     2     3
# 2     2     3     4
# 3     3     4     5
pivot_longer(everything()) %>% 
nest(data=-name) %>% 
mutate(data=map2(data, name, ~rename(.x, '{.y}' := value))) %>% 
mutate(gr=str_extract(name, '^[^_]+'), .keep='unused') %>% 
nest(data=-gr) %>% 
mutate(data=map(data, ~bind_cols(.[[1]]))) %>%
print() %>%
# A tibble: 2 x 2
#   gr    data            
#   <chr> <list>          
# 1  a     <tibble [3 x 2]>
# 2  b     <tibble [3 x 1]>
{ .$data[[1]] }
# A tibble: 3 x 2
#     a_1   a_2
#   <int> <int>
# 1     1     2
# 2     2     3
# 3     3     4

UPD: a tidyverse solution, if possible.

Using a little trick I learned recently, you can do this:

library(tidyr)
library(dplyr, warn.conflicts = FALSE)
tibble(a_1 = 1:3, a_2 = 2:4, b_1 = 3:5) %>%
split.default(., gsub("_[0-9]", "", names(.))) %>%
lapply(nest, data = everything()) %>%
bind_rows(.id = "gr")
#> # A tibble: 2 × 2
#>   gr    data            
#>   <chr> <list>          
#> 1 a     <tibble [3 × 2]>
#> 2 b     <tibble [3 × 1]>
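
The trick appears to rest on the fact that a data frame is a list of columns. Calling split.default() directly bypasses split.data.frame(), which would split rows, and groups the columns by the supplied vector instead. A minimal base-R sketch of just that step, where `df`, `prefix` and `by_prefix` are illustrative names not taken from the answer:

```r
# A data frame is a list of columns, so split.default() groups columns,
# whereas split() on a data frame dispatches to split.data.frame() and groups rows.
df <- data.frame(a_1 = 1:3, a_2 = 2:4, b_1 = 3:5)

# Prefix of each column name: everything before the first underscore.
prefix <- sub("_.*$", "", names(df))

# Named list with one data frame per prefix.
by_prefix <- split.default(df, prefix)

names(by_prefix)    # group names
names(by_prefix$a)  # columns that landed in group "a"
```

Each element of the list keeps its original column names, which is why nesting every element and binding the rows gives the desired result.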

Another possible solution, based on purrr::map_dfr:

library(tidyverse)
# df is the example tibble from the question
df <- tibble(a_1 = 1:3, a_2 = 2:4, b_1 = 3:5)
map_dfr(unique(str_remove(names(df), "_\\d+")), 
~ tibble(gr = .x, nest(select(df, which(str_detect(names(df), .x))), 
data = everything())))
#> # A tibble: 2 × 2
#>   gr    data            
#>   <chr> <list>          
#> 1 a     <tibble [3 × 2]>
#> 2 b     <tibble [3 × 1]>

My version, a slightly modified and tidied-up variant of stepan's answer:

tibble(a_1 = 1:3, a_2 = 2:4, b_1 = 3:5) %>%
split.default(str_extract(names(.), "^[^_]+")) %>%
map(nest, data = everything()) %>%
bind_rows(.id = "gr")

I could not find a replacement for split.default().
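
One possible way to avoid split.default() altogether, offered only as a sketch and not taken from any of the answers, is to group the column names rather than the columns, and then pull the matching columns out of the original tibble for each group. The names `df` and `result` are illustrative:

```r
library(dplyr)
library(stringr)

df <- tibble(a_1 = 1:3, a_2 = 2:4, b_1 = 3:5)

result <- tibble(name = names(df)) %>%
  # prefix = everything before the first underscore
  mutate(gr = str_extract(name, "^[^_]+")) %>%
  group_by(gr) %>%
  # within each group, `name` holds that group's column names;
  # df[name] selects those columns from the original tibble
  summarise(data = list(df[name]))

result
result$data[[1]]
```

This trades the base-R helper for a group_by() and summarise() step. Whether it reads better than the split.default() version is a matter of taste.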