R: Combining multiple records into multiple columns

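I have data where each participant has multiple records (the number of records varies between participants). I'm trying to combine these into a single record per participant by concatenating each column's values across that participant's records.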

So, if I have data like this:

library(tibble)
dummy <- tribble(
~id, ~A, ~B, ~C, ~D,
1, "one", "two", "three", "four",
1, "one", "two", "three", "five",
1, "one", "six", "three", "four",
1, "one", "seven", "three", "five",
2, "one", "two", "three", "four",
2, "one", "two", "six", "five",
3, "one", "two", "three", "four",
3, "one", "seven", "six", "five",
3, "one", "two", "six", "eight"
)

I'm looking for output like this:

1, "one+one+one+one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one+one", "two+two", "three+six", "four+five",
3, "one+one+one", "two+seven+two", "three+six+six", "four+five+eight",

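I'd prefer to use the tidyverse. I suspect group_by and unite come into play here, but I don't know how to handle the varying number of records per participant and apply this to all columns (there are 28 in the real data).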

Ideally, I'd also like to drop duplicate values, so that I get:

1, "one", "two+six+seven", "three", "four+five",
2, "one", "two", "three+six", "four+five",
3, "one", "two+seven", "three+six", "four+five+eight",

Any suggestions on how to do this?

Using str_c():

library(dplyr)
library(stringr)
dummy %>%
group_by(id) %>%
summarise(across(A:D,  ~str_c(unique(.), collapse = "+")))

-output

# A tibble: 3 x 5
     id A     B             C         D
  <dbl> <chr> <chr>         <chr>     <chr>
1     1 one   two+six+seven three     four+five
2     2 one   two           three+six four+five
3     3 one   two+seven     three+six four+five+eight
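One difference between str_c() here and paste() in the next answer appears only if the real data contains missing values (the sample data has none, so this is an assumption about the real 28 columns). As far as I know, str_c() propagates NA, so a single missing value turns the whole collapsed string into NA, while paste() inserts the literal text "NA". A small sketch on a made-up vector x, filtering out NA before collapsing:

```r
library(stringr)

# Hypothetical values of one column within one id, with a missing value added.
x <- c("two", NA, "six", "two")

str_c(unique(x), collapse = "+")             # NA: the missing value propagates
paste(unique(x), collapse = "+")             # "two+NA+six": NA becomes text

# Drop missing values first; the result is then "two+six".
str_c(unique(x[!is.na(x)]), collapse = "+")  # "two+six"
```

Inside the summarise() call above, the same idea would read ~ str_c(unique(.[!is.na(.)]), collapse = "+").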

group_by() and summarise() are all you need; unique() removes the duplicates.

dummy %>% 
group_by(id) %>% 
summarise(across(A:D, ~ paste(unique(.), collapse = "+")))
# # A tibble: 3 x 5
#      id A     B             C         D
#   <dbl> <chr> <chr>         <chr>     <chr>
# 1     1 one   two+six+seven three     four+five      
# 2     2 one   two           three+six four+five      
# 3     3 one   two+seven     three+six four+five+eight
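In both answers, unique() keeps values in the order they first appear within each id, which is why id 3 gets two+seven rather than seven+two. If an order-independent result were preferred (the question does not ask for this, so treat it as optional), sort() could be applied before collapsing. A sketch using id 3's column B values from dummy, stored in a hypothetical variable b_id3:

```r
# Column B for id 3 in dummy, copied into a standalone vector.
b_id3 <- c("two", "seven", "two")

# unique() keeps the first occurrence of each value, in original order.
paste(unique(b_id3), collapse = "+")        # "two+seven"

# Sorting first gives an alphabetical, order-independent result.
paste(sort(unique(b_id3)), collapse = "+")  # "seven+two"
```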

For the first output (keeping duplicates), you can also do:

library(tidyverse)
dummy<-tribble(
~id, ~A, ~B, ~C, ~D,
1, "one", "two", "three", "four",
1, "one", "two", "three", "five",
1, "one", "six", "three", "four",
1, "one", "seven", "three", "five",
2, "one", "two", "three", "four",
2, "one", "two", "six", "five",
3, "one", "two", "three", "four",
3, "one", "seven", "six", "five",
3, "one", "two", "six", "eight"
)
dummy %>% group_by(id) %>%
summarise(across(everything(), ~paste(., collapse = '+')))
#> # A tibble: 3 x 5
#>      id A               B                C                     D                
#>   <dbl> <chr>           <chr>            <chr>                 <chr>            
#> 1     1 one+one+one+one two+two+six+sev~ three+three+three+th~ four+five+four+f~
#> 2     2 one+one         two+two          three+six             four+five        
#> 3     3 one+one+one     two+seven+two    three+six+six         four+five+eight

Created on 2021-06-28 by the reprex package (v2.0.0)
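Regarding the 28 columns and the varying number of records: as in the last answer, across(everything(), ...) inside a grouped summarise() skips the grouping column id, so one call should cover every column without naming them, and group_by() handles any number of rows per id. A minimal sketch, assuming dplyr 1.0 or later (for across()), on a hypothetical tibble wide whose columns Q1 to Q3 are made up, combined with unique() to also drop duplicates:

```r
library(dplyr)
library(tibble)

# Hypothetical data: 'wide' and columns Q1 to Q3 are invented for illustration.
# id 1 has two records and id 2 has one.
wide <- tibble(
  id = c(1, 1, 2),
  Q1 = c("a", "b", "a"),
  Q2 = c("x", "x", "y"),
  Q3 = c("p", "q", "r")
)

collapsed <- wide %>%
  group_by(id) %>%
  # everything() does not select the grouping column id inside summarise().
  summarise(across(everything(), ~ paste(unique(.x), collapse = "+")))

collapsed
```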
