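I have data in which each participant has multiple records (the number of records varies between participants). I'm trying to collapse these into a single record per participant by combining, for each participant, the values in each column.
So, if I have data like this: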
library(tibble)
dummy <- tribble(
  ~id, ~A, ~B, ~C, ~D,
  1, "one", "two", "three", "four",
  1, "one", "two", "three", "five",
  1, "one", "six", "three", "four",
  1, "one", "seven", "three", "five",
  2, "one", "two", "three", "four",
  2, "one", "two", "six", "five",
  3, "one", "two", "three", "four",
  3, "one", "seven", "six", "five",
  3, "one", "two", "six", "eight"
)
I'm looking for output like this:
1, "one+one+one+one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one+one", "two+two", "three+six", "four+five",
3, "one+one+one", "two+seven+two", "three+six+six", "four+five+eight",
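I'd prefer to use the tidyverse. I suspect group_by and unite come into play here, but I don't know how to loop over the varying number of records per participant and apply this to all columns (there are 28 of them in the real data).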
Ideally, I'd also like to drop duplicates, so that I'd get:
1, "one", "two+two+six+seven", "three+three+three+three", "four+five+four+five",
2, "one", "two", "three+six", "four+five",
3, "one", "two+seven+two", "three+six+six", "four+five+eight",
Any suggestions on how to do this?
Using str_c:
library(dplyr)
library(stringr)
dummy %>%
  group_by(id) %>%
  summarise(across(A:D, ~str_c(unique(.), collapse = "+")))
-output
# A tibble: 3 x 5
id A B C D
<dbl> <chr> <chr> <chr> <chr>
1 1 one two+six+seven three four+five
2 2 one two three+six four+five
3 3 one two+seven three+six four+five+eight
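One behaviour worth knowing if the real data has missing values (an assumption; the example data has none): str_c() propagates NA, so a single NA in a participant's column makes the whole collapsed value NA, whereas paste() would insert the literal text "NA". A minimal sketch that drops NAs before collapsing; dummy_na is a hypothetical two-column variant of the example data.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical variant of the example data with one missing value in B
dummy_na <- tribble(
  ~id, ~A,    ~B,
  1,   "one", "two",
  1,   "one", NA_character_,
  2,   "one", "six"
)

# str_c() is NA-contagious: id 1 gets NA in column B
res_na <- dummy_na %>%
  group_by(id) %>%
  summarise(across(A:B, ~ str_c(unique(.x), collapse = "+")))
res_na

# Removing NAs inside the lambda keeps the non-missing values
res <- dummy_na %>%
  group_by(id) %>%
  summarise(across(A:B, ~ str_c(unique(.x[!is.na(.x)]), collapse = "+")))
res
```

Whether NAs should be dropped or shown depends on the data; this is just one option.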
group_by() and summarise() are all you need; unique() removes the duplicates.
library(dplyr)
dummy %>%
  group_by(id) %>%
  summarise(across(A:D, ~ paste(unique(.), collapse = "+")))
# # A tibble: 3 x 5
# id A B C D
# <dbl> <chr> <chr> <chr> <chr>
# 1 1 one two+six+seven three four+five
# 2 2 one two three+six four+five
# 3 3 one two+seven three+six four+five+eight
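Since the real data has 28 columns, A:D doesn't need to be spelled out. Inside summarise() on grouped data, across(everything(), ...) skips the grouping column, so it covers every other column. A sketch, assuming every non-id column should be combined; paste() converts non-character columns to character, so mixed column types are also handled.

```r
library(dplyr)
library(tibble)

dummy <- tribble(
  ~id, ~A, ~B, ~C, ~D,
  1, "one", "two", "three", "four",
  1, "one", "two", "three", "five",
  1, "one", "six", "three", "four",
  1, "one", "seven", "three", "five",
  2, "one", "two", "three", "four",
  2, "one", "two", "six", "five",
  3, "one", "two", "three", "four",
  3, "one", "seven", "six", "five",
  3, "one", "two", "six", "eight"
)

# everything() does not pick up the grouping variable id here
res_all <- dummy %>%
  group_by(id) %>%
  summarise(across(everything(), ~ paste(unique(.x), collapse = "+")))
res_all
```

The result matches the A:D version above.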
For the first output (keeping duplicates), you can also do:
library(tidyverse)
dummy <- tribble(
  ~id, ~A, ~B, ~C, ~D,
  1, "one", "two", "three", "four",
  1, "one", "two", "three", "five",
  1, "one", "six", "three", "four",
  1, "one", "seven", "three", "five",
  2, "one", "two", "three", "four",
  2, "one", "two", "six", "five",
  3, "one", "two", "three", "four",
  3, "one", "seven", "six", "five",
  3, "one", "two", "six", "eight"
)
dummy %>% group_by(id) %>%
summarise(across(everything(), ~paste(., collapse = '+')))
#> # A tibble: 3 x 5
#> id A B C D
#> <dbl> <chr> <chr> <chr> <chr>
#> 1 1 one+one+one+one two+two+six+sev~ three+three+three+th~ four+five+four+f~
#> 2 2 one+one two+two three+six four+five
#> 3 3 one+one+one two+seven+two three+six+six four+five+eight
Created on 2021-06-28 by the reprex package (v2.0.0)
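If both versions are wanted in one pass, across() also accepts a named list of functions and, by default, names the new columns "{.col}_{.fn}". A sketch; the suffixes all and uniq are illustrative names, not anything required by dplyr.

```r
library(dplyr)
library(tibble)

dummy <- tribble(
  ~id, ~A, ~B, ~C, ~D,
  1, "one", "two", "three", "four",
  1, "one", "two", "three", "five",
  1, "one", "six", "three", "four",
  1, "one", "seven", "three", "five",
  2, "one", "two", "three", "four",
  2, "one", "two", "six", "five",
  3, "one", "two", "three", "four",
  3, "one", "seven", "six", "five",
  3, "one", "two", "six", "eight"
)

both <- dummy %>%
  group_by(id) %>%
  summarise(across(
    everything(),
    list(
      all  = ~ paste(.x, collapse = "+"),         # every record, duplicates kept
      uniq = ~ paste(unique(.x), collapse = "+")  # duplicates dropped
    )
  ))
names(both)
```

This doubles the column count (56 combined columns for 28 inputs), so it is mainly useful when you want to compare the two versions side by side.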