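I have a table with a thousand rows and two columns, where the first column contains a category label and the second column contains a string.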
library(tibble)

tab <- tibble(category = c("CAT1", "CAT1", "CAT1", "CAT1", "CAT1", "CAT2", "CAT2", "CAT2", "CAT2"),
              word = c("Lorem", "ipsum", "dolor", "sit", "amet", "Consectetur", "adipiscing", "elit", "nam"))
tab
# A tibble: 9 x 2
  category word
  <chr>    <chr>
1 CAT1     Lorem
2 CAT1     ipsum
3 CAT1     dolor
4 CAT1     sit
5 CAT1     amet
6 CAT2     Consectetur
7 CAT2     adipiscing
8 CAT2     elit
9 CAT2     nam
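What I want to do now is collapse the rows so that there is only one row per category, with all the words for that category placed together in a single cell, separated by semicolons. Like this: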
# A tibble: 2 x 2
  category word
  <chr>    <chr>
1 CAT1     Lorem; ipsum; dolor; sit; amet
2 CAT2     Consectetur; adipiscing; elit; nam
Does anyone know how I can solve this, and would you be willing to come to my rescue?
We can group by category and collapse each group's words with str_c():
library(dplyr)
library(stringr)

tab %>%
  group_by(category) %>%
  summarise(word = str_c(word, collapse = "; "))
which gives the output
# A tibble: 2 x 2
  category word
  <chr>    <chr>
1 CAT1     Lorem; ipsum; dolor; sit; amet
2 CAT2     Consectetur; adipiscing; elit; nam
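To see what the collapse argument is doing on its own, here is a minimal sketch on a plain character vector. The vector name words and the "!" suffix are made up for illustration; paste() is shown as the base R counterpart in case stringr is not available.

```r
library(stringr)

# Hypothetical three-word vector, standing in for one category's words
words <- c("Lorem", "ipsum", "dolor")

# Without collapse, str_c() works element-wise and returns one string per input
suffixed <- str_c(words, "!")

# With collapse, the whole vector is joined into a single string
joined <- str_c(words, collapse = "; ")
print(joined)

# Base R counterpart that needs no extra package
joined_base <- paste(words, collapse = "; ")
print(joined_base)
```

Inside summarise(), the same call runs once per group, which is why each category ends up with a single row.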
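We can also use pivot_wider and then unite the required columns (this example separates the words with ", "; pass sep = "; " to match the output requested in the question):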
library(dplyr)
library(tidyr)

tab %>%
  pivot_wider(names_from = word, values_from = word) %>%
  unite(col = "word", -category, sep = ", ", na.rm = TRUE)
# A tibble: 2 x 2
  category word
  <chr>    <chr>
1 CAT1     Lorem, ipsum, dolor, sit, amet
2 CAT2     Consectetur, adipiscing, elit, nam
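It may help to look at the intermediate wide table to see why na.rm = TRUE is needed. The following is a rough sketch on a smaller, made-up table (tab_small, wide and result are hypothetical names), using "; " as the separator to match the question.

```r
library(tibble)
library(tidyr)

# Hypothetical three-row table with the same two columns as tab
tab_small <- tibble(category = c("CAT1", "CAT1", "CAT2"),
                    word = c("Lorem", "ipsum", "elit"))

# Step 1: every distinct word becomes its own column;
# cells belonging to the other category are filled with NA
wide <- pivot_wider(tab_small, names_from = word, values_from = word)
print(wide)

# Step 2: paste all columns except category into one column,
# dropping the NA cells so they do not appear in the result
result <- unite(wide, col = "word", -category, sep = "; ", na.rm = TRUE)
print(result)
```

This approach assumes each word occurs at most once per category. With a thousand rows it also creates one column per distinct word, so the summarise() version is likely the simpler choice.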