In R, combine a column's values into one comma-separated row per group and sum another column



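I want to transform the data so that, for each value of col1, the Col2 values are collapsed into a single row and the Col5 values are summed.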

col1    Col2                    Col3 Col4       Col5  
1 344230. masalas & spices        4    14           2
2 344231. hair care               4    14           1
3 344231. otc                     4    14           1
4 344231. personal hygiene        4    14           1
5 344232. detergents              4    14           2
6 344233. biscuits                4    14           2
7 344233. chocolates & sweets     4    14           1
8 344233. dry fruits              4    14           2    

The output should look something like this:

col1   Col2                                    Col5
344230 masalas & spices                        2
344231 hair care,otc,personal hygiene          1+1+1=3
344232 detergents                              2
344233 biscuits,chocolates & sweets,dry fruits 2+1+2=5
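library(dplyr)

df %>%
  group_by(col1) %>%                              # one group per col1 value
  mutate(Col5 = sum(Col5),                        # total of Col5 within the group
         Col2 = paste(Col2, collapse = ",")) %>%  # join the group's Col2 values with commas
  slice(1)                                        # keep only the first row of each group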
col1                                 Col2  Col3  Col4  Col5
<dbl>                                <chr> <int> <int> <int>
1 344230                       masalas&spices     4    14     2
2 344231         haircare,otc,personalhygiene     4    14     3
3 344232                           detergents     4    14     2
4 344233 biscuits,chocolates&sweets,dryfruits     4    14     5

If you don't need Col3 and Col4, you can replace mutate with summarise and skip slice(1).
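A minimal sketch of that summarise variant, assuming the same sample data as in this answer. The name `result` is introduced here only for illustration and is not part of the original answer.

```r
library(dplyr)

# Same sample data as in this answer. The header has 5 names but each row has
# 6 fields, so read.table uses the first field as row names.
df <- read.table(text = "
col1    Col2                    Col3 Col4       Col5
1 344230. masalas&spices        4    14           2
2 344231. haircare              4    14           1
3 344231. otc                   4    14           1
4 344231. personalhygiene       4    14           1
5 344232. detergents            4    14           2
6 344233. biscuits              4    14           2
7 344233. chocolates&sweets     4    14           1
8 344233. dryfruits             4    14           2", header = TRUE)

# summarise() returns one row per group by itself, so slice(1) is not needed.
# Only the grouping column and the columns computed here are kept.
result <- df %>%
  group_by(col1) %>%
  summarise(Col2 = paste(Col2, collapse = ","),  # comma-separated Col2 values
            Col5 = sum(Col5))                    # group total of Col5

print(result)
```

Because summarise keeps only col1, Col2 and Col5, Col3 and Col4 are dropped from the result; if they should be kept, the mutate plus slice(1) version is the one to use.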

Data:

df <-  read.table(text = "
col1    Col2                    Col3 Col4       Col5  
1 344230. masalas&spices        4    14           2
2 344231. haircare               4    14           1
3 344231. otc                     4    14           1
4 344231. personalhygiene        4    14           1
5 344232. detergents              4    14           2
6 344233. biscuits                4    14           2
7 344233. chocolates&sweets     4    14           1
8 344233. dryfruits              4    14           2    ", header = TRUE)
