Aggregating the common character values for each group in R



I have a data frame as follows:

> dput(data)
structure(list(Comments = c("This is good", "What is the price", "You are no good", "help the needy", "What a beautiful day", "how can I help you", "You are my best friend", "she is my friend", "which one is the best", "How can she do that"
), ID = c("A1", "B2", "A1", "C3", "D4", "C3", "E5", "E5", "E5", 
"E5")), class = "data.frame", row.names = c(NA, 10L))

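For each unique ID, I want to get all the words that are common to the comments within that group.

Following a suggestion, I tried the code below: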

check <-  aggregate(Comments ~ ID, demo, function(x){
temp = table(unlist(lapply(strsplit(x, ","), unique)))
temp = names(temp)[which(temp == max(temp) & temp > 1)]
if (length(temp) == 0) temp = ""
temp
})

This returns the unique IDs, but the common-words column is empty. I also tried:

demo %>% 
mutate(Words = strsplit(Comments, " ")) %>% 
unnest %>% 
intersect(Comments) %>% 
group_by(ID, Comments) %>% 
summarise(Words = toString(Comments))

This gives me an error.

My expected output is:

ID  Comments
A1  "good"
B2  ""
C3  "help"
D4  ""
E5  "best, friend, she, is, my"

Thanks in advance!!

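Using dplyr, we can create a column with row_number() so that the common words within each ID can be identified. We use tidyr::separate_rows to split the words into separate rows, filter the words that appear in more than one row of Comments within the same ID, group_by ID, and create a comma-separated string.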

library(dplyr)
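data %>%
  # tag each comment with its row number so a word is counted once per comment
  mutate(row = row_number(),
         ID = factor(ID)) %>%
  # one word per row; the backslash must be doubled inside an R string
  tidyr::separate_rows(Comments, sep = "\\s+") %>%
  group_by(ID, Comments) %>%
  # keep words that occur in more than one comment of the same ID
  filter(n_distinct(row) > 1) %>%
  # .drop = FALSE keeps IDs that have no common word as empty groups
  group_by(ID, .drop = FALSE) %>%
  summarise(Comments = toString(unique(Comments)))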

#  ID    Comments                 
#  <fct> <chr>                    
#1 A1    good                     
#2 B2    ""                       
#3 C3    help                     
#4 D4    ""                       
#5 E5    my, best, friend, she, is
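
The same idea can also be sketched in base R, close to the aggregate() attempt in the question. That attempt split on "," although the words are separated by spaces, which is a likely reason it returned empty strings. The helper name common_words below is made up for illustration, and the words come back in alphabetical order because table() sorts them.

```r
# Example data from the question
data <- data.frame(
  Comments = c("This is good", "What is the price", "You are no good",
               "help the needy", "What a beautiful day", "how can I help you",
               "You are my best friend", "she is my friend",
               "which one is the best", "How can she do that"),
  ID = c("A1", "B2", "A1", "C3", "D4", "C3", "E5", "E5", "E5", "E5"),
  stringsAsFactors = FALSE
)

# common_words() is a hypothetical helper: words found in more than one comment
common_words <- function(x) {
  # split on whitespace (not ","), keeping each word once per comment
  words <- unlist(lapply(strsplit(x, "\\s+"), unique))
  counts <- table(words)
  # toString() of an empty vector gives "", which covers single-comment IDs
  toString(names(counts)[counts > 1])
}

result <- aggregate(Comments ~ ID, data, common_words)
result
```

Like the dplyr version, the comparison is case-sensitive, so "How" and "how" are treated as different words.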

Alternatively, with the tidyverse, we can do:

library(tidyverse)
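data %>%
  # split Comments into one word per row
  separate_rows(Comments) %>%
  # number of occurrences of each word within each ID
  count(Comments, ID) %>%
  # keep the words with the highest count (2 for this data)
  filter(n == max(n)) %>%
  select(-n) %>%
  # add back the IDs that have no common word, with an empty string
  complete(ID = unique(data$ID), fill = list(Comments = "")) %>%
  group_by(ID) %>%
  summarise(Comments = toString(Comments))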
# A tibble: 5 x 2
#  ID    Comments                 
#  <chr> <chr>                    
#1 A1    good                     
#2 B2    ""                       
#3 C3    help                     
#4 D4    ""                       
#5 E5    best, friend, is, my, she
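
One caveat that may be worth checking on other data: count() tallies every occurrence of a word, including repeats inside a single comment, and filter(n == max(n)) compares against the largest count in the whole table rather than per ID. Below is a sketch of a variant that counts each word once per comment and keeps the words seen in more than one comment of the same ID. The data frame toy is invented for illustration and is not from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: "good" is repeated inside a single comment of X1
toy <- data.frame(
  Comments = c("good good day", "nice day", "so so", "fine"),
  ID = c("X1", "X1", "Y2", "Y2"),
  stringsAsFactors = FALSE
)

res <- toy %>%
  mutate(row = row_number()) %>%
  separate_rows(Comments, sep = "\\s+") %>%
  # count each word once per comment
  distinct(ID, row, Comments) %>%
  count(ID, Comments) %>%
  # keep words seen in more than one comment of this ID
  filter(n > 1) %>%
  group_by(ID) %>%
  summarise(Comments = toString(Comments)) %>%
  # add back the IDs that have no common word
  complete(ID = unique(toy$ID), fill = list(Comments = ""))
res
```

Here only "day" is reported for X1, because "good" appears twice but in a single comment, and Y2 gets an empty string.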
