r - Concatenate text across rows based on a condition



I have the following data.

a <- structure(list(Title = c("AAADE", "BBBCF", "NBNJHB", "TTTTT", "VVVFF", 
"AASFE", "DDDFFF", "ERFRR", "AAAAAA", "ERERE"), 
Year = c("2004", "2004", "2004", "2004", "2004", "2004", "2005", "2005", "2005", "2005")),
.Names = c("Title", "Year"), row.names = c(NA, -10L), class = "data.frame")
a
    Title Year
1   AAADE 2004
2   BBBCF 2004
3  NBNJHB 2004
4   TTTTT 2004
5   VVVFF 2004
6   AASFE 2004
7  DDDFFF 2005
8   ERFRR 2005
9  AAAAAA 2005
10  ERERE 2005

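I want to concatenate the rows that share the same year. I have been trying functions from the "tm" package, but none of them gets me the following result.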

Title                                     Year      
AAADE BBBCF NBNJHB TTTTT VVVFF AASFE      2004
DDDFFF ERFRR AAAAAA ERERE                 2005

A more direct approach is to use aggregate:

aggregate(Title ~ Year, a, paste, collapse = " ")
#   Year                                Title
# 1 2004 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE
# 2 2005            DDDFFF ERFRR AAAAAA ERERE

If the column order matters to you, you can run aggregate(Title ~ Year, a, paste, collapse = " ")[names(a)]
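
As a quick check of the column-order point, here is a minimal sketch that rebuilds the sample data and compares the two column orders. The object names res and res_reordered are only illustrative and do not appear in the original answer.

```r
# Rebuild the sample data from the question
a <- data.frame(
  Title = c("AAADE", "BBBCF", "NBNJHB", "TTTTT", "VVVFF",
            "AASFE", "DDDFFF", "ERFRR", "AAAAAA", "ERERE"),
  Year = c("2004", "2004", "2004", "2004", "2004", "2004",
           "2005", "2005", "2005", "2005"),
  stringsAsFactors = FALSE
)

# aggregate() with a formula puts the grouping column (Year) first
res <- aggregate(Title ~ Year, a, paste, collapse = " ")
names(res)

# Indexing by names(a) restores the column order of the input
res_reordered <- res[names(a)]
names(res_reordered)
```

Only the column order differs between the two results. The row contents are the same.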

Beyond aggregate, you can look at "data.table" or "dplyr", both of which are more efficient on larger datasets.

Here is "dplyr":

library(dplyr)
a %>% group_by(Year) %>% summarise(Title = paste(Title, collapse = " "))
# Source: local data frame [2 x 2]
# 
#   Year                                Title
# 1 2004 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE
# 2 2005            DDDFFF ERFRR AAAAAA ERERE

Here is "data.table":

library(data.table)
A <- as.data.table(a)
A[, list(Title = paste(Title, collapse = " ")), by = Year]
#    Year                                Title
# 1: 2004 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE
# 2: 2005            DDDFFF ERFRR AAAAAA ERERE

Another base R option is tapply:

with(a, data.frame(Title = tapply(Title, Year, paste, collapse = ' '), Year = unique(Year)))

Result:

                                Title Year
 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE 2004
            DDDFFF ERFRR AAAAAA ERERE 2005
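
One caveat, offered as a sketch and not as a correction of the answer: tapply orders its result by the sorted group labels, while unique(Year) keeps the order of first appearance. The two agree for the sample data because the years are already sorted, but on unsorted input the titles could be paired with the wrong year. Taking the labels from the names of the tapply result avoids that. The shuffled data frame b and the names titles and out are hypothetical, introduced only for illustration.

```r
# Hypothetical unsorted data: 2005 appears before 2004
b <- data.frame(
  Title = c("DDDFFF", "AAADE", "ERFRR", "BBBCF"),
  Year = c("2005", "2004", "2005", "2004"),
  stringsAsFactors = FALSE
)

# tapply() returns one element per group, named by the group label
titles <- tapply(b$Title, b$Year, paste, collapse = " ")

# Take Year from those names so each label stays with its own titles
out <- data.frame(
  Title = as.vector(titles),
  Year = names(titles),
  stringsAsFactors = FALSE
)
out
```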