I have the following data.
a <- structure(list(Title = c("AAADE", "BBBCF", "NBNJHB", "TTTTT", "VVVFF",
"AASFE", "DDDFFF", "ERFRR", "AAAAAA", "ERERE"),
Year = c("2004", "2004", "2004", "2004", "2004", "2004", "2005", "2005", "2005", "2005")),
.Names = c("Title", "Year"), row.names = c(NA, -10L), class = "data.frame")
a
Title Year
1 AAADE 2004
2 BBBCF 2004
3 NBNJHB 2004
4 TTTTT 2004
5 VVVFF 2004
6 AASFE 2004
7 DDDFFF 2005
8 ERFRR 2005
9 AAAAAA 2005
10 ERERE 2005
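I want to concatenate the rows that share the same year. I have been trying functions from the "tm" package, but they do not help me get the following result.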
Title Year
AAADE BBBCF NBNJHB TTTTT VVVFF AASFE 2004
DDDFFF ERFRR AAAAAA ERERE 2005
A more direct approach is to use aggregate:
aggregate(Title ~ Year, a, paste, collapse = " ")
# Year Title
# 1 2004 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE
# 2 2005 DDDFFF ERFRR AAAAAA ERERE
If the order of the columns matters to you, you can run aggregate(Title ~ Year, a, paste, collapse = " ")[names(a)].
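As a quick check of the column reordering step, here is a minimal sketch that rebuilds the example data and compares the column names before and after indexing with names(a). The variable name res is only illustrative and not part of the original answer.

```r
# Rebuild the example data from the question.
a <- data.frame(
  Title = c("AAADE", "BBBCF", "NBNJHB", "TTTTT", "VVVFF",
            "AASFE", "DDDFFF", "ERFRR", "AAAAAA", "ERERE"),
  Year = c("2004", "2004", "2004", "2004", "2004", "2004",
           "2005", "2005", "2005", "2005"),
  stringsAsFactors = FALSE
)

# aggregate() puts the grouping column (Year) first.
res <- aggregate(Title ~ Year, a, paste, collapse = " ")
names(res)

# Indexing by names(a) restores the original Title, Year order.
res <- res[names(a)]
names(res)
```

Because the indexing is by name, the same idiom should keep working if more grouping columns are added to the formula, as long as every column of a appears in the result.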
Beyond aggregate, you could look at "data.table" or "dplyr"; both are more efficient on larger datasets.
Here is "dplyr":
library(dplyr)
a %>% group_by(Year) %>% summarise(Title = paste(Title, collapse = " "))
# Source: local data frame [2 x 2]
#
# Year Title
# 1 2004 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE
# 2 2005 DDDFFF ERFRR AAAAAA ERERE
Here is "data.table":
library(data.table)
A <- as.data.table(a)
A[, list(Title = paste(Title, collapse = " ")), by = Year]
# Year Title
# 1: 2004 AAADE BBBCF NBNJHB TTTTT VVVFF AASFE
# 2: 2005 DDDFFF ERFRR AAAAAA ERERE
Another base R option uses tapply:
with(a, data.frame(Title = tapply(Title, Year, paste, collapse = ' '), Year = unique(Year)))
Result:
Title Year
AAADE BBBCF NBNJHB TTTTT VVVFF AASFE 2004
DDDFFF ERFRR AAAAAA ERERE 2005
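One caveat with the tapply approach: tapply orders its groups by sorted factor level, while unique(Year) returns years in order of first appearance. The two agree here only because the data is already sorted by Year. A minimal sketch that avoids the mismatch takes the year labels from the names of the tapply result instead. The shuffled data frame b and the names collapsed and res are made up for illustration.

```r
# Hypothetical data, deliberately shuffled so that 2005 appears first.
b <- data.frame(
  Title = c("DDDFFF", "AAADE", "ERFRR", "BBBCF"),
  Year = c("2005", "2004", "2005", "2004"),
  stringsAsFactors = FALSE
)

# tapply returns a named 1-d array, one element per year, sorted by year.
collapsed <- with(b, tapply(Title, Year, paste, collapse = " "))

# Take the Year labels from the result itself so they always line up
# with the collapsed titles, whatever the input row order.
res <- data.frame(
  Title = as.vector(collapsed),
  Year = names(collapsed),
  stringsAsFactors = FALSE
)
res
```

With this input, pairing the tapply result with unique(Year) would attach "2005" to the 2004 titles, which is why the labels are read from names(collapsed) here.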