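I am a fairly inexperienced R user and am facing the following problem:
I want to merge two data tables, dt1 and dt2. dt1 contains a variable named Assessment. dt2 contains two variables named ID and Frequency. I would now like to have the Assessment observations in dt2 as well.
For simplicity, consider the following example: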
library(dplyr)
library(data.table)
# dt1: one assessment per row
dt1 <- data.table(Assessment = c("perfect", "perfect", "okay", "unsufficient", "good", "good", "okay", "perfect"))
# dt2: each ID and the number of dt1 rows that belong to it
dt2 <- data.table(ID = c(1, 2, 3, 4, 5, 6), Frequency = c(1, 3, 1, 1, 1, 1))
So dt1 looks like:
Assessment |
---|
perfect |
perfect |
okay |
unsufficient |
good |
good |
okay |
perfect |
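library(tidyr)  # provides uncount()
# Repeat each row of dt2 Frequency times, attach the result to dt1 by position,
# then collapse the assessments of each ID into one string
dt1 %>%
  bind_cols(
    dt2 %>%
      uncount(Frequency)
  ) %>%
  group_by(ID) %>%
  summarise(Assessment = paste0(Assessment, collapse = ";"))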
# A tibble: 6 x 2
     ID Assessment
  <dbl> <chr>
1     1 perfect
2     2 perfect;okay;unsufficient
3     3 good
4     4 good
5     5 okay
6     6 perfect
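The result above no longer contains Frequency, because uncount() drops the weights column by default. A minimal sketch of a variant that keeps it, assuming the same example data and dplyr 1.0 or later for the `.groups` argument (the name `res` is only for illustration):

```r
library(dplyr)
library(tidyr)
library(data.table)

dt1 <- data.table(Assessment = c("perfect", "perfect", "okay", "unsufficient",
                                 "good", "good", "okay", "perfect"))
dt2 <- data.table(ID = c(1, 2, 3, 4, 5, 6), Frequency = c(1, 3, 1, 1, 1, 1))

res <- dt1 %>%
  # .remove = FALSE keeps the Frequency column in the expanded copy of dt2
  bind_cols(uncount(dt2, Frequency, .remove = FALSE)) %>%
  # Frequency is constant within each ID, so grouping by both keeps it in the result
  group_by(ID, Frequency) %>%
  summarise(Assessment = paste0(Assessment, collapse = ";"), .groups = "drop")

res
```

Like the original pipeline, this matches rows purely by position, so it relies on dt1 being ordered the same way as the expanded dt2.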
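If you trust that the rows are in the correct order, as you said in the OP, you can rep.int the IDs according to their Frequency and group by the result.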
dt2[dt1[, list(Assessment=toString(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))], on=.(ID)]
#    ID Frequency                  Assessment
# 1:  1         1                     perfect
# 2:  2         3 perfect, okay, unsufficient
# 3:  3         1                        good
# 4:  4         1                        good
# 5:  5         1                        okay
# 6:  6         1                     perfect
Or
dt2[dt1[, list(Assessment=list(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))], on=.(ID)]
#    ID Frequency                Assessment
# 1:  1         1                   perfect
# 2:  2         3 perfect,okay,unsufficient
# 3:  3         1                      good
# 4:  4         1                      good
# 5:  5         1                      okay
# 6:  6         1                   perfect
The difference is that in the second version, Assessment is a list column.
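To illustrate what the list column offers over the collapsed string, here is a small sketch using the same example data. The names `res` and `n` are assumptions introduced for this illustration only.

```r
library(data.table)

dt1 <- data.table(Assessment = c("perfect", "perfect", "okay", "unsufficient",
                                 "good", "good", "okay", "perfect"))
dt2 <- data.table(ID = c(1, 2, 3, 4, 5, 6), Frequency = c(1, 3, 1, 1, 1, 1))

# Same join as the list-column version, stored for further use
res <- dt2[dt1[, list(Assessment = list(Assessment)),
               by = list(ID = with(dt2, rep.int(ID, Frequency)))], on = .(ID)]

# Each cell holds a character vector, so the number of assessments per ID
# can be computed without parsing a string
res[, n := lengths(Assessment)]

# The individual values for one ID can be pulled out as a plain vector
res[ID == 2, Assessment[[1]]]
```

With the toString() version, the same operations would require splitting the string again, which is why the list column may be preferable if the values are processed further.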
Note: if dt2 contains no other columns, there is no need to merge at all, and this simplifies to
dt1[, list(Assessment=toString(Assessment)), by=list(ID=with(dt2, rep.int(ID, Frequency)))]
#    ID                  Assessment
# 1:  1                     perfect
# 2:  2 perfect, okay, unsufficient
# 3:  3                        good
# 4:  4                        good
# 5:  5                        okay
# 6:  6                     perfect
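All of the data.table variants above depend on the grouping vector built by rep.int(). A short sketch of inspecting it and guarding against a length mismatch, assuming the same example data (the name `grp` is hypothetical):

```r
library(data.table)

dt1 <- data.table(Assessment = c("perfect", "perfect", "okay", "unsufficient",
                                 "good", "good", "okay", "perfect"))
dt2 <- data.table(ID = c(1, 2, 3, 4, 5, 6), Frequency = c(1, 3, 1, 1, 1, 1))

# rep.int() repeats each ID as many times as its Frequency says
grp <- with(dt2, rep.int(ID, Frequency))
grp
# 1 2 2 2 3 4 5 6

# The approach is purely positional, so dt1 needs exactly one row per repeated ID
stopifnot(length(grp) == nrow(dt1))
```

This check only confirms that the lengths agree. Whether the rows of dt1 really are in the order implied by dt2 still has to be trusted, as noted above.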