I have a data frame that looks like this:
message.id,sender,recipients
1,A,B|C
2,A,B
3,B,C|D|Q
I want to split the recipients column on "|" and then gather the results to produce the following:
message.id,sender,recipient
1,A,B
1,A,C
2,A,B
3,B,C
3,B,D
3,B,Q
What is a cleaner way to do this? Here is my current code:
library(dplyr)
library(stringr)
library(tidyr)
df <- data.frame(message.id = c(1,2,3),
                 sender = c("A","A","B"),
                 recipients = c("B|C","B","C|D|Q"))
# largest number of recipients in any row; "|" must be escaped as "\\|" in a regex
max.splits = df$recipients %>% str_count("\\|") %>% max + 1
# `into` has to be a character vector of column names
df %>% separate(recipients, as.character(1:max.splits), sep = "\\|") %>%
  gather(trash,recipient,-message.id,-sender) %>%
  select(message.id, sender, recipient) %>%
  filter(recipient %>% is.na == FALSE) %>%
  arrange(message.id)
I'm biased, but I'd recommend cSplit from the "splitstackshape" package.
Usage is simply:
library(splitstackshape)
cSplit(df, "recipients", "|", "long")
# message.id sender recipients
# 1: 1 A B
# 2: 1 A C
# 3: 2 A B
# 4: 3 B C
# 5: 3 B D
# 6: 3 B Q
Alternatively, using "dplyr" for piping and "tidyr" for unnest, you could try:
library(dplyr)
library(tidyr)
df %>%
mutate(recipients = as.character(recipients)) %>% ## need character for strsplit
mutate(recipients = strsplit(recipients, "|", TRUE)) %>% ## Use `fixed = TRUE`
unnest(recipients) ## `unnest` goes to long form
# Source: local data frame [6 x 3]
#
# message.id sender recipients
# (dbl) (fctr) (chr)
# 1 1 A B
# 2 1 A C
# 3 2 A B
# 4 3 B C
# 5 3 B D
# 6 3 B Q
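The split-then-unnest steps above can also be collapsed into a single call. The following is a minimal sketch, assuming a tidyr version that provides `separate_rows` (it is not part of the answer above); the variable name `long` is only for illustration.

```r
library(tidyr)

# Same example data as in the question
df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# Split `recipients` and emit one row per piece.
# `sep` is treated as a regular expression, so the pipe is escaped.
long <- separate_rows(df, recipients, sep = "\\|")
long
```

Note that the column keeps its original name, recipients, so a rename is still needed if the singular name from the question is wanted.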
We can use data.table
library(data.table)
setDT(df)[, list(recipient=unlist(strsplit(recipients, '[|]'))),
.(message.id, sender)]
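To show what the grouped `unlist(strsplit(...))` call is doing, here is a rough base R equivalent with no packages. It is only a sketch: the names `pieces` and `long` are made up for illustration, and `as.character` is there on the assumption that recipients might be a factor in older R versions.

```r
# Same example data as in the question
df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# One character vector of recipients per original row
pieces <- strsplit(as.character(df$recipients), "|", fixed = TRUE)

# Repeat each id/sender once per recipient, then flatten the list
long <- data.frame(
  message.id = rep(df$message.id, lengths(pieces)),
  sender     = rep(df$sender, lengths(pieces)),
  recipient  = unlist(pieces),
  stringsAsFactors = FALSE
)
long
```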
How about using plyr?
library(plyr)
ddply(df, .(message.id), function(d){
cbind(
sender = as.character(d$sender),
recipients = strsplit(as.character(d$recipients), "\\|")[[1]]
)
})
Here is a solution using dplyr and tidyr
library(dplyr)
library(tidyr)
df <- data.frame(message.id = 1:3, sender = c("A","A","B"),
                 recipients = c("B|C","B","C|D|Q"))
Original data
message.id sender recipients
1 1 A B|C
2 2 A B
3 3 B C|D|Q
Code
df %>% separate(recipients,into =c("r1","r2","r3")) %>%
gather("sen","recipient",r1:r3) %>% select(-sen) %>%
filter(!is.na(recipient))
Result
message.id sender recipient
1 1 A B
2 2 A B
3 3 B C
4 1 A C
5 3 B D
6 3 B Q
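The rows in this result are grouped by the intermediate column (r1 first, then r2, then r3) rather than by message. If the message order from the question is wanted, one option is to sort at the end. This is a sketch rather than part of the answer above: the explicit `sep` and `fill = "right"` arguments are additions, included on the assumption that a literal pipe split and silent NA padding are wanted.

```r
library(dplyr)
library(tidyr)

df <- data.frame(message.id = 1:3, sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

long <- df %>%
  # pad rows with fewer than three recipients with NA on the right
  separate(recipients, into = c("r1", "r2", "r3"), sep = "\\|", fill = "right") %>%
  gather("sen", "recipient", r1:r3) %>%
  select(-sen) %>%
  filter(!is.na(recipient)) %>%
  # put the rows for each message back together
  arrange(message.id)
long
```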