r - A better way to split one column into multiple columns and then gather the results



I have a data frame that looks like this:

message.id,sender,recipients
1,A,B|C
2,A,B
3,B,C|D|Q

I want to split the recipients column on "|" and then gather the results to produce the following:

message.id,sender,recipient
1,A,B
1,A,C
2,A,B
3,B,C
3,B,D
3,B,Q

What is a cleaner way to do this? Here is my current code:

library(dplyr)
library(stringr)
library(tidyr)
df <- data.frame(message.id = c(1,2,3),
                 sender = c("A","A","B"),
                 recipients = c("B|C","B","C|D|Q"))
max.splits = df$recipients %>% str_count("\\|") %>% max + 1   # "|" must be escaped as "\\|" in a regex
df %>% separate(recipients, as.character(1:max.splits), sep = "\\|") %>%   # `into` needs character names
  gather(trash,recipient,-message.id,-sender) %>%
  select(message.id, sender, recipient) %>%
  filter(recipient %>% is.na == FALSE) %>%
  arrange(message.id)

I'm biased, but I'd recommend cSplit from the "splitstackshape" package.

Usage is simply:

library(splitstackshape)
cSplit(df, "recipients", "|", "long")
#    message.id sender recipients
# 1:          1      A          B
# 2:          1      A          C
# 3:          2      A          B
# 4:          3      B          C
# 5:          3      B          D
# 6:          3      B          Q

Alternatively, using "dplyr" for the pipe and "tidyr" for unnest, you can try:

library(dplyr)
library(tidyr)
df %>%
  mutate(recipients = as.character(recipients)) %>%         ## need character for strsplit
  mutate(recipients = strsplit(recipients, "|", TRUE)) %>%  ## Use `fixed = TRUE`
  unnest(recipients)                                        ## `unnest` goes to long form
# Source: local data frame [6 x 3]
# 
#   message.id sender recipients
#        (dbl) (fctr)      (chr)
# 1          1      A          B
# 2          1      A          C
# 3          2      A          B
# 4          3      B          C
# 5          3      B          D
# 6          3      B          Q
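
As a side note that is not part of the answer above: if your installed tidyr provides separate_rows (an assumption about your version), the split and the move to long form can probably be done in one call. A minimal sketch, rebuilding the question's df with stringsAsFactors = FALSE so that recipients is character:

```r
library(tidyr)

# Same data as in the question; stringsAsFactors = FALSE keeps the columns as character
df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# `sep` is treated as a regular expression, so the pipe has to be escaped
long <- separate_rows(df, recipients, sep = "\\|")
long
```

Unlike the unnest version, this keeps the column name recipients, so rename it afterwards if you want recipient.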

We can use data.table:

library(data.table)
setDT(df)[, list(recipient = unlist(strsplit(as.character(recipients), '[|]'))),  # as.character in case recipients is a factor
          .(message.id, sender)]
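
For comparison, the same idea (split each string, then repeat the other columns once per piece) can be sketched in base R without any packages. This is only an illustration, not part of the answer above; the names parts and long are made up here, and lengths() assumes R 3.2.0 or later:

```r
# Same data as in the question, with character columns
df <- data.frame(message.id = c(1, 2, 3),
                 sender = c("A", "A", "B"),
                 recipients = c("B|C", "B", "C|D|Q"),
                 stringsAsFactors = FALSE)

# Split on a literal "|" (fixed = TRUE avoids regex escaping)
parts <- strsplit(df$recipients, "|", fixed = TRUE)

# Repeat each row's id and sender once per recipient, then flatten the pieces
long <- data.frame(message.id = rep(df$message.id, lengths(parts)),
                   sender = rep(df$sender, lengths(parts)),
                   recipient = unlist(parts),
                   stringsAsFactors = FALSE)
long
```

The grouping by message.id and sender in the data.table call plays the same role as the two rep() calls here.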

How about using plyr?

library(plyr)
ddply(df, .(message.id), function(d){
    cbind(
        sender = as.character(d$sender), 
        recipients = strsplit(as.character(d$recipients), "\\|")[[1]]
    )
})

Here is a solution using dplyr and tidyr:

df <- data.frame(message.id = 1:3, sender = c("A","A","B"),
recipients = c("B|C","B","C|D|Q"))

Original data

  message.id sender recipients
1          1      A        B|C
2          2      A          B
3          3      B      C|D|Q

Code

library(dplyr)
library(tidyr)
df %>% separate(recipients, into = c("r1", "r2", "r3")) %>%   # default sep splits on non-alphanumerics
  gather("sen", "recipient", r1:r3) %>% select(-sen) %>%
  filter(!is.na(recipient))

Result

  message.id sender recipient
1          1      A         B
2          2      A         B
3          3      B         C
4          1      A         C
5          3      B         D
6          3      B         Q
