Combine/merge data frames in R based on multiple values in a column



I have been able to merge data frames like this:

library(dplyr)  # provides full_join()

df1 <- read.table(text="
col1    col2    colx
A        5    hh
B        3    jj
C        6    kk
E        7    mm", header=TRUE, stringsAsFactors=FALSE)
df2 <- read.table(text="
col3    col4    coly
A       5    be
B       3    to
C       6    go
E       7   yo
", header=TRUE, stringsAsFactors=FALSE)
# Match col1 against col3 and col2 against col4
full_join(df1, df2, by = c("col1" = "col3", "col2" = "col4"))

This gives me:

col1 col2 colx coly
1    A    5   hh   be
2    B    3   jj   to
3    C    6   kk   go
4    E    7   mm   yo

But now I need to merge df1 with df3, where a key in df3 can list several values, something along the lines of "A" %in% "A | B":

df3 <- read.table(text="
col3        col4    coly
'A | B'       5      be
'B | C'       3      to
C            6      go
E            7      yo
", header=TRUE, stringsAsFactors=FALSE)

Is this possible?

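We can use regex_full_join from fuzzyjoin after removing the spaces around the |, so that each col3 value becomes a regular expression such as A|B.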
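As a small illustration of why this works (the variable names `key` and `pattern` are just for this sketch): once the whitespace is stripped, "A | B" is the pattern "A|B", which is an alternation. To my understanding fuzzyjoin's regex joins test each pair with an unanchored `str_detect`, so a longer value such as "AB" would also match; that is worth keeping in mind if the real keys are not single letters.

```r
library(stringr)

key <- "A | B"
# Strip all whitespace so the key becomes a plain alternation pattern
pattern <- str_remove_all(key, "\\s+")   # "A|B"

# "A" and "B" match the alternation, "C" does not
matches <- str_detect(c("A", "B", "C"), pattern)

# The match is unanchored, so a longer string containing "A" matches too
partial <- str_detect("AB", pattern)
```

With that in mind, the join itself: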

library(dplyr)
library(fuzzyjoin)
library(stringr)
df3 %>%
  # the backslash must be doubled inside an R string literal
  mutate(col3 = str_remove_all(col3, "\\s+")) %>%
  regex_full_join(df1, ., by = c('col1' = 'col3', 'col2' = 'col4'))

-output

#  col1 col2 colx col3 col4 coly
#1    A    5   hh  A|B    5   be
#2    B    3   jj  B|C    3   to
#3    C    6   kk    C    6   go
#4    E    7   mm    E    7   yo
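
If exact matching on the key is preferred over regex matching, one possible alternative (not part of the answer above, just a sketch under the assumption that every key is separated by "|") is to split the combined keys into one row per value with tidyr's `separate_rows` and then do an ordinary join. The names `df3_long` and `result` are made up for this example. A `left_join` is used here so that only the rows of df1 are kept; a `full_join` would additionally return the expanded df3 rows that have no partner in df1 (here B/5 and C/3).

```r
library(dplyr)
library(tidyr)

df1 <- read.table(text="
col1    col2    colx
A        5    hh
B        3    jj
C        6    kk
E        7    mm", header=TRUE, stringsAsFactors=FALSE)

df3 <- read.table(text="
col3        col4    coly
'A | B'       5      be
'B | C'       3      to
C            6      go
E            7      yo
", header=TRUE, stringsAsFactors=FALSE)

# One row per alternative: 'A | B' becomes two rows, A and B
df3_long <- df3 %>%
  separate_rows(col3, sep = "\\s*\\|\\s*")

# Exact join on both key columns; keeps the four rows of df1
result <- left_join(df1, df3_long, by = c("col1" = "col3", "col2" = "col4"))
result
```

Unlike the regex join, the result has no col3/col4 columns, because the join keys are merged into col1 and col2.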
