R: Compare string values in two data frames



I have two datasets that look like this:

df1 <- data.frame(Grade = c("G3","G3","G3","G3","G3","G3","G3","G3","G3","G3"),
                  names = c("Harper","Mason","Evelyn","Ella","Avery",
                            "Jackson","Olivia","Isla","Emily","Poppy"))
> df1
   Grade   names
1     G3  Harper
2     G3   Mason
3     G3  Evelyn
4     G3    Ella
5     G3   Avery
6     G3 Jackson
7     G3  Olivia
8     G3    Isla
9     G3   Emily
10    G3   Poppy
df2 <- data.frame(Grade = c("G3","G3","G3","G3","G3","G3","G3"),
                  names = c("Harper","Mason","Ava","Avery","Isabella",
                            "Jessica","Emily"))
> df2
  Grade    names
1    G3   Harper
2    G3    Mason
3    G3      Ava
4    G3    Avery
5    G3 Isabella
6    G3  Jessica
7    G3    Emily

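In a new data frame, I want to store four pieces of information:

(a) the names common to both, (b) the names unique to df1, (c) the names unique to df2, and (d) the count for each column.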

So the resulting dataset should look like this:

> final
  Grade common.names unique.df1 unique.df2
1    G3       Harper     Evelyn        Ava
2    G3        Mason       Ella   Isabella
3    G3        Avery    Jackson    Jessica
4    G3        Emily     Olivia       <NA>
5    G3         <NA>       Isla       <NA>
6    G3         <NA>      Poppy       <NA>
7 Count            4          6          3

I tried compare() from library(compare), but it doesn't seem to work for finding the common names.

comparison <- compare(df1,df2,allowAll=TRUE)
comparison$tM
> comparison$tM
  Grade   names
1    G3   AVERY
2    G3    ELLA
3    G3  EVELYN
4    G3  HARPER
5    G3 JACKSON
6    G3   MASON
7    G3  OLIVIA

Any ideas on this? Thanks!

You could write a function:

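join <- function(x, y)
{
  # Join on every column the two data frames share (here Grade and names);
  # named 'keys' to avoid confusion with dplyr::join_by()
  keys <- intersect(names(x), names(y))
  # Transpose each result so that every matching row becomes a column
  a <- data.table::transpose(dplyr::inner_join(x, y, by = keys))  # in both
  b <- data.table::transpose(dplyr::anti_join(x, y, by = keys))   # only in x
  d <- data.table::transpose(dplyr::anti_join(y, x, by = keys))   # only in y
  e <- list(a, b, d)
  # lengths() counts columns, i.e. the number of rows in each original set
  counts <- setNames(lengths(e), c("common.names", "unique.df1", "unique.df2"))
  # Stack the sets largest first, padding the shorter ones with NA
  ord <- order(counts, decreasing = TRUE)
  f <- do.call(plyr::rbind.fill, e[ord])
  # Transpose back and drop the repeated Grade columns (3 and 5)
  s <- data.table::transpose(f)[-c(3, 5)]
  setNames(s, c("V1", names(counts[ord])))[c(1, ord + 1)]
}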
join(df1,df2)
  V1 common.names unique.df1 unique.df2
1 G3       Harper     Evelyn        Ava
2 G3        Mason       Ella   Isabella
3 G3        Avery    Jackson    Jessica
4 G3        Emily     Olivia       <NA>
5 G3         <NA>       Isla       <NA>
6 G3         <NA>      Poppy       <NA>
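The same join idea can be sketched without the two transposes by pulling the names straight out of each join. This is a minimal sketch, assuming a single Grade value as in the sample data; pad() is a hypothetical helper, and semi_join() is swapped in for inner_join() so that a name repeated in df2 cannot duplicate rows in the common column.

```r
library(dplyr)

# Sample data from the question (stringsAsFactors = FALSE for R < 4.0)
df1 <- data.frame(Grade = "G3",
                  names = c("Harper", "Mason", "Evelyn", "Ella", "Avery",
                            "Jackson", "Olivia", "Isla", "Emily", "Poppy"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Grade = "G3",
                  names = c("Harper", "Mason", "Ava", "Avery", "Isabella",
                            "Jessica", "Emily"),
                  stringsAsFactors = FALSE)

keys <- c("Grade", "names")
# semi_join keeps the rows of x that have a match in y; anti_join keeps the rest
common <- semi_join(df1, df2, by = keys) %>% pull(names)
only1  <- anti_join(df1, df2, by = keys) %>% pull(names)
only2  <- anti_join(df2, df1, by = keys) %>% pull(names)

# Hypothetical helper: pad a vector with NA up to length n
pad <- function(v, n) { length(v) <- n; v }
n <- max(length(common), length(only1), length(only2))

# Assumes a single Grade value, as in the sample data
res <- data.frame(Grade = df1$Grade[1],
                  common.names = pad(common, n),
                  unique.df1   = pad(only1, n),
                  unique.df2   = pad(only2, n),
                  stringsAsFactors = FALSE)
res
```

Because the name columns stay ordinary character columns, a Count row can be appended afterwards with colSums(!is.na(res[-1])).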

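Here is one option: split each dataset by "Grade" (in case there are several "Grade" values), loop over the lists with Map, get the names common to both datasets and those unique to each (intersect and setdiff, respectively), combine them into a data.frame with cbind.fill (from rowr), and rbind the list elements.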

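library(rowr)  # provides cbind.fill()
# Split the names by Grade so that each grade is compared separately
lst1 <- split(as.character(df1$names), df1$Grade)
lst2 <- split(as.character(df2$names), df2$Grade)
out <- do.call(rbind, unname(Map(function(x, y, z) {
  cn  <- intersect(x, y)   # names in both datasets
  un1 <- setdiff(x, y)     # names only in df1
  un2 <- setdiff(y, x)     # names only in df2
  # cbind.fill() binds vectors of unequal length, padding with NA
  cbind(Grade = z, cbind.fill(cn, un1, un2, fill = NA))
}, lst1, lst2[names(lst1)], names(lst1))))
names(out)[-1] <- c("common.names", "unique.df1", "unique.df2")
out[] <- lapply(out, as.character)
# Append a row with the number of non-NA entries in each column
rbind(out, c(Grade = 'Count', colSums(!is.na(out[-1]))))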
#  Grade common.names unique.df1 unique.df2
#1    G3       Harper     Evelyn        Ava
#2    G3        Mason       Ella   Isabella
#3    G3        Avery    Jackson    Jessica
#4    G3        Emily     Olivia       <NA>
#5    G3         <NA>       Isla       <NA>
#6    G3         <NA>      Poppy       <NA>
#7 Count            4          6          3
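The rowr package, which supplies cbind.fill(), appears to have been archived on CRAN, so it may not install from the default repositories. A minimal base-R sketch of the same split/Map approach pads each vector with NA via `length<-` instead; compare_grade is a hypothetical helper name, not part of any package.

```r
# Sample data from the question (stringsAsFactors = FALSE for R < 4.0)
df1 <- data.frame(Grade = "G3",
                  names = c("Harper", "Mason", "Evelyn", "Ella", "Avery",
                            "Jackson", "Olivia", "Isla", "Emily", "Poppy"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Grade = "G3",
                  names = c("Harper", "Mason", "Ava", "Avery", "Isabella",
                            "Jessica", "Emily"),
                  stringsAsFactors = FALSE)

lst1 <- split(df1$names, df1$Grade)
lst2 <- split(df2$names, df2$Grade)

# Hypothetical helper: compare one grade, padding the three vectors with NA
compare_grade <- function(x, y, grade) {
  cols <- list(common.names = intersect(x, y),
               unique.df1   = setdiff(x, y),
               unique.df2   = setdiff(y, x))
  n <- max(lengths(cols))
  cols <- lapply(cols, function(v) { length(v) <- n; v })
  data.frame(Grade = rep(grade, n), cols, stringsAsFactors = FALSE)
}

out <- do.call(rbind, unname(Map(compare_grade, lst1, lst2[names(lst1)], names(lst1))))
# Append the per-column counts of non-NA entries as a final row
final <- rbind(out, c(Grade = "Count", colSums(!is.na(out[-1]))))
final
```

Since every column is already character, the Count row can be appended directly without the lapply(out, as.character) step.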
