R - Joining two incomplete data.tables that have the same column names



I have two incomplete data.tables with the same column names:

library(data.table)

dt1 <- data.table(id = c(1, 2, 3), v1 = c("w", "x", NA), v2 = c("a", NA, "c"))
dt2 <- data.table(id = c(2, 3, 4), v1 = c(NA, "y", "z"), v2 = c("b", "c", NA))

They look like this:

> dt1
   id   v1   v2
1:  1    w    a
2:  2    x <NA>
3:  3 <NA>    c
> dt2
   id   v1   v2
1:  2 <NA>    b
2:  3    y    c
3:  4    z <NA>

Is there a way to merge the two so that the missing values in one table are filled in from the other?

This is the result I want:

   id v1   v2
1:  1  w    a
2:  2  x    b
3:  3  y    c
4:  4  z <NA>

I have tried various data.table joins and merges, but I either get duplicated columns:

> merge(dt1,
+       dt2,
+       by = "id",
+       all = TRUE)
   id v1.x v2.x v1.y v2.y
1:  1    w    a <NA> <NA>
2:  2    x <NA> <NA>    b
3:  3 <NA>    c    y    c
4:  4 <NA> <NA>    z <NA>

or duplicated rows:

> merge(dt1,
+       dt2,
+       by = names(dt1),
+       all = TRUE)
   id   v1   v2
1:  1    w    a
2:  2 <NA>    b
3:  2    x <NA>
4:  3 <NA>    c
5:  3    y    c
6:  4    z <NA>

Both data.tables have the same column names.

You can group by id and, within each group, keep the unique values after dropping the NAs, i.e.

library(data.table)

# Full merge on all shared columns, then collapse each id to its non-NA values
merge(dt1, dt2, all = TRUE)[,
  lapply(.SD, function(i) na.omit(unique(i))),
  by = id][]
#   id v1   v2
#1:  1  w    a
#2:  2  x    b
#3:  3  y    c
#4:  4  z <NA>
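
Because merge(..., by = "id") already lines the two tables up side by side, another option is to keep the .x/.y columns and coalesce each pair. The following is only a sketch under some assumptions: it uses data.table's fcoalesce(), spells out the two value columns by hand, and lets the dt1 value win whenever both tables hold a non-NA value for the same id.

```r
library(data.table)

dt1 <- data.table(id = c(1, 2, 3), v1 = c("w", "x", NA), v2 = c("a", NA, "c"))
dt2 <- data.table(id = c(2, 3, 4), v1 = c(NA, "y", "z"), v2 = c("b", "c", NA))

# Full outer join on id only; shared columns come back as v1.x/v1.y, v2.x/v2.y
m <- merge(dt1, dt2, by = "id", all = TRUE)

# fcoalesce() takes the first non-NA value element-wise (dt1 first, then dt2)
res <- m[, .(id,
             v1 = fcoalesce(v1.x, v1.y),
             v2 = fcoalesce(v2.x, v2.y))]
print(res)
```

With more than a couple of value columns, the pairs would have to be built programmatically rather than typed out.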

You can also start from rbind():

# \(x) is the R >= 4.1 shorthand for function(x)
rbind(dt1, dt2)[, lapply(.SD, \(x) unique(x[!is.na(x)])), by = id]
#       id     v1     v2
#    <num> <char> <char>
# 1:     1      w      a
# 2:     2      x      b
# 3:     3      y      c
# 4:     4      z   <NA>
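
For id 4 the filtered v2 vector is empty, so the snippets above depend on data.table filling that gap with NA. A minimal sketch that always returns exactly one value per group is shown here; first_non_na is a hypothetical helper name, and it assumes each id has at most one distinct non-NA value per column (otherwise only the first one is kept).

```r
library(data.table)

dt1 <- data.table(id = c(1, 2, 3), v1 = c("w", "x", NA), v2 = c("a", NA, "c"))
dt2 <- data.table(id = c(2, 3, 4), v1 = c(NA, "y", "z"), v2 = c("b", "c", NA))

# Hypothetical helper: first non-missing value, or NA when every value is missing
# (indexing an empty vector with [1L] yields NA of the same type)
first_non_na <- function(x) x[!is.na(x)][1L]

# Stack both tables, then collapse each id to one row
res <- rbind(dt1, dt2)[, lapply(.SD, first_non_na), by = id]
setorder(res, id)
print(res)
```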

First do a full_join, then group_by id and collapse the rows within each group:

library(dplyr)
library(tidyr)

dt1 %>%
  full_join(dt2, by = c("id", "v1", "v2")) %>%
  group_by(id) %>%
  # fill NAs from the other rows of the same id, then keep one row per id
  fill(starts_with('v'), .direction = 'updown') %>%
  slice(1) %>%
  ungroup()

Output:

# A tibble: 4 × 3
     id v1    v2   
  <dbl> <chr> <chr>
1     1 w     a    
2     2 x     b    
3     3 y     c    
4     4 z     NA 
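
The same group-and-collapse idea can also be written without tidyr by stacking the tables and summarising. This is only a sketch: it uses plain data frames in place of the data.tables, and it assumes at most one distinct non-NA value per id and column (if there are more, only the first is kept).

```r
library(dplyr)

dt1 <- data.frame(id = c(1, 2, 3), v1 = c("w", "x", NA), v2 = c("a", NA, "c"))
dt2 <- data.frame(id = c(2, 3, 4), v1 = c(NA, "y", "z"), v2 = c("b", "c", NA))

res <- bind_rows(dt1, dt2) %>%
  group_by(id) %>%
  # first non-NA value per column; indexing an empty vector with [1] gives NA
  summarise(across(everything(), ~ .x[!is.na(.x)][1]), .groups = "drop")

print(res)
```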
