r语言 - 如何使用 dplyr 连接更改/避免列中的 NA?



我想弄清楚如何在dplyr中使用join。当我加入时和b使用full_join,我得到了4个状态,但缺少statfips的值。

  1. 是否有更好的方法来加入,并完全避免这个问题,而不丢失任何数据?
  2. 加入a后可以添加statetip吗?和b(真正的a和b包含4000+行)?
library(tidyverse)

# create df's
a <- tibble::tribble(
~statename, ~statefips,        ~date, ~emp,
"Alabama",          1, "2020-01-14",    2,
"California",          6, "2020-01-14",    2,
"Alabama",          1, "2020-01-15",    2,
"California",          6, "2020-01-15",    2,
"Alabama",          1, "2020-01-16",    3,
"California",          6, "2020-01-16",    3,
"Alabama",          1, "2020-01-17",    3,
"California",          6, "2020-01-17",    3,
"Alabama",          1, "2020-01-18",    4,
"California",          6, "2020-01-18",    4,
"Alabama",          1, "2020-01-19",    4,
"California",          6, "2020-01-19",    4,
"Alabama",          1, "2020-01-20",    4,
"California",          6, "2020-01-20",    5,
"Alabama",          1, "2020-01-21",    5,
"California",          6, "2020-01-21",    5,
"Alabama",          1, "2020-01-22",    5,
"California",          6, "2020-01-22",    5,
"Alabama",          1, "2020-01-21",    5,
"California",          6, "2020-01-21",    4,
"Alabama",          1, "2020-01-22",    4,
"California",          6, "2020-01-22",    4,
"Alabama",          1, "2020-01-23",    4,
"California",          6, "2020-01-23",    4,
"Alabama",          1, "2020-01-24",    4,
"California",          6, "2020-01-24",    4
)
b <- tibble::tribble(
~statename,        ~date, ~ui_claims,
"Alabama", "2020-01-04",      "0.5",
"California", "2020-01-04",      "0.5",
"Alabama", "2020-01-11",      "0.5",
"California", "2020-01-11",      "2.5",
"Alabama", "2020-01-18",      "2.5",
"California", "2020-01-18",      "1.5"
)
# Join a and b
full_join <- full_join(a, b, by = c("statename", "date")) %>% arrange(date)
# my try to fix missing NA's (doesn't work)
state_id <- tibble::tribble(
~statename, ~statefips,
"Alabama",          1,
"California",          6
)
full_join_fix <- full_join(full_join, state_id, by = "statename") %>% arrange(date)

我不太确定,如果这是你正在寻找的,但在full_join之后,我们可以arrange,然后fill:

library(dplyr)
library(tidyr)
a %>% 
full_join(b, by = c("statename", "date")) %>% 
arrange(statename) %>% 
fill(statefips, .direction = "down") %>% 
print(n=40)
statename  statefips date         emp ui_claims
<chr>          <dbl> <chr>      <dbl> <chr>    
1 Alabama            1 2020-01-14     2 NA       
2 Alabama            1 2020-01-15     2 NA       
3 Alabama            1 2020-01-16     3 NA       
4 Alabama            1 2020-01-17     3 NA       
5 Alabama            1 2020-01-18     4 2.5      
6 Alabama            1 2020-01-19     4 NA       
7 Alabama            1 2020-01-20     4 NA       
8 Alabama            1 2020-01-21     5 NA       
9 Alabama            1 2020-01-22     5 NA       
10 Alabama            1 2020-01-21     5 NA       
11 Alabama            1 2020-01-22     4 NA       
12 Alabama            1 2020-01-23     4 NA       
13 Alabama            1 2020-01-24     4 NA       
14 Alabama            1 2020-01-04    NA 0.5      
15 Alabama            1 2020-01-11    NA 0.5      
16 California         6 2020-01-14     2 NA       
17 California         6 2020-01-15     2 NA       
18 California         6 2020-01-16     3 NA       
19 California         6 2020-01-17     3 NA       
20 California         6 2020-01-18     4 1.5      
21 California         6 2020-01-19     4 NA       
22 California         6 2020-01-20     5 NA       
23 California         6 2020-01-21     5 NA       
24 California         6 2020-01-22     5 NA       
25 California         6 2020-01-21     4 NA       
26 California         6 2020-01-22     4 NA       
27 California         6 2020-01-23     4 NA       
28 California         6 2020-01-24     4 NA       
29 California         6 2020-01-04    NA 0.5      
30 California         6 2020-01-11    NA 2.5 

最新更新