我想弄清楚如何在dplyr中使用join。当我加入和时和b使用full_join,我得到了4个状态,但缺少statfips的值。
- 是否有更好的方法来加入,并完全避免这个问题,而不丢失任何数据?
- 加入a后可以添加statetip吗?和b(真正的a和b包含4000+行)?
library(tidyverse)
# create df's
a <- tibble::tribble(
~statename, ~statefips, ~date, ~emp,
"Alabama", 1, "2020-01-14", 2,
"California", 6, "2020-01-14", 2,
"Alabama", 1, "2020-01-15", 2,
"California", 6, "2020-01-15", 2,
"Alabama", 1, "2020-01-16", 3,
"California", 6, "2020-01-16", 3,
"Alabama", 1, "2020-01-17", 3,
"California", 6, "2020-01-17", 3,
"Alabama", 1, "2020-01-18", 4,
"California", 6, "2020-01-18", 4,
"Alabama", 1, "2020-01-19", 4,
"California", 6, "2020-01-19", 4,
"Alabama", 1, "2020-01-20", 4,
"California", 6, "2020-01-20", 5,
"Alabama", 1, "2020-01-21", 5,
"California", 6, "2020-01-21", 5,
"Alabama", 1, "2020-01-22", 5,
"California", 6, "2020-01-22", 5,
"Alabama", 1, "2020-01-21", 5,
"California", 6, "2020-01-21", 4,
"Alabama", 1, "2020-01-22", 4,
"California", 6, "2020-01-22", 4,
"Alabama", 1, "2020-01-23", 4,
"California", 6, "2020-01-23", 4,
"Alabama", 1, "2020-01-24", 4,
"California", 6, "2020-01-24", 4
)
b <- tibble::tribble(
~statename, ~date, ~ui_claims,
"Alabama", "2020-01-04", "0.5",
"California", "2020-01-04", "0.5",
"Alabama", "2020-01-11", "0.5",
"California", "2020-01-11", "2.5",
"Alabama", "2020-01-18", "2.5",
"California", "2020-01-18", "1.5"
)
# Join a and b
full_join <- full_join(a, b, by = c("statename", "date")) %>% arrange(date)
# my try to fix missing NA's (doesn't work)
state_id <- tibble::tribble(
~statename, ~statefips,
"Alabama", 1,
"California", 6
)
full_join_fix <- full_join(full_join, state_id, by = "statename") %>% arrange(date)
我不太确定,如果这是你正在寻找的,但在full_join
之后,我们可以arrange
,然后fill
:
library(dplyr)
library(tidyr)
a %>%
full_join(b, by = c("statename", "date")) %>%
arrange(statename) %>%
fill(statefips, .direction = "down") %>%
print(n=40)
statename statefips date emp ui_claims
<chr> <dbl> <chr> <dbl> <chr>
1 Alabama 1 2020-01-14 2 NA
2 Alabama 1 2020-01-15 2 NA
3 Alabama 1 2020-01-16 3 NA
4 Alabama 1 2020-01-17 3 NA
5 Alabama 1 2020-01-18 4 2.5
6 Alabama 1 2020-01-19 4 NA
7 Alabama 1 2020-01-20 4 NA
8 Alabama 1 2020-01-21 5 NA
9 Alabama 1 2020-01-22 5 NA
10 Alabama 1 2020-01-21 5 NA
11 Alabama 1 2020-01-22 4 NA
12 Alabama 1 2020-01-23 4 NA
13 Alabama 1 2020-01-24 4 NA
14 Alabama 1 2020-01-04 NA 0.5
15 Alabama 1 2020-01-11 NA 0.5
16 California 6 2020-01-14 2 NA
17 California 6 2020-01-15 2 NA
18 California 6 2020-01-16 3 NA
19 California 6 2020-01-17 3 NA
20 California 6 2020-01-18 4 1.5
21 California 6 2020-01-19 4 NA
22 California 6 2020-01-20 5 NA
23 California 6 2020-01-21 5 NA
24 California 6 2020-01-22 5 NA
25 California 6 2020-01-21 4 NA
26 California 6 2020-01-22 4 NA
27 California 6 2020-01-23 4 NA
28 California 6 2020-01-24 4 NA
29 California 6 2020-01-04 NA 0.5
30 California 6 2020-01-11 NA 2.5