我有两个要合并的数据帧。根据id
&CCD_ 2。
其中之一是";"main";,另一个添加信息。然而,我不能以常规方式(即merge()
或dplyr::left_join()
(合并它们,因为它们中的year
值不一定与每个id
匹配。因此,我想按时间顺序将第二个表中的已知内容转移到主表中的每year
行中。
在下面的例子中,我有两张关于军官的表格。";主";一个有3列用于id
、year
和另一个col_1
:
df_main_info <-
tribble(~id, ~year, ~col_1,
1, 2008, "foo",
1, 2005, "bar",
1, 2010, "blah",
1, 2020, "bar",
2, 1999, "foo",
2, 2020, "foo",
3, 2002, "bar",
3, 2010, "bar",
4, 2003, "foo",
4, 2010, "bar"
)
我有一个额外的表格,里面有id
和year
列,每个军官什么时候获得军衔,是什么军衔:
df_ranks_history <-
tribble(~id, ~year, ~army_rank,
1, 2005, "second_lieutenant",
1, 2010, "first_lieutenant",
1, 2018, "major",
1, 2021, "colonel",
2, 2002, "major",
2, 2018, "colonel",
3, 1995, "second_lieutenant",
3, 2000, "captain",
3, 2012, "colonel"
)
年份不完全匹配。但是,例如,如果id = 3
警官在2000年成为"captain"
,那么我们知道在2002年仍然是这样,所以我们可以输入";船长;第7行的CCD_ 15中。
因此,所需的输出应该是:
desired_output <-
tribble(~id, ~year, ~col_1, ~army_rank,
1, 2008, "foo", "second_lieutenant",
1, 2005, "bar", "second_lieutenant",
1, 2010, "blah", "first_lieutenant",
1, 2020, "bar", "major",
2, 1999, "foo", NA,
2, 2020, "foo", "colonel",
3, 2002, "bar", "captain",
3, 2010, "bar", "captain",
4, 2003, "foo", NA,
4, 2010, "bar", NA
)
如果这是相关的,则排名按特定顺序排列:
us_army_officer_ranks <- c("second_lieutenant",
"first_lieutenant",
"captain",
"major",
"lieutenant_colonel",
"colonel")
# colonel > lieutenant_colonel > major > captain > first_lieutenant > second_lieutenant
library(dplyr)
library(tidyr)
df_main_info %>%
full_join(df_ranks_history, by = c("id", "year")) %>%
group_by(id) %>%
arrange(id, year) %>%
fill(army_rank, .direction = "down") %>%
filter(!is.na(col_1))
# # A tibble: 10 × 4
# # Groups: id [4]
# id year col_1 army_rank
# <dbl> <dbl> <chr> <chr>
# 1 1 2005 bar second_lieutenant
# 2 1 2008 foo second_lieutenant
# 3 1 2010 blah first_lieutenant
# 4 1 2020 bar major
# 5 2 1999 foo NA
# 6 2 2020 foo colonel
# 7 3 2002 bar captain
# 8 3 2010 bar captain
# 9 4 2003 foo NA
# 10 4 2010 bar NA
library(data.table)
setDT(df_main_info)
setDT(df_ranks_history)
df_ranks_history[df_main_info, on = list(id, year), roll = +Inf]
id year army_rank col_1
1: 1 2008 second_lieutenant foo
2: 1 2005 second_lieutenant bar
3: 1 2010 first_lieutenant blah
4: 1 2020 major bar
5: 2 1999 <NA> foo
6: 2 2020 colonel foo
7: 3 2002 captain bar
8: 3 2010 captain bar
9: 4 2003 <NA> foo
10: 4 2010 <NA> bar