I have a data frame whose `type` column lists every possible "type", alongside a `comment` column:
comment type
used enter
used open
used close
used update
not_used delete
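I also get a data frame from a database, but in that data frame not every "type" is necessarily present. Here is an example of that table: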
ID date type value
a1 2020-09-01 enter 18
a1 2020-09-01 close 15
a1 2020-09-02 enter 4
a2 2020-09-01 close 10
b1 2020-09-02 update 10
As you can see, ID a1 has only two types: enter and close. a2 has only close, and b1 has only update.
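For anyone who wants to reproduce this, the two tables above could be rebuilt roughly as follows. The names `df1` and `df2` are the ones used in the code further down. Storing the dates as character strings and the values as doubles is an assumption, since the real column types from the database are not shown.

```r
library(tibble)

# Lookup table: every possible type together with its comment
df1 <- tribble(
  ~comment,   ~type,
  "used",     "enter",
  "used",     "open",
  "used",     "close",
  "used",     "update",
  "not_used", "delete"
)

# Data as fetched from the database
# (dates kept as character strings: an assumption)
df2 <- tribble(
  ~ID,  ~date,        ~type,    ~value,
  "a1", "2020-09-01", "enter",  18,
  "a1", "2020-09-01", "close",  15,
  "a1", "2020-09-02", "enter",  4,
  "a2", "2020-09-01", "close",  10,
  "b1", "2020-09-02", "update", 10
)
```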
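I want to combine these two tables so that, for each ID and date, every "type" that is missing from my table gets a value of zero. How can I combine the two tables to get this: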
comment ID date type value
used a1 2020-09-01 enter 18
used a1 2020-09-01 open 0
used a1 2020-09-01 close 15
used a1 2020-09-01 update 0
not_used a1 2020-09-01 delete 0
used a1 2020-09-02 enter 4
used a1 2020-09-02 open 0
used a1 2020-09-02 close 0
used a1 2020-09-02 update 0
not_used a1 2020-09-02 delete 0
used a2 2020-09-01 enter 0
used a2 2020-09-01 open 0
used a2 2020-09-01 close 10
used a2 2020-09-01 update 0
not_used a2 2020-09-01 delete 0
used b1 2020-09-02 enter 0
used b1 2020-09-02 open 0
used b1 2020-09-02 close 0
used b1 2020-09-02 update 10
not_used b1 2020-09-02 delete 0
As you can see, I also want to keep the `comment` column. How can I do that?
This does not keep the `comment` column:
library(dplyr)
library(tidyr)
df2 %>%
mutate(type = factor(type, levels = df1$type)) %>%
group_by(ID, date) %>%
complete(type, fill = list(value = 0))
Try this:
library(tidyverse)
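# Turn type into a factor with all levels from df1, fill in the missing
# types per ID/date with value 0, then join the comments back from df1
new <- df2 %>%
  mutate(type = factor(type, levels = df1$type)) %>%
  group_by(ID, date) %>%
  complete(type, fill = list(value = 0)) %>%
  left_join(df1, by = "type")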
Output:
# A tibble: 20 x 5
# Groups: ID, date [4]
ID date type value comment
<chr> <chr> <chr> <dbl> <chr>
1 a1 2020-09-01 enter 18 used
2 a1 2020-09-01 open 0 used
3 a1 2020-09-01 close 15 used
4 a1 2020-09-01 update 0 used
5 a1 2020-09-01 delete 0 not_used
6 a1 2020-09-02 enter 4 used
7 a1 2020-09-02 open 0 used
8 a1 2020-09-02 close 0 used
9 a1 2020-09-02 update 0 used
10 a1 2020-09-02 delete 0 not_used
11 a2 2020-09-01 enter 0 used
12 a2 2020-09-01 open 0 used
13 a2 2020-09-01 close 10 used
14 a2 2020-09-01 update 0 used
15 a2 2020-09-01 delete 0 not_used
16 b1 2020-09-02 enter 0 used
17 b1 2020-09-02 open 0 used
18 b1 2020-09-02 close 0 used
19 b1 2020-09-02 update 10 used
20 b1 2020-09-02 delete 0 not_used
I think a simple solution is to do a `full_join` afterwards:
df2 %>%
mutate(type = factor(type, levels = df1$type)) %>%
group_by(ID, date) %>%
complete(type, fill = list(value = 0)) %>%
full_join(df1, by = "type")
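As a possible variation on the approaches above, here is a minimal sketch that skips the factor conversion and the grouping. It passes the full set of types straight to `complete()` and uses `nesting(ID, date)` so that only the ID/date pairs that actually occur are expanded. The sample data is rebuilt from the tables in the question, the character date column is an assumption, and `result` is only an illustrative name.

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the question (column types are assumptions)
df1 <- data.frame(
  comment = c("used", "used", "used", "used", "not_used"),
  type    = c("enter", "open", "close", "update", "delete"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID    = c("a1", "a1", "a1", "a2", "b1"),
  date  = c("2020-09-01", "2020-09-01", "2020-09-02",
            "2020-09-01", "2020-09-02"),
  type  = c("enter", "close", "enter", "close", "update"),
  value = c(18, 15, 4, 10, 10),
  stringsAsFactors = FALSE
)

result <- df2 %>%
  # Expand every observed ID/date pair against all types listed in df1;
  # rows that had no match get value = 0
  complete(nesting(ID, date), type = df1$type, fill = list(value = 0)) %>%
  # Bring the comment column back from the lookup table
  left_join(df1, by = "type")
```

This should give one row per observed ID/date pair and type (4 pairs times 5 types). The rows may come back in a different order than in the outputs above, so add an `arrange()` if the order matters.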