I have a data frame with several variables. Here is an example:
df <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                 group = c("A", "A", "A", "B", "C", "C"),
                 Var = c(3, 2, 1, 4, 5, 2))
ID group Var
1 Mickey A 3
2 Goofy A 2
3 Donald A 1
4 Mickey B 4
5 Donald C 5
6 Pluto C 2
I want a new data frame in which every ID appears in every group, with Var = 0 for any ID that is absent from a group:
ID group Var
1 Mickey A 3
2 Goofy A 2
3 Donald A 1
4 Pluto A 0
5 Mickey B 4
6 Goofy B 0
7 Donald B 0
8 Pluto B 0
9 Mickey C 0
10 Goofy C 0
11 Donald C 5
12 Pluto C 2
I tried using join_left and merge, like this:
a=unique(df1$ID)
df2 <- df1 %>%
group_by(group)%>%
join_left(a)
but neither of them works this way.
Using complete() from tidyr:
library(tidyr)
df %>%
complete(group, ID, fill = list(Var = 0))
# A tibble: 12 × 3
group ID Var
<chr> <chr> <dbl>
1 A Donald 1
2 A Goofy 2
3 A Mickey 3
4 A Pluto 0
5 B Donald 0
6 B Goofy 0
7 B Mickey 4
8 B Pluto 0
9 C Donald 5
10 C Goofy 0
11 C Mickey 0
12 C Pluto 2
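The join approach from the question can also work if the full set of group/ID combinations is built first and the data are then left-joined onto it. A minimal sketch, assuming dplyr and tidyr (1.0 or later, for expand_grid()) are installed; `grid` and `result` are illustrative names, not from the question:

```r
library(dplyr)
library(tidyr)

df <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                 group = c("A", "A", "A", "B", "C", "C"),
                 Var = c(3, 2, 1, 4, 5, 2))

# Every group paired with every ID (3 groups x 4 IDs = 12 rows);
# expand_grid() varies the first column (group) slowest
grid <- expand_grid(group = unique(df$group), ID = unique(df$ID))

# Left-join the observed values onto the grid; unmatched pairs get NA,
# which coalesce() then replaces with 0
result <- grid %>%
  left_join(df, by = c("group", "ID")) %>%
  mutate(Var = coalesce(Var, 0))

print(result)
```

Because expand_grid() keeps IDs in order of first appearance within each group, the rows should come out in the same order as the desired output in the question.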
A base R solution:
transform(merge(expand.grid(lapply(df[2:1], unique)), df, all.x = TRUE, sort = TRUE),
Var = replace(Var, is.na(Var), 0))
This gives the same output as complete(), only with the rows in a different order.
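A related detail: expand.grid() turns character inputs into factors by default (stringsAsFactors = TRUE). A sketch of the same merge idea that passes stringsAsFactors = FALSE so ID and group stay character; with character keys, merge(sort = TRUE) should sort alphabetically by group and then ID. `grid` and `res` are illustrative names:

```r
df <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                 group = c("A", "A", "A", "B", "C", "C"),
                 Var = c(3, 2, 1, 4, 5, 2))

# All group x ID combinations, kept as character rather than factor
grid <- expand.grid(group = unique(df$group), ID = unique(df$ID),
                    stringsAsFactors = FALSE)

# Keep every grid row (all.x = TRUE); pairs absent from df get Var = NA
res <- merge(grid, df, all.x = TRUE, sort = TRUE)

# Replace the NAs introduced by the merge with 0
res$Var[is.na(res$Var)] <- 0

print(res)
```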
Here is my solution using base R:
DF_raw <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                     group = c("A", "A", "A", "B", "C", "C"),
                     Var = c(3, 2, 1, 4, 5, 2))
groups<-unique(DF_raw$group)
IDs<-unique(DF_raw$ID)
DF_clean <- do.call(rbind, lapply(IDs, function(ID) data.frame(ID = ID, group = groups))) # create every ID-group combination
DF_clean$Var <- sapply(seq_len(nrow(DF_clean)), function(ROW) {
  OUT <- DF_raw$Var[which(
    DF_raw$ID == DF_clean$ID[ROW] &
      DF_raw$group == DF_clean$group[ROW]
  )] # the Var for this ID-group pair, if it exists
  if (length(OUT) == 0) {
    OUT <- 0 # this ID does not occur in this group
  }
  OUT
})
print(DF_clean)
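The row-by-row sapply() can be replaced with a single vectorized lookup. A sketch, assuming each ID/group pair occurs at most once in DF_raw (match() returns only the first hit) and that no ID or group contains a tab character, since a tab is used as the key separator:

```r
DF_raw <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                     group = c("A", "A", "A", "B", "C", "C"),
                     Var = c(3, 2, 1, 4, 5, 2))

# Every ID paired with every group; expand.grid() varies ID fastest
DF_clean <- expand.grid(ID = unique(DF_raw$ID), group = unique(DF_raw$group),
                        stringsAsFactors = FALSE)

# Build one key per row, then look up all grid rows in DF_raw in one call
key_clean <- paste(DF_clean$ID, DF_clean$group, sep = "\t")
key_raw   <- paste(DF_raw$ID, DF_raw$group, sep = "\t")
idx <- match(key_clean, key_raw)  # NA where the pair is absent

# Use the observed Var where a match exists, otherwise 0
DF_clean$Var <- ifelse(is.na(idx), 0, DF_raw$Var[idx])

print(DF_clean)
```

Because ID varies fastest within each group, the rows come out in the order shown in the question's desired output.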