R: Fill in / complete each group of a data frame using a vector



I have a data frame with several variables. Here is an example:

data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
           group = c("A", "A", "A", "B", "C", "C"),
           Var = c(3, 2, 1, 4, 5, 2))
      ID group Var
1 Mickey     A   3
2  Goofy     A   2
3 Donald     A   1
4 Mickey     B   4
5 Donald     C   5
6  Pluto     C   2

I want a new data frame in which every ID appears in every group, with Var = 0 for the IDs that are absent from a group:

       ID group Var
1  Mickey     A   3
2   Goofy     A   2
3  Donald     A   1
4   Pluto     A   0
5  Mickey     B   4
6   Goofy     B   0
7  Donald     B   0
8   Pluto     B   0
9  Mickey     C   0
10  Goofy     C   0
11 Donald     C   5
12  Pluto     C   2

I tried using join_left and merge, like this:

a=unique(df1$ID)
df2 <- df1 %>%
group_by(group)%>%
join_left(a)

But neither of them works this way.
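
For reference, the attempt above cannot run as written: dplyr's function is called left_join(), not join_left(), and it joins two tables rather than a table and a bare vector. A minimal sketch of a join that does produce the desired result follows. It assumes dplyr and tidyr are installed, and all_pairs is a hypothetical helper name that does not appear in the question.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  ID    = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
  group = c("A", "A", "A", "B", "C", "C"),
  Var   = c(3, 2, 1, 4, 5, 2)
)

# Hypothetical helper table: one row per (group, ID) combination.
# expand_grid() varies the first argument slowest, so rows are grouped by `group`.
all_pairs <- expand_grid(group = unique(df1$group), ID = unique(df1$ID))

df2 <- all_pairs %>%
  left_join(df1, by = c("group", "ID")) %>%  # pairs missing from df1 get Var = NA
  mutate(Var = coalesce(Var, 0)) %>%         # turn those NAs into 0
  select(ID, group, Var)                     # restore the original column order

print(df2)
```

Because left_join() keeps the row order of its left-hand table, the IDs stay in their order of first appearance within each group.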

Using complete() from tidyr:

library(tidyr)
df %>%
complete(group, ID, fill = list(Var = 0))
# A tibble: 12 × 3
   group ID       Var
   <chr> <chr>  <dbl>
 1 A     Donald     1
 2 A     Goofy      2
 3 A     Mickey     3
 4 A     Pluto      0
 5 B     Donald     0
 6 B     Goofy      0
 7 B     Mickey     4
 8 B     Pluto      0
 9 C     Donald     5
10 C     Goofy      0
11 C     Mickey     0
12 C     Pluto      2
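
complete() sorts character columns alphabetically, so the IDs come out in a different order from the one shown in the question. If the order of first appearance matters, one possible approach is to turn ID into a factor first, since complete() expands factors in level order. This is a sketch under that assumption; the name res is only illustrative.

```r
library(tidyr)

df <- data.frame(
  ID    = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
  group = c("A", "A", "A", "B", "C", "C"),
  Var   = c(3, 2, 1, 4, 5, 2)
)

# Assumption: the desired ID order is the order of first appearance in df.
df$ID <- factor(df$ID, levels = unique(df$ID))

res <- complete(df, group, ID, fill = list(Var = 0))

# Put the columns back in the question's order and drop the tibble class.
res <- as.data.frame(res[, c("ID", "group", "Var")])
print(res)
```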

A base R solution:

transform(merge(expand.grid(lapply(df[2:1], unique)), df, all.x = TRUE, sort = TRUE),
Var = replace(Var, is.na(Var), 0))

The output is the same as that of complete(), only the row order differs.
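
The one-liner above packs three steps into a single expression. Unpacked, it might look like the sketch below; grid and out are illustrative names, and stringsAsFactors = FALSE is an added choice that keeps the key columns as plain character vectors.

```r
df <- data.frame(
  ID    = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
  group = c("A", "A", "A", "B", "C", "C"),
  Var   = c(3, 2, 1, 4, 5, 2)
)

# Step 1: build every (group, ID) combination.
grid <- expand.grid(group = unique(df$group),
                    ID = unique(df$ID),
                    stringsAsFactors = FALSE)

# Step 2: left join the observed data onto the grid (all.x = TRUE keeps every grid row).
out <- merge(grid, df, by = c("group", "ID"), all.x = TRUE)

# Step 3: combinations that were not observed have Var = NA; set them to 0.
out$Var[is.na(out$Var)] <- 0

# Sort explicitly so the row order does not depend on merge() internals.
out <- out[order(out$group, out$ID), ]
rownames(out) <- NULL
print(out)
```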

Here is my solution using mostly base R (dplyr::bind_rows is used only to stack the pieces):

DF_raw <- data.frame(ID = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
                     group = c("A", "A", "A", "B", "C", "C"),
                     Var = c(3, 2, 1, 4, 5, 2))
groups <- unique(DF_raw$group)
IDs <- unique(DF_raw$ID)
# Create every possible ID/group combination
DF_clean <- dplyr::bind_rows(lapply(IDs, function(ID) {
  data.frame(ID = ID, group = groups)
}))
DF_clean$Var <- sapply(seq_len(nrow(DF_clean)), function(ROW) {
  # Find the Var for this ID/group pair, if it exists
  OUT <- DF_raw$Var[which(
    DF_raw$ID == DF_clean$ID[ROW] &
      DF_raw$group == DF_clean$group[ROW]
  )]
  if (length(OUT) == 0) {
    OUT <- 0  # this ID is new to the group
  }
  OUT
})
print(DF_clean)
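
The row-by-row lookup above can also be written without a loop. The following is a sketch of one vectorized variant using match() on pasted keys. It assumes each ID/group pair occurs at most once in DF_raw (match() returns only the first hit) and that neither column contains the separator "|". The names key_raw, key_clean and idx are illustrative.

```r
DF_raw <- data.frame(
  ID    = c("Mickey", "Goofy", "Donald", "Mickey", "Donald", "Pluto"),
  group = c("A", "A", "A", "B", "C", "C"),
  Var   = c(3, 2, 1, 4, 5, 2)
)

# Every ID/group combination, built with base R only.
DF_clean <- expand.grid(ID = unique(DF_raw$ID),
                        group = unique(DF_raw$group),
                        stringsAsFactors = FALSE)

# Build one composite key per row (assumes "|" never occurs in ID or group).
key_raw   <- paste(DF_raw$ID, DF_raw$group, sep = "|")
key_clean <- paste(DF_clean$ID, DF_clean$group, sep = "|")

# Position of each combination in DF_raw, or NA if it was never observed.
idx <- match(key_clean, key_raw)

# Copy the observed Var where there is a match, otherwise use 0.
DF_clean$Var <- ifelse(is.na(idx), 0, DF_raw$Var[idx])
print(DF_clean)
```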
