I have a data frame such as:
Groups Event Value
G1 1 Canidae
G1 1 Canidae
G1 1 Felidae
G1 1 NA
G1 2 Felidae
G1 2 NA
G1 2 NA
G1 2 Felidae
G1 3 NA
G2 1 NA
G2 1 NA
G3 1 Lemuridae
G3 2 NA
G3 3 Lemuridae
G4 1 Felidae
G4 1 Felidae
G4 1 unknown
G5 1 unknown
G5 1 Felidae
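Within each combination of Groups and Event, I would like to fill the NA values in Value with the consensus (most frequent) value. For example, G1 Event 1 contains one NA and the consensus value there is Canidae, so I replace the NA with Canidae.
In the end I should get: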
Groups Event Value
G1 1 Canidae
G1 1 Canidae
G1 1 Canidae
G1 2 Felidae
G1 2 Felidae
G1 2 Felidae
G1 2 Felidae
G1 3 NA
G2 1 Lemuridae
G2 1 Lemuridae
G3 1 Lemuridae
G3 2 NA
G3 3 Lemuridae
G4 1 Felidae
G4 1 Felidae
G4 1 Felidae
G5 1 Felidae
G5 1 Felidae
Does anyone have an idea? Thank you very much for your time.
Here is the data:
structure(list(Groups = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("G1", "G2",
"G3", "G4"), class = "factor"), Event = c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 3L, 1L, 1L, 1L, 2L, 3L, 1L, 1L, 1L), Value = structure(c(1L,
1L, 2L, NA, 2L, NA, NA, 2L, NA, NA, NA, 3L, NA, 3L, 2L, 2L, 4L
), .Label = c("Canidae", "Felidae", "Lemuridae", "unknown"), class = "factor")), class = "data.frame", row.names = c(NA,
-17L))
We can group by Groups and Event and replace the missing values with the Mode of each group:
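library(dplyr)
# df1 is the data frame from the question; Mode() must be defined before this runs
df1 %>%
  mutate(Value = as.character(Value)) %>%
  group_by(Groups, Event) %>%
  # drop NA as well as "unknown" before taking the mode, so NA is never counted as a value
  mutate(Value = replace(Value, is.na(Value) | Value %in% "unknown",
                         Mode(Value[!is.na(Value) & Value != "unknown"])))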
where Mode is defined as
# most frequent value of x; the first one encountered wins on ties
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
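A quick sketch of how Mode behaves on edge cases may help here. It only uses the Mode function defined above; the input vectors are made up for illustration. The point it tries to show is that NA is counted like any other value, so missing values are best removed before calling Mode.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Illustrative inputs (not taken from the question's data)
majority  <- Mode(c("Canidae", "Canidae", "Felidae"))  # clear winner
na_heavy  <- Mode(c("Felidae", NA, NA))                # NA outnumbers Felidae
no_values <- Mode(character(0))                        # nothing left to count

print(majority)   # "Canidae"
print(na_heavy)   # NA, because NA is tabulated like a regular value
print(no_values)  # NA, so an all-missing group simply stays NA
```

Filtering with !is.na(x) before the call avoids the second case, and the third case means that a group with nothing but missing values is left as NA rather than raising an error.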
A data.table option:
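library(data.table)
# Fill NA within each Groups/Event pair with the most frequent non-NA value;
# groups that are entirely NA stay NA, and "unknown" is left unchanged here
setDT(df)[
  ,
  Value := replace(
    Value,
    is.na(Value),
    ifelse(all(is.na(Value)),
      NA,
      names(rev(sort(table(na.omit(Value)))))[1]
    )
  ), .(Groups, Event)
]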
which gives
Groups Event Value
1: G1 1 Canidae
2: G1 1 Canidae
3: G1 1 Felidae
4: G1 1 Canidae
5: G1 2 Felidae
6: G1 2 Felidae
7: G1 2 Felidae
8: G1 2 Felidae
9: G1 3 <NA>
10: G2 1 <NA>
11: G2 1 <NA>
12: G3 1 Lemuridae
13: G3 2 <NA>
14: G3 3 Lemuridae
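For comparison, the same idea can be sketched in base R with ave(), which applies a function within each Groups/Event combination. This is only a minimal sketch under some assumptions: `toy` is a small hypothetical sample rather than the full data, `fill_consensus` is a helper name introduced here, and "unknown" is treated as missing alongside NA. Existing non-missing values are left untouched.

```r
# Most frequent value of x (first one wins on ties); NA for empty input
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical helper: replace NA and "unknown" in v with the
# consensus of the remaining values of the same group
fill_consensus <- function(v) {
  gap <- is.na(v) | v == "unknown"
  replace(v, gap, Mode(v[!gap]))
}

# Small illustrative sample (hypothetical), not the full data set
toy <- data.frame(
  Groups = c("G1", "G1", "G1", "G1", "G2", "G2", "G4", "G4", "G4"),
  Event  = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
  Value  = c("Canidae", "Canidae", "Felidae", NA, NA, NA,
             "Felidae", "Felidae", "unknown"),
  stringsAsFactors = FALSE
)

# ave() splits Value by Groups and Event, applies the helper, and reassembles
toy$Filled <- ave(toy$Value, toy$Groups, toy$Event, FUN = fill_consensus)
print(toy)
```

In this sample the NA in G1 becomes Canidae, the all-NA group G2 stays NA, and the unknown in G4 becomes Felidae.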