ID <- 1:6
math <- c("YES","NO","YES","NO",NA,NA)
history <- c(NA,NA,"NO","NO","YES",NA)
dt <- data.frame(ID, math, history)
  ID math history
1  1  YES    <NA>
2  2   NO    <NA>
3  3  YES      NO
4  4   NO      NO
5  5 <NA>     YES
6  6 <NA>    <NA>
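I want to add an extra column ("pass") defined as follows:
If the student has "YES" at least once: "YES" (regardless of whether the other subject is missing)
If the student never has "YES":
- if both subjects are missing: NA
- if at least one subject is "NO": "NO"
So the column should look like this (I can do this by hand for this minimal example, but not for my real data):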
> dt
  ID math history pass
1  1  YES    <NA>  YES
2  2   NO    <NA>   NO
3  3  YES      NO  YES
4  4   NO      NO   NO
5  5 <NA>     YES  YES
6  6 <NA>    <NA> <NA>
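I tried this code:
dt$pass <- ifelse(rowSums(dt[,-1]=="YES",na.rm=T)>0,"YES","NO")
but it is tricky. With na.rm=TRUE, NA is treated as "NO" (the student with ID 6 ends up as "NO"). With na.rm=FALSE, only students who have data for both subjects are considered.
In my real data I have many columns, not just math and history.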
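To make the problem concrete, here is a small sketch of what rowSums returns under each na.rm setting for the example data. The names is_yes, counts_rm and counts_keep are introduced here for illustration only.

```r
# Rebuild the example data so the snippet is self-contained
dt <- data.frame(
  ID      = 1:6,
  math    = c("YES", "NO", "YES", "NO", NA, NA),
  history = c(NA, NA, "NO", "NO", "YES", NA)
)

# Logical matrix: TRUE for "YES", FALSE for "NO", NA where data is missing
is_yes <- dt[-1] == "YES"

# na.rm = TRUE: a row with only NA sums to 0, so ID 6 looks like "NO"
counts_rm <- rowSums(is_yes, na.rm = TRUE)
unname(counts_rm)    # 1 0 1 0 1 0

# na.rm = FALSE: any NA in the row makes the whole sum NA
counts_keep <- rowSums(is_yes, na.rm = FALSE)
unname(counts_keep)  # NA NA 1 0 NA NA
```

Neither setting separates "no YES but at least one NO" from "everything missing", which is what the desired column needs.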
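A simple base R solution is the following (the \(x) shorthand needs R 4.1 or later; function(x) works in any version):
dt$pass <- apply(dt[-1], 1, \(x) sort(x, decreasing = TRUE)[1])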
# > dt
#   ID math history pass
# 1  1  YES    <NA>  YES
# 2  2   NO    <NA>   NO
# 3  3  YES      NO  YES
# 4  4   NO      NO   NO
# 5  5 <NA>     YES  YES
# 6  6 <NA>    <NA> <NA>
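The sorting trick relies on two facts: sort() drops NA by default, and "YES" sorts after "NO", so the first element in decreasing order is "YES" whenever one is present. A minimal sketch on made-up rows (first_desc and the row_* vectors are hypothetical names, not part of the answer):

```r
# Made-up rows, used only to illustrate the sorting trick
row_mixed   <- c("NO", NA, "YES")
row_no_only <- c(NA, "NO")
row_empty   <- c(NA_character_, NA_character_)

# Hypothetical helper: first element after a decreasing sort
first_desc <- function(x) sort(x, decreasing = TRUE)[1]

first_desc(row_mixed)    # "YES": NA is dropped and "YES" sorts before "NO"
first_desc(row_no_only)  # "NO":  the only non-missing value
first_desc(row_empty)    # NA:    nothing is left, and character(0)[1] is NA
```

This only works because the two labels happen to sort in the right order; with other labels an explicit rule would be safer.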
Its dplyr equivalent is:
library(dplyr)
dt %>%
  rowwise() %>%
  mutate(pass = sort(c_across(-1), decreasing = TRUE)[1]) %>%
  ungroup()
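Try this:
library(dplyr)
fun <- function(x){
  # All subjects missing: the result is missing too
  if(all(is.na(x))) return(NA_character_)
  # At least one "YES" among the non-missing values
  if(any(na.omit(x == "YES"))) return("YES")
  # Otherwise there is at least one "NO" and no "YES"
  return("NO")
}
dt %>% rowwise() %>% mutate(pass = fun(c_across(-ID)))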
A tidy solution that is robust to the number (and names) of subjects:
library(tidyverse)
dt %>%
  mutate(
    pass=dt %>%
      pivot_longer(-ID) %>%
      group_by(ID) %>%
      summarise(
        anyYes=sum(value == "YES", na.rm=TRUE),
        anyNo=sum(value == "NO", na.rm=TRUE)
      ) %>%
      mutate(
        pass=ifelse(
          anyYes >= 1,
          "YES",
          ifelse(anyNo >= 1, "NO", NA)
        )
      ) %>%
      pull(pass)
  )
  ID math history pass
1  1  YES    <NA>  YES
2  2   NO    <NA>   NO
3  3  YES      NO  YES
4  4   NO      NO   NO
5  5 <NA>     YES  YES
6  6 <NA>    <NA> <NA>
The key is to pivot to long format.
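Perhaps rowMeans shows the behaviour you were hoping to get from rowSums. With na.rm = TRUE, a row that contains only NA yields NaN, and NaN > 0 evaluates to NA, so such a row ends up as NA.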
c("NO", "YES")[1 + (rowMeans(dt[-1] == "YES", TRUE) > 0)]
#[1] "YES" "NO" "YES" "NO" "YES" NA
Or, using your own line of code:
ifelse(rowMeans(dt[-1]=="YES",na.rm=T)>0,"YES","NO")
#[1] "YES" "NO" "YES" "NO" "YES" NA
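The difference between the two functions on an all-missing row can be checked directly. A minimal sketch, where all_missing is a made-up one-row matrix standing in for a student with no data:

```r
# One made-up row in which every subject is missing
all_missing <- matrix(NA_character_, nrow = 1, ncol = 2)

sum_result  <- rowSums(all_missing == "YES", na.rm = TRUE)   # 0: empty sum
mean_result <- rowMeans(all_missing == "YES", na.rm = TRUE)  # NaN: 0 / 0

sum_result > 0   # FALSE, so ifelse() would return "NO"
mean_result > 0  # NA, so ifelse() returns NA

ifelse(mean_result > 0, "YES", "NO")  # NA
```

Because the comparison with NaN gives NA rather than FALSE, both ifelse() and the index-based variant propagate NA for that row.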
library(tidyverse)
ID <- 1:6
math <- c("YES", "NO", "YES", "NO", NA, NA)
history <- c(NA, NA, "NO", "NO", "YES", NA)
dt <- data.frame(ID, math, history)
dt |>
  rowwise() |>
  mutate(pass = case_when(
    sum(c_across(-ID) == "YES", na.rm = TRUE) >= 1 ~ "YES",
    sum(c_across(-ID) == "NO", na.rm = TRUE) >= 1 ~ "NO",
    TRUE ~ NA_character_
  ))
# Add ungroup() if you need to do further ungrouped processing.
#> # A tibble: 6 × 4
#>      ID math  history pass
#>   <int> <chr> <chr>   <chr>
#> 1     1 YES   <NA>    YES
#> 2     2 NO    <NA>    NO
#> 3     3 YES   NO      YES
#> 4     4 NO    NO      NO
#> 5     5 <NA>  YES     YES
#> 6     6 <NA>  <NA>    <NA>
Created on 2022-06-10 by the reprex package (v2.0.1)
It seems the order matters: assign "NO" first, then let "YES" overwrite it.
dt$pass <- NA
dt$pass[dt$math == "NO" | dt$history=="NO"] <- "NO"
dt$pass[dt$math == "YES" | dt$history=="YES"] <- "YES"
> dt
  ID math history pass
1  1  YES    <NA>  YES
2  2   NO    <NA>   NO
3  3  YES      NO  YES
4  4   NO      NO   NO
5  5 <NA>     YES  YES
6  6 <NA>    <NA> <NA>
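Since the real data has many subject columns, the same ordered assignment can be written without naming each column. This is only a sketch of one way to generalize it: it swaps the chained | comparisons for rowSums, and subjects, has_no and has_yes are names introduced here.

```r
# Rebuild the example data so the snippet is self-contained
dt <- data.frame(
  ID      = 1:6,
  math    = c("YES", "NO", "YES", "NO", NA, NA),
  history = c(NA, NA, "NO", "NO", "YES", NA)
)

# Assumption: every column except ID is a subject
subjects <- setdiff(names(dt), "ID")

# TRUE where the row has at least one "NO" / at least one "YES"
has_no  <- rowSums(dt[subjects] == "NO",  na.rm = TRUE) > 0
has_yes <- rowSums(dt[subjects] == "YES", na.rm = TRUE) > 0

# Same order as above: start with NA, mark "NO", then let "YES" overwrite
dt$pass <- NA_character_
dt$pass[has_no]  <- "NO"
dt$pass[has_yes] <- "YES"

dt$pass  # "YES" "NO" "YES" "NO" "YES" NA
```

Rows with no "YES" and no "NO" are never touched by either assignment, so they keep the initial NA.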