R - Handling missing data in row-wise operations


ID <- 1:6
math <- c("YES","NO","YES","NO",NA,NA)
history <- c(NA,NA,"NO","NO","YES",NA)
dt <- data.frame(ID, math, history)
ID math history
1  1  YES    <NA>
2  2   NO    <NA>
3  3  YES      NO
4  4   NO      NO
5  5 <NA>     YES
6  6 <NA>    <NA>

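I want to create an additional column ("pass") as follows:

  1. If the student has at least one "YES": "YES" (regardless of whether the other subject is missing data)

  2. If the student has no "YES":

    • If both subjects are missing data: NA
    • If one of the subjects is "NO": "NO"

So the column would look like this (I can do this by hand for this minimal example, but not for my real data):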

> dt
ID math history pass
1  1  YES    <NA>  YES
2  2   NO    <NA>   NO
3  3  YES      NO  YES
4  4   NO      NO   NO
5  5 <NA>     YES  YES
6  6 <NA>    <NA> <NA>

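I tried the following code:

dt$pass <- ifelse(rowSums(dt[,-1]=="YES",na.rm=T)>0,"YES","NO")

but it is tricky: if I set na.rm=TRUE, NA is treated as "NO" (the student with ID 6 would get "NO").

If I set na.rm=FALSE, only students with data for both subjects are considered.

In my real data I have many columns, not just math and history.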
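To make the two failure modes concrete, here is a small sketch that runs the attempted line with both settings of na.rm on the example data. The names is_yes, with_rm, and without_rm are introduced only for illustration.

```r
ID <- 1:6
math <- c("YES", "NO", "YES", "NO", NA, NA)
history <- c(NA, NA, "NO", "NO", "YES", NA)
dt <- data.frame(ID, math, history)

# Logical matrix: TRUE/FALSE where a value exists, NA where data is missing
is_yes <- dt[, -1] == "YES"

# na.rm = TRUE: an all-NA row sums to 0, so student 6 wrongly becomes "NO"
with_rm <- ifelse(rowSums(is_yes, na.rm = TRUE) > 0, "YES", "NO")

# na.rm = FALSE: any NA in a row makes the whole row NA,
# so only students 3 and 4 (complete data) get an answer
without_rm <- ifelse(rowSums(is_yes, na.rm = FALSE) > 0, "YES", "NO")
```

Neither variant matches the desired column, which needs "NO" for student 2 and NA only for student 6.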

A simple base R solution is

dt$pass <- apply(dt[-1], 1, \(x) sort(x, decreasing = TRUE)[1])
# > dt
#   ID math history pass
# 1  1  YES    <NA>  YES
# 2  2   NO    <NA>   NO
# 3  3  YES      NO  YES
# 4  4   NO      NO   NO
# 5  5 <NA>     YES  YES
# 6  6 <NA>    <NA> <NA>
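The sort-based one-liner appears to rely on three behaviors: sort() drops NA by default, "YES" sorts after "NO", and taking element [1] of an empty vector gives NA. The sketch below checks each case in isolation; first_desc is a hypothetical helper name, and the string ordering is assumed to follow a locale in which "YES" collates after "NO".

```r
# Hypothetical helper: largest non-missing value, or NA if there is none
first_desc <- function(x) sort(x, decreasing = TRUE)[1]

r_yes <- first_desc(c("NO", NA, "YES"))               # "YES" wins over "NO"
r_no  <- first_desc(c("NO", NA))                      # NA is dropped, "NO" remains
r_na  <- first_desc(c(NA_character_, NA_character_))  # nothing left, so NA
```

This trick only gives the intended result while the columns contain nothing but "YES", "NO", and NA; any other value would take part in the ordering as well.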

The dplyr equivalent is

library(dplyr)
dt %>%
  rowwise() %>%
  mutate(pass = sort(c_across(-1), decreasing = TRUE)[1]) %>%
  ungroup()

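Try this

library(dplyr)

fun <- function(x){
  # no data at all for this student
  if(all(is.na(x))) return(NA_character_)
  # at least one "YES" among the non-missing values
  if(any(na.omit(x == "YES"))) return("YES")
  return("NO")
}
dt %>% rowwise() %>% mutate(pass = fun(c_across(-ID)))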
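Because fun only needs the values of one row as a vector, it can presumably be used without dplyr as well. A sketch with base R's apply(), assuming every column other than ID is a subject column:

```r
ID <- 1:6
math <- c("YES", "NO", "YES", "NO", NA, NA)
history <- c(NA, NA, "NO", "NO", "YES", NA)
dt <- data.frame(ID, math, history)

fun <- function(x) {
  if (all(is.na(x))) return(NA_character_)       # no data at all
  if (any(na.omit(x == "YES"))) return("YES")    # at least one "YES"
  "NO"                                           # data present, but no "YES"
}

# Apply fun to each row of the subject columns (everything except ID)
dt$pass <- apply(dt[names(dt) != "ID"], 1, fun)
```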

A tidy solution that is robust to the number (and names) of subjects:

library(tidyverse)
dt %>%
  mutate(
    pass = dt %>%
      pivot_longer(-ID) %>%
      group_by(ID) %>%
      summarise(
        anyYes = sum(value == "YES", na.rm = TRUE),
        anyNo = sum(value == "NO", na.rm = TRUE)
      ) %>%
      mutate(
        pass = ifelse(
          anyYes >= 1,
          "YES",
          ifelse(anyNo >= 1, "NO", NA)
        )
      ) %>%
      pull(pass)
  )
ID math history pass
1  1  YES    <NA>  YES
2  2   NO    <NA>   NO
3  3  YES      NO  YES
4  4   NO      NO   NO
5  5 <NA>     YES  YES
6  6 <NA>    <NA> <NA>

The key is pivoting to long format.
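For readers unfamiliar with the long format, this sketch shows what the pivot step produces on the example data and why counting per ID becomes easy afterwards. It assumes the tidyr package is installed; long and yes_count are illustrative names, and the per-ID count uses base R's tapply() rather than the group_by()/summarise() pair from the answer.

```r
library(tidyr)

ID <- 1:6
math <- c("YES", "NO", "YES", "NO", NA, NA)
history <- c(NA, NA, "NO", "NO", "YES", NA)
dt <- data.frame(ID, math, history)

# One row per (ID, subject) pair; the default column names are "name" and "value"
long <- pivot_longer(dt, -ID)

# Number of "YES" answers per student, ignoring missing values
yes_count <- tapply(long$value == "YES", long$ID, sum, na.rm = TRUE)
```

Adding more subject columns to dt only adds rows to long, so the counting code does not need to change.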

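Perhaps rowMeans gives the behavior you were hoping for from rowSums. For a row that contains only NA, rowMeans(..., na.rm = TRUE) returns NaN, and the comparison NaN > 0 yields NA.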

c("NO", "YES")[1 + (rowMeans(dt[-1] == "YES", TRUE) > 0)]
#[1] "YES" "NO"  "YES" "NO"  "YES" NA

Or, using your line of code:

ifelse(rowMeans(dt[-1]=="YES",na.rm=T)>0,"YES","NO")
#[1] "YES" "NO"  "YES" "NO"  "YES" NA
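
The intermediate values make it clearer why the rowMeans version behaves differently from rowSums. A sketch on the example data; is_yes, row_means, has_yes, and pass are illustrative names.

```r
ID <- 1:6
math <- c("YES", "NO", "YES", "NO", NA, NA)
history <- c(NA, NA, "NO", "NO", "YES", NA)
dt <- data.frame(ID, math, history)

is_yes <- dt[-1] == "YES"

# Share of "YES" among the non-missing values of each row.
# Row 6 has no non-missing values, so its mean is 0/0 = NaN.
row_means <- rowMeans(is_yes, na.rm = TRUE)   # 1, 0, 0.5, 0, 1, NaN

# NaN > 0 is NA, which ifelse() passes through as NA
has_yes <- row_means > 0
pass <- ifelse(has_yes, "YES", "NO")
```

The approach works on any number of subject columns, since rowMeans() operates on the whole logical matrix.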

library(tidyverse)
ID <- 1:6
math <- c("YES", "NO", "YES", "NO", NA, NA)
history <- c(NA, NA, "NO", "NO", "YES", NA)
dt <- data.frame(ID, math, history)
dt |> 
  rowwise() |> 
  mutate(pass = case_when(
    sum(c_across(-ID) == "YES", na.rm = TRUE) >= 1 ~ "YES",
    sum(c_across(-ID) == "NO", na.rm = TRUE) >= 1  ~ "NO",
    TRUE                                           ~ NA_character_
  )) 
# Add ungroup() if you need to do further ungrouped processing.
#> # A tibble: 6 × 4
#>      ID math  history pass 
#>   <int> <chr> <chr>   <chr>
#> 1     1 YES   <NA>    YES  
#> 2     2 NO    <NA>    NO   
#> 3     3 YES   NO      YES  
#> 4     4 NO    NO      NO   
#> 5     5 <NA>  YES     YES  
#> 6     6 <NA>  <NA>    <NA>

Created on 2022-06-10 by the reprex package (v2.0.1)

It seems that the order matters:

dt$pass <- NA
dt$pass[dt$math == "NO" | dt$history=="NO"] <- "NO"
dt$pass[dt$math == "YES" | dt$history=="YES"] <- "YES"
> dt
ID math history pass
1  1  YES    <NA>  YES
2  2   NO    <NA>   NO
3  3  YES      NO  YES
4  4   NO      NO   NO
5  5 <NA>     YES  YES
6  6 <NA>    <NA> <NA>
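
The two assignments name each column explicitly, which does not scale to many subjects. The same overwrite order ("NO" first, then "YES" on top) could be kept with rowSums(); this is a sketch under the assumption that every column except ID is a subject column, and subjects, any_no, any_yes, and pass are illustrative names.

```r
ID <- 1:6
math <- c("YES", "NO", "YES", "NO", NA, NA)
history <- c(NA, NA, "NO", "NO", "YES", NA)
dt <- data.frame(ID, math, history)

# Assumption: all columns other than ID hold subject results
subjects <- setdiff(names(dt), "ID")

# With na.rm = TRUE these flags are always TRUE or FALSE, never NA
any_no  <- rowSums(dt[subjects] == "NO",  na.rm = TRUE) > 0
any_yes <- rowSums(dt[subjects] == "YES", na.rm = TRUE) > 0

pass <- rep(NA_character_, nrow(dt))  # default: no data at all
pass[any_no]  <- "NO"                 # at least one "NO"
pass[any_yes] <- "YES"                # "YES" overwrites "NO", so this must come last
dt$pass <- pass
```

Swapping the last two assignments would turn student 3 (one "YES", one "NO") into "NO", which is why the order matters.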
