R - Create a new variable based on NA across all columns



I have a tibble like this:

library(tibble)

dat = tibble(
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA),
  a3 = c(NA, NA, "censored", "censored", NA),
  a4 = c(NA, "censored", NA, NA, NA)
)

I want to create a new variable named "class" that satisfies the following conditions:

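  • if either a1 or a2 has a value not equal to 0, then class = "yes",
  • if all variables that start with the letter "a" are NA, then class = "no",
  • otherwise, class = "censored" (if only one of these columns is "censored", then class = "censored")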

I tried to create an example using base R only. I am not sure whether I understood all the conditions correctly.

I am sure there are nicer solutions using dplyr or data.table, but I don't know your preference.

library(tibble)
# create data
dat  = tibble(
a1 = c(23, NA, 3, 0, NA),
a2 = c(NA, 6, 0, 9, NA),
a3 = c(NA, NA, "censored", "censored", NA),
a4 = c(NA, "censored", NA, NA, NA)
)
# 1. if either a1 or a2 has the number not equal to 0, then class = "yes" ####
dat$class <- ifelse(dat$a1 != 0 | dat$a2 != 0, 'yes', NA)
# 2. if all variables that start with letter "a" equal to NA, then class = "no" ####
# identify names starting with "a" and create a pattern for grepl
names <- names(dat)[grep("^a.*", names(dat))]
pattern <- paste(names, collapse = '|')
# check if all pattern cols are NA and apply "no" to dat$class
# achieved by comparing row sum of NA cols with ncol()
dat$class <-
ifelse(rowSums(is.na(dat[, grepl(pattern, colnames(dat))])) == ncol(dat[, grepl(pattern, colnames(dat))]), 'no', dat$class)

# 3. other else, class = "censored" (only one of these columns has "censored", then class = "censored") ####
# check if pattern cols contain "censored" and apply "censored" to dat$class
# achieved by checking for row sum > 0 matching the condition of == "censored"
dat$class <-
ifelse(rowSums(dat[, grepl(pattern, colnames(dat))] == "censored", na.rm = TRUE) > 0,
"censored",
dat$class)

In this example, the columns starting with "a" could also be accessed by indexing with dat[, 1:4], but your real data may look different.
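As a small sketch of selecting by name rather than by position, the base R function startsWith() can build the column index directly. The extra column `other` below is hypothetical and only there to show that non-matching columns are left out; the variable names `a_cols` and `all_na` are likewise just illustrative.

```r
# Data mirroring the example, plus a hypothetical non-"a" column
dat <- data.frame(
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA),
  a3 = c(NA, NA, "censored", "censored", NA),
  a4 = c(NA, "censored", NA, NA, NA),
  other = 1:5
)

# Logical index of the columns whose names start with "a"
a_cols <- startsWith(names(dat), "a")

# TRUE for rows in which every "a" column is NA
all_na <- rowSums(is.na(dat[, a_cols])) == sum(a_cols)
all_na
```

This keeps working if columns are reordered or new ones are added, which positional indexing such as dat[, 1:4] would not.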

Updated example based on the solution given earlier by @NarimeneL. Note that the order of the case_when statements matters here!

library(tibble)
library(dplyr)
library(magrittr)
library(tidyselect)

# create data
dat  = tibble(
a1 = c(23, NA, 3, 0, NA),
a2 = c(NA, 6, 0, 9, NA),
a3 = c(NA, NA, "censored", "censored", NA),
a4 = c(NA, "censored", NA, NA, NA)
)

dat2 <- dat %>%
  select(starts_with("a")) %>%
  mutate(class = case_when(
    # "censored" is checked first, so it takes precedence over "yes"
    rowSums(. == "censored", na.rm = TRUE) > 0 ~ "censored",
    a1 != 0 ~ "yes",
    a2 != 0 ~ "yes",
    rowSums(is.na(.)) == ncol(.) ~ "no"
  ))
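To illustrate why the order matters, here is a minimal sketch on three made-up rows (the vectors `a1` and `a3` below are illustrative, not the full data). case_when() returns the value of the first condition that is TRUE for each element, so swapping two conditions can change the label of any row that matches both.

```r
library(dplyr)

# Three illustrative rows: row 1 is both non-zero and censored
a1 <- c(3, 0, NA)
a3 <- c("censored", "censored", NA)

# "censored" tested first: it wins for row 1
censored_first <- case_when(
  a3 == "censored" ~ "censored",
  a1 != 0          ~ "yes",
  TRUE             ~ "no"
)

# "yes" tested first: row 1 is now labelled "yes"
yes_first <- case_when(
  a1 != 0          ~ "yes",
  a3 == "censored" ~ "censored",
  TRUE             ~ "no"
)

censored_first
yes_first
```

Only the first row differs between the two results, because it is the only one that satisfies both conditions.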

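I am a bit confused by the example data. If I understand the rules correctly, none of the rows in the example would be censored, because either a1 or a2 is always non-zero, except in the last row, where everything is NA.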

mutate(dat, class = case_when(
a1 != 0 | a2 != 0 ~ "yes",
if_all(starts_with("a"), is.na) ~ "no",
TRUE ~ "censored"
))
# A tibble: 5 x 5
     a1    a2 a3       a4       class
  <dbl> <dbl> <chr>    <chr>    <chr>
1    23    NA NA       NA       yes
2    NA     6 NA       censored yes
3     3     0 censored NA       yes
4     0     9 censored NA       yes
5    NA    NA NA       NA       no
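The condition a1 != 0 | a2 != 0 relies on how R's | operator treats NA, which may be worth spelling out. A short sketch follows; the third row (a1 = 0, a2 = NA) is a hypothetical case that does not occur in the example data.

```r
# Rows: (23, NA), (NA, 6), a hypothetical (0, NA), and (NA, NA)
a1 <- c(23, NA, 0, NA)
a2 <- c(NA, 6, NA, NA)

# TRUE | NA is TRUE, but FALSE | NA and NA | NA are NA
either_nonzero <- a1 != 0 | a2 != 0
either_nonzero
```

Inside case_when(), an NA condition counts as not matched, so such rows fall through to the later conditions. That is why the all-NA row ends up as "no", and why a row like (0, NA) would reach the final fallback if its a3 or a4 were not NA.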

You can convert your tibble to a data frame:

dat = as.data.frame(dat)

Then you can create the new variable using conditions:

library(dplyr)
library(magrittr)
library(tidyselect)

dat2 = dat %>%
  select(starts_with("a")) %>%
  mutate(class = case_when(
    # rows matching neither condition are left as NA
    a1 != 0 ~ "yes",
    a2 != 0 ~ "yes"
  ))
