I have a tibble like this:
dat = tibble(a1 = c(23, NA, 3, 0, NA),
a2 = c(NA, 6, 0, 9, NA),
a3 = c(NA, NA, "censored", "censored", NA),
a4 = c(NA, "censored", NA, NA, NA))
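I want to create a new variable named "class" that satisfies the following conditions:
- if either a1 or a2 has a value not equal to 0, then class = "yes",
- if all variables that start with the letter "a" are NA, then class = "no",
- otherwise, class = "censored" (if only one of these columns is "censored", then class = "censored")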
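I tried to create an example using only base R. I'm not sure I understood all the conditions correctly.
I'm sure there are better solutions using dplyr or data.table, but I don't know your preference.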
library(tibble)
# create data
dat = tibble(
a1 = c(23, NA, 3, 0, NA),
a2 = c(NA, 6, 0, 9, NA),
a3 = c(NA, NA, "censored", "censored", NA),
a4 = c(NA, "censored", NA, NA, NA)
)
# 1. if either a1 or a2 has the number not equal to 0, then class = "yes" ####
dat$class <- ifelse(dat$a1 != 0 | dat$a2 != 0, 'yes', NA)
# 2. if all variables that start with letter "a" equal to NA, then class = "no" ####
# identify the columns whose names start with "a"
a_cols <- grep("^a", names(dat), value = TRUE)
# check whether all "a" columns are NA and, if so, set dat$class to "no"
# (the number of NAs in a row equals the number of "a" columns)
dat$class <- ifelse(rowSums(is.na(dat[, a_cols])) == length(a_cols), "no", dat$class)
# 3. otherwise, class = "censored" (if only one of these columns is "censored", then class = "censored") ####
# check whether any "a" column contains "censored" and, if so, set dat$class to "censored"
# (the row sum of the == "censored" matches is greater than 0)
# note: this step runs last, so it overrides a "yes" assigned in step 1
dat$class <- ifelse(rowSums(dat[, a_cols] == "censored", na.rm = TRUE) > 0,
                    "censored",
                    dat$class)
In this example, the columns starting with "a" could also be accessed by indexing dat[, 1:4], but your actual data may look different.
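Selecting the columns by name rather than by position could look like the following minimal sketch. It uses a plain data.frame as a stand-in for the tibble (an assumption, to stay in base R), and the `other` column and the names `a_cols` and `all_na` are hypothetical, introduced only for illustration.

```r
# Stand-in data: base data.frame instead of a tibble (assumption), with an
# extra non-"a" column placed first so that dat[, 1:4] would pick the wrong columns.
dat <- data.frame(
  other = 1:5,
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA),
  a3 = c(NA, NA, "censored", "censored", NA),
  a4 = c(NA, "censored", NA, NA, NA)
)

# Select by name prefix; startsWith() is part of base R
a_cols <- names(dat)[startsWith(names(dat), "a")]

# Flag the rows in which every "a" column is NA
all_na <- rowSums(is.na(dat[, a_cols])) == length(a_cols)

print(a_cols)
print(all_na)
```

Because the selection is driven by the column names, it keeps working if columns are added or reordered.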
Update: an example based on the solution given earlier by @NarimeneL. Note that the order of the case_when statements matters here!
library(tibble)
library(dplyr)
library(magrittr)
library(tidyselect)
# create data
dat = tibble(
a1 = c(23, NA, 3, 0, NA),
a2 = c(NA, 6, 0, 9, NA),
a3 = c(NA, NA, "censored", "censored", NA),
a4 = c(NA, "censored", NA, NA, NA)
)
dat2 <- dat %>% select(starts_with("a")) %>%
mutate(class = case_when(
rowSums(. == "censored", na.rm = TRUE) > 0 ~ "censored" ,
a1 != 0 ~ "yes",
a2 != 0 ~ "yes",
rowSums(is.na(.)) == ncol(.) ~ 'no'
))
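To see why the order matters, here is a minimal sketch that evaluates the same three conditions in two different orders. The names `has_censored`, `censored_first` and `yes_first` are hypothetical, and case_when is called directly on vectors rather than inside mutate to keep the comparison short.

```r
library(dplyr)

dat <- tibble(
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA),
  a3 = c(NA, NA, "censored", "censored", NA),
  a4 = c(NA, "censored", NA, NA, NA)
)

# TRUE for rows in which any column equals "censored"
has_censored <- rowSums(dat == "censored", na.rm = TRUE) > 0
# TRUE for rows in which every column is NA
all_na <- rowSums(is.na(dat)) == ncol(dat)

# "censored" is checked first, so it takes priority over "yes"
censored_first <- case_when(
  has_censored ~ "censored",
  dat$a1 != 0 | dat$a2 != 0 ~ "yes",
  all_na ~ "no"
)

# "yes" is checked first, so it takes priority over "censored"
yes_first <- case_when(
  dat$a1 != 0 | dat$a2 != 0 ~ "yes",
  has_censored ~ "censored",
  all_na ~ "no"
)

print(censored_first)
print(yes_first)
```

case_when uses the first condition that is TRUE for each row, so rows 2 to 4 come out as "censored" in the first version and as "yes" in the second.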
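I'm a bit confused by the example data. If I understand the rules correctly, none of the rows in the example would be classified as "censored", because a1 or a2 is always non-zero, except in the last row, where everything is NA.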
library(dplyr)
mutate(dat, class = case_when(
a1 != 0 | a2 != 0 ~ "yes",
if_all(starts_with("a"), is.na) ~ "no",
TRUE ~ "censored"
))
# A tibble: 5 x 5
     a1    a2 a3       a4       class
  <dbl> <dbl> <chr>    <chr>    <chr>
1    23    NA NA       NA       yes
2    NA     6 NA       censored yes
3     3     0 censored NA       yes
4     0     9 censored NA       yes
5    NA    NA NA       NA       no
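This result depends on how R's `|` operator handles NA, which the following base R sketch illustrates: `TRUE | NA` is TRUE, while `FALSE | NA` and `NA | NA` are NA, and case_when treats an NA condition as not matched. The vectors are copied from the example data; `cond` is a hypothetical name.

```r
a1 <- c(23, NA, 3, 0, NA)
a2 <- c(NA, 6, 0, 9, NA)

# Rows 1 to 4 evaluate to TRUE, row 5 (NA | NA) evaluates to NA
cond <- a1 != 0 | a2 != 0
print(cond)

# A hypothetical row with a1 = 0 and a2 = NA would give NA rather than FALSE
print(0 != 0 | NA != 0)
```

So a row with one zero and one missing value would fall through to the later conditions instead of being classified as "yes".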
You can convert your tibble to a data frame:
dat = as.data.frame(dat)
and then create the new variable using conditions:
library(dplyr)
library(magrittr)
library(tidyselect)
dat2 = dat %>% select(starts_with("a")) %>% mutate(
class = case_when(
a1 != 0 ~ "yes",
a2 != 0 ~ "yes"))
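Note that a case_when without a fallback leaves the rows that match none of the conditions as NA. A minimal sketch, assuming only the two numeric columns from the example, the lowercase "yes" label from the question, and a hypothetical result name `res`:

```r
library(dplyr)

dat <- data.frame(
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA)
)

# Only the "yes" rule is covered; rows matching neither condition get NA
res <- dat %>%
  mutate(class = case_when(
    a1 != 0 ~ "yes",
    a2 != 0 ~ "yes"
  ))

print(res)
```

The last row, where both a1 and a2 are NA, ends up with class = NA, so the "no" and "censored" rules still have to be added as further conditions.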