我用一个非常复杂的方法来解决我认为应该有一个简单得多的解决方案的事情。总之,我想要的是:
- 我想计算一个包含3列最小值的新列
- 我想忽略0和na
- 如果我只有0和na我想要一个0
- 如果我只有NAs,我想要NA
这是我的解决方案,它工作,但它非常复杂,并产生一个警告。
> library(dplyr)
> df <- data.frame(
+ id = c(1, 2, 3, 4),
+ test1 = c( NA, NA, 2 , 3),
+ test2 = c( NA, 0, 1 , 1),
+ test3 = c(NA, NA, 0 , 2)
+ )
> df2 <- df %>%
+ mutate(nieuw = apply(across(test1:test3), 1, function(x) min(x[x>0]))) %>%
+ rowwise() %>%
+ mutate(nieuw = if_else(is.na(nieuw), max(across(test1:test3), na.rm = TRUE), nieuw)) %>%
+ mutate(nieuw = ifelse(is.infinite(nieuw), NA, nieuw))
> df
id test1 test2 test3
1 1 NA NA NA
2 2 NA 0 NA
3 3 2 1 0
4 4 3 1 2
> df2
# A tibble: 4 x 5
# Rowwise:
id test1 test2 test3 nieuw
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA NA NA NA
2 2 NA 0 NA 0
3 3 2 1 0 1
4 4 3 1 2 1
Warning message:
Problem while computing `nieuw = if_else(...)`.
i no non-missing arguments to max; returning -Inf
i The warning occurred in row 1.
您可以创建一个辅助函数,然后应用它rowwise
:
library(dplyr)
safe <- function(x, f, ...) ifelse(all(is.na(x)), NA,
ifelse(all(is.na(x) | x == 0),
0, f(x[x > 0], na.rm = TRUE, ...)))
df %>%
rowwise() %>%
mutate(a = safe(c_across(test1:test3), min))
# A tibble: 4 × 5
# Rowwise:
id test1 test2 test3 a
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA NA NA NA
2 2 NA 0 NA 0
3 3 2 1 0 1
4 4 3 1 2 1
这是另一个选项。它使0和NA非常大然后在最后重新编码:
library(tidyverse)
get_min <- function(data, cols){
data[is.na(data)] <- 1e6
data[data == 0] <- 1e5
nums <- do.call(pmin, select(data, all_of(cols)))
recode(nums, `1e+06` = NA_real_, `1e+05` = 0.)
}
df %>%
mutate(nieuw = get_min(., c("test1", "test2", "test3")))
#> id test1 test2 test3 nieuw
#> 1 1 NA NA NA NA
#> 2 2 NA 0 NA 0
#> 3 3 2 1 0 1
#> 4 4 3 1 2 1