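I'm trying to sum three different variables in a data frame while grouping by another variable, but the data contain several NAs. A sum over nothing but NAs comes out as zero rather than NA. Here is an example: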
my_data <- data.frame(Month = c("1995-01-01", "1995-01-01", "1995-01-01",
"1995-02-01", "1995-02-01"),
Value_1 = c(1, NA, 2, NA, NA),
Value_2 = c(2, 2, 3, NA, 1),
Value_3 = c(NA, NA, NA, NA, NA))
library(dplyr)

# Summing with dplyr
my_data %>%
  group_by(Month) %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE)))
# Summing with base R
my_vars <- c("Value_1", "Value_2", "Value_3")
aggregate(x = my_data[my_vars], by = my_data["Month"], FUN = sum,
          na.rm = TRUE)
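For example, for Value_3 I get a sum of zero for both months instead of NA. Any suggestions on how to sum these columns so that an all-NA group gives NA rather than zero would be much appreciated.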
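The zeros follow from how `sum()` handles `na.rm = TRUE`: the NAs are dropped first, and the sum of whatever remains (here, nothing) is 0. A minimal base R illustration:

```r
x <- c(NA_real_, NA_real_)

sum(x)                # NA: without na.rm, any NA makes the result NA
sum(x, na.rm = TRUE)  # 0: after dropping the NAs nothing is left
sum(numeric(0))       # 0: the sum of an empty vector
```

So any fix has to check for the all-NA case explicitly; `na.rm` alone cannot distinguish "all missing" from "sums to zero".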
You can add an if/else that returns NA when all of a variable's values within a group are NA:
library(dplyr)

my_data %>%
  group_by(Month) %>%
  summarise(across(everything(),
                   ~ if (all(is.na(.x))) NA else sum(.x, na.rm = TRUE)))
# A tibble: 2 x 4
# Month Value_1 Value_2 Value_3
# <fctr> <dbl> <dbl> <lgl>
#1 1995-01-01 3 7 NA
#2 1995-02-01 NA 1 NA
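The same if/else idea carries over to the base R `aggregate()` call from the question. A sketch, assuming a small helper called `sum_or_na` (a hypothetical name, not part of any package) that is passed as `FUN` in place of `sum`:

```r
# Hypothetical helper: NA when every value is missing, otherwise the sum
# of the non-missing values.
sum_or_na <- function(x) {
  if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
}

my_data <- data.frame(Month = c("1995-01-01", "1995-01-01", "1995-01-01",
                                "1995-02-01", "1995-02-01"),
                      Value_1 = c(1, NA, 2, NA, NA),
                      Value_2 = c(2, 2, 3, NA, 1),
                      Value_3 = c(NA, NA, NA, NA, NA))
my_vars <- c("Value_1", "Value_2", "Value_3")

# Base R: the helper replaces sum, so no na.rm argument is passed
res <- aggregate(x = my_data[my_vars], by = my_data["Month"], FUN = sum_or_na)
res
```

Defining the helper once also lets you reuse it in the dplyr version, e.g. `summarise(across(everything(), sum_or_na))`.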
Staying closer to your own approach, you can wrap the sum in ifelse():
library(dplyr)

my_data %>%
  group_by(Month) %>%
  summarise(across(everything(),
                   ~ ifelse(sum(is.na(.x)) == length(.x), NA, sum(.x, na.rm = TRUE))))
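The test `sum(is.na(x)) == length(x)` asks the same question as `all(is.na(x))` from the if/else version. A quick check of that equivalence, using two illustrative helper names (`check_count` and `check_all` are assumptions made for this sketch), also shows that the test looks at missingness rather than at the result, so a genuine zero sum is left alone:

```r
# Two formulations of "are all values missing?"
check_count <- function(x) sum(is.na(x)) == length(x)  # as in the ifelse() answer
check_all   <- function(x) all(is.na(x))               # as in the if/else answer

tests <- list(c(NA, NA), c(1, NA), c(2, -2, NA), numeric(0))
sapply(tests, check_count)  # TRUE FALSE FALSE TRUE
sapply(tests, check_all)    # TRUE FALSE FALSE TRUE

# 2 + (-2) is a real zero, not an all-NA group, so it stays 0
x <- c(2, -2, NA)
z <- ifelse(sum(is.na(x)) == length(x), NA, sum(x, na.rm = TRUE))
z  # 0
```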
We can also use data.table:
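library(data.table)
# setDT() converts my_data to a data.table in place (by reference).
# The multiplier NA^(all(is.na(x))) is 1 unless every value is NA, when it is NA.
setDT(my_data)[, lapply(.SD, function(x) sum(x, na.rm = TRUE) * NA^(all(is.na(x)))), by = Month]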
# Month Value_1 Value_2 Value_3
#1: 1995-01-01 3 7 NA
#2: 1995-02-01 NA 1 NA
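The data.table answer relies on R's rule that anything raised to the power 0 is 1, including `NA`, while `NA^1` is `NA`. Since `all(is.na(x))` is coerced to 0 or 1, the multiplier either keeps the sum or turns it into `NA`. A minimal demonstration of that trick on plain vectors:

```r
NA^0      # 1
NA^1      # NA
NA^FALSE  # 1 (FALSE is treated as 0)
NA^TRUE   # NA (TRUE is treated as 1)

x <- c(1, NA, 2)
res_x <- sum(x, na.rm = TRUE) * NA^(all(is.na(x)))  # 3 * 1 = 3

y <- c(NA_real_, NA_real_)
res_y <- sum(y, na.rm = TRUE) * NA^(all(is.na(y)))  # 0 * NA = NA
```

It avoids a branch, at the cost of being less obvious to read than the explicit `if (all(is.na(x))) NA else ...` form.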