r - How to use an ifelse statement to get means by group with data.table syntax



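I have data.table code that works well, but I can't adapt it to include an ifelse statement. I am using the following reproducible example (reprex):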

set.seed(1645)
Place <- c(rep("Copenhagen",7),rep("Berlin",11),rep("Roma",12))
Year <- c(rep("2020",4),rep("2021",3),rep("2020",6),rep("2021",5),rep("2019",4),rep("2020",4),rep("2021",4))
Value1 <- c(runif(3),NA,runif(8),NA,runif(9),NA,runif(7))
Value2 <- c(runif(4),NA,runif(2),runif(6),NA,NA,runif(11),NA,NA,runif(2))
df <- data.frame(Place,Year,Value1,Value2)
> df
Place Year     Value1      Value2
1  Copenhagen 2020 0.10517697 0.865935100
2  Copenhagen 2020 0.96597760 0.579956282
3  Copenhagen 2020 0.47262307 0.346569960
4  Copenhagen 2020         NA 0.478763951
5  Copenhagen 2021 0.90030423          NA
6  Copenhagen 2021 0.14444142 0.280377315
7  Copenhagen 2021 0.73801550 0.302816525
8      Berlin 2020 0.13961383 0.641314310
9      Berlin 2020 0.40221211 0.756374251
10     Berlin 2020 0.49613139 0.070459347
11     Berlin 2020 0.95190545 0.184497038
12     Berlin 2020 0.40182901 0.407892240
13     Berlin 2020         NA 0.002209376
14     Berlin 2021 0.38310025          NA
15     Berlin 2021 0.76417492          NA
16     Berlin 2021 0.29001287 0.632133629
17     Berlin 2021 0.84478784 0.365406326
18     Berlin 2021 0.55547323 0.493870653
19       Roma 2019 0.44198733 0.067744090
20       Roma 2019 0.50403809 0.847876518
21       Roma 2019 0.85358805 0.952393606
22       Roma 2019 0.74996137 0.887583928
23       Roma 2020         NA 0.631937527
24       Roma 2020 0.08303509 0.993400333
25       Roma 2020 0.74205719 0.589183185
26       Roma 2020 0.27552659 0.522451407
27       Roma 2021 0.39518410          NA
28       Roma 2021 0.38390124          NA
29       Roma 2021 0.36605674 0.942102065
30       Roma 2021 0.32014949 0.375689863

I want to compute the mean by group only if <=25% of the values in that group are NA. Without my condition, this works fine:

setDT(df)
df_means <- df[,.(Value1_mean = mean(Value1),Value2_mean = mean(Value2)), by = .(Place,Year)]
> df_means
Place Year Value1_mean Value2_mean
1: Copenhagen 2020          NA   0.4257258
2: Copenhagen 2021   0.3581245          NA
3:     Berlin 2020          NA   0.3935807
4:     Berlin 2021   0.3729461          NA
5:       Roma 2019   0.4572996   0.3956536
6:       Roma 2020          NA   0.6494491
7:       Roma 2021   0.4142637          NA
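
Before adding the condition, it can help to look at the share of NA values in each group, since that is the quantity the 25% rule tests. The following is only a minimal sketch on a small made-up table `dt` (hypothetical data, not the OP's `df`) with a similar column layout:

```r
library(data.table)

# Hypothetical toy data, not the OP's df
dt <- data.table(
  Place  = c("A", "A", "A", "A", "B", "B", "B"),
  Year   = c("2020", "2020", "2020", "2020", "2021", "2021", "2021"),
  Value1 = c(0.1, 0.9, 0.5, NA, 0.9, 0.1, 0.7),
  Value2 = c(0.8, 0.5, 0.3, 0.4, NA, 0.2, 0.3)
)

# mean(is.na(x)) is the fraction of NA values in x
na_share <- dt[, .(Value1_na = mean(is.na(Value1)),
                   Value2_na = mean(is.na(Value2))),
               by = .(Place, Year)]
print(na_share)
```

In this toy table, group A has exactly 25% NA in Value1 and group B has one third NA in Value2, so only the latter exceeds the 25% limit.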

With the ifelse statement included, it does not work:

df_means2 <- df[,.(Value1_mean = ifelse(sum(is.na(Value1))/length(Value1)>=0.25,NA,mean(Value1,na.rm=TRUE)),
Value2_mean = ifelse(sum(is.na(Value2))/length(Value2)>=0.25,NA,mean(Value2,na.rm=TRUE))), 
by = .(Place,Year)]

I checked posts 1, 2 and 3, but they did not solve my problem. My expected result would be:

> df_means2
Place Year Value1_mean Value2_mean
1 Copenhagen 2020        mean        mean
2 Copenhagen 2021        mean        <NA>
3     Berlin 2020        mean        mean
4     Berlin 2021        mean        <NA>
5       Roma 2019        mean        mean
6       Roma 2020        mean        mean
7       Roma 2021        mean        <NA>

How can I convert the code?

We can use if/else instead, applied to each value column through .SD:

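library(data.table)
# For each value column in .SD, return the group mean when at most 25% of the
# values are NA; otherwise return a double-typed NA
df[, lapply(.SD, function(x) if (mean(is.na(x)) <= 0.25)
  mean(x, na.rm = TRUE) else NA_real_), by = .(Place, Year)]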

Output:

Place Year    Value1    Value2
1: Copenhagen 2020 0.5145925 0.5678063
2: Copenhagen 2021 0.5942537        NA
3:     Berlin 2020 0.4783384 0.3437911
4:     Berlin 2021 0.5675098        NA
5:       Roma 2019 0.6373937 0.6888995
6:       Roma 2020 0.3668730 0.6842431
7:       Roma 2021 0.3663229        NA
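
The same idea can be wrapped in a small named function so the threshold is easy to change. This is only a sketch: `mean_if_few_na` and its `max_na` argument are hypothetical names introduced here, and `dt` is made-up data rather than the OP's `df`. Plain `if`/`else` fits because the condition is a single TRUE/FALSE per group, whereas `ifelse()` is meant for element-wise tests on vectors.

```r
library(data.table)

# Hypothetical helper: mean of x if the NA share is at most max_na,
# otherwise a double-typed NA so the column type stays numeric
mean_if_few_na <- function(x, max_na = 0.25) {
  if (mean(is.na(x)) <= max_na) mean(x, na.rm = TRUE) else NA_real_
}

# Hypothetical toy data, not the OP's df
dt <- data.table(
  Place  = c("A", "A", "A", "A", "B", "B", "B"),
  Value1 = c(1, 2, 3, NA, 4, 5, 6),
  Value2 = c(1, 2, 3, 4, NA, 5, 7)
)

# Apply the helper to the chosen columns within each group
res <- dt[, lapply(.SD, mean_if_few_na), by = Place,
          .SDcols = c("Value1", "Value2")]
print(res)
```

A stricter or looser rule could then be tried by passing the threshold through `lapply`, for example `lapply(.SD, mean_if_few_na, max_na = 0.1)`.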

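In the OP's code, use NA_real_ instead of a plain NA. A plain NA is logical by default, which conflicts with the numeric class of the means returned for the other groups. The comparison is also changed from >= 0.25 to > 0.25, so that a group with exactly 25% NA values still gets a mean, as the question asks: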

df[, .(Value1_mean = ifelse(sum(is.na(Value1)) / length(Value1) > 0.25,
                            NA_real_, mean(Value1, na.rm = TRUE)),
       Value2_mean = ifelse(sum(is.na(Value2)) / length(Value2) > 0.25,
                            NA_real_, mean(Value2, na.rm = TRUE))),
   by = .(Place, Year)]

Output:

Place Year Value1_mean Value2_mean
1: Copenhagen 2020   0.5145925   0.5678063
2: Copenhagen 2021   0.5942537          NA
3:     Berlin 2020   0.4783384   0.3437911
4:     Berlin 2021   0.5675098          NA
5:       Roma 2019   0.6373937   0.6888995
6:       Roma 2020   0.3668730   0.6842431
7:       Roma 2021   0.3663229          NA
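
The type difference behind the class conflict can be checked in base R. The short sketch below is illustrative only and uses no data from the question. Since data.table expects a result column to have a consistent type across groups, a logical NA in one group and a double mean in another can clash; the exact behaviour may depend on the data.table version.

```r
# NA is logical by default; NA_real_ is the double-typed missing value
typeof(NA)        # "logical"
typeof(NA_real_)  # "double"

# For a length-one test, ifelse() returns the selected branch,
# so the result type follows that branch
typeof(ifelse(TRUE,  NA,       1.5))  # "logical"
typeof(ifelse(FALSE, NA,       1.5))  # "double"
typeof(ifelse(TRUE,  NA_real_, 1.5))  # "double"
```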
