我使用的data.table代码运行良好,但无法转换为包含ifelse
语句。我使用以下代表:
set.seed(1645)
Place <- c(rep("Copenhagen",7),rep("Berlin",11),rep("Roma",12))
Year <- c(rep("2020",4),rep("2021",3),rep("2020",6),rep("2021",5),rep("2019",4),rep("2020",4),rep("2021",4))
Value1 <- c(runif(3),NA,runif(8),NA,runif(9),NA,runif(7))
Value2 <- c(runif(4),NA,runif(2),runif(6),NA,NA,runif(11),NA,NA,runif(2))
df <- data.frame(Place,Year,Value1,Value2)
> df
Place Year Value1 Value2
1 Copenhagen 2020 0.10517697 0.865935100
2 Copenhagen 2020 0.96597760 0.579956282
3 Copenhagen 2020 0.47262307 0.346569960
4 Copenhagen 2020 NA 0.478763951
5 Copenhagen 2021 0.90030423 NA
6 Copenhagen 2021 0.14444142 0.280377315
7 Copenhagen 2021 0.73801550 0.302816525
8 Berlin 2020 0.13961383 0.641314310
9 Berlin 2020 0.40221211 0.756374251
10 Berlin 2020 0.49613139 0.070459347
11 Berlin 2020 0.95190545 0.184497038
12 Berlin 2020 0.40182901 0.407892240
13 Berlin 2020 NA 0.002209376
14 Berlin 2021 0.38310025 NA
15 Berlin 2021 0.76417492 NA
16 Berlin 2021 0.29001287 0.632133629
17 Berlin 2021 0.84478784 0.365406326
18 Berlin 2021 0.55547323 0.493870653
19 Roma 2019 0.44198733 0.067744090
20 Roma 2019 0.50403809 0.847876518
21 Roma 2019 0.85358805 0.952393606
22 Roma 2019 0.74996137 0.887583928
23 Roma 2020 NA 0.631937527
24 Roma 2020 0.08303509 0.993400333
25 Roma 2020 0.74205719 0.589183185
26 Roma 2020 0.27552659 0.522451407
27 Roma 2021 0.39518410 NA
28 Roma 2021 0.38390124 NA
29 Roma 2021 0.36605674 0.942102065
30 Roma 2021 0.32014949 0.375689863
如果存在<=25%NA。如果没有我的条件,这很好:
setDT(df)
df_means <- df[,.(Value1_mean = mean(Value1),Value2_mean = mean(Value2)), by = .(Place,Year)]
> df_means
Place Year Value1_mean Value2_mean
1: Copenhagen 2020 NA 0.4257258
2: Copenhagen 2021 0.3581245 NA
3: Berlin 2020 NA 0.3935807
4: Berlin 2021 0.3729461 NA
5: Roma 2019 0.4572996 0.3956536
6: Roma 2020 NA 0.6494491
7: Roma 2021 0.4142637 NA
我没有包含ifelse
语句,这不起作用:
df_means2 <- df[,.(Value1_mean = ifelse(sum(is.na(Value1))/length(Value1)>=0.25,NA,mean(Value1,na.rm=TRUE)),
Value2_mean = ifelse(sum(is.na(Value2))/length(Value2)>=0.25,NA,mean(Value2,na.rm=TRUE))),
by = .(Place,Year)]
我检查了这些帖子1、2和3,但没有解决我的问题。我的预期结果应该是:
> df_means2
Place Year Value1_mean Value2_mean
1 Copenhagen 2020 mean mean
2 Copenhagen 2021 mean <NA>
3 Berlin 2020 mean mean
4 Berlin 2021 mean <NA>
5 Roma 2019 mean mean
6 Roma 2020 mean mean
7 Roma 2021 mean <NA>
如何转换代码?
我们可以使用if/else
library(data.table)
df[, lapply(.SD, function(x) if(mean(is.na(x)) <= 0.25)
mean(x, na.rm = TRUE) else NA_real_), by = .(Place, Year)]
-输出
Place Year Value1 Value2
1: Copenhagen 2020 0.5145925 0.5678063
2: Copenhagen 2021 0.5942537 NA
3: Berlin 2020 0.4783384 0.3437911
4: Berlin 2021 0.5675098 NA
5: Roma 2019 0.6373937 0.6888995
6: Roma 2020 0.3668730 0.6842431
7: Roma 2021 0.3663229 NA
在OP的代码中,使用NA_real_
而不是默认为logical
的NA
,这会在class
中产生冲突
df[,.(Value1_mean = ifelse(sum(is.na(Value1))/length(Value1)>0.25,
NA_real_,mean(Value1,na.rm=TRUE)),
Value2_mean = ifelse(sum(is.na(Value2))/length(Value2)>0.25,
NA_real_,
mean(Value2,na.rm=TRUE))),
by = .(Place,Year)]
-输出
Place Year Value1_mean Value2_mean
1: Copenhagen 2020 0.5145925 0.5678063
2: Copenhagen 2021 0.5942537 NA
3: Berlin 2020 0.4783384 0.3437911
4: Berlin 2021 0.5675098 NA
5: Roma 2019 0.6373937 0.6888995
6: Roma 2020 0.3668730 0.6842431
7: Roma 2021 0.3663229 NA