head(df)
county state_abr population unemp health_ins poverty SNAP no_comp no_internet home_broad broad_num broad_avail broad_cost price_bbn
1 Autauga AL 55869 2.7 7.1 15.4 12.7 NA 20.9 78.9 0 0.0 67.32586 35.00
2 Baldwin AL 223234 2.7 10.2 10.6 7.5 NA 21.3 78.1 0 0.0 67.32586 35.00
3 Barbour AL 24686 3.8 11.2 28.9 27.4 NA 38.9 60.4 4 99.2 74.99000 35.00
4 Bibb AL 22394 3.1 7.9 14.0 12.4 23.7 33.8 66.1 0 0.0 67.32586 35.00
5 Blount AL 57826 2.7 11.0 NA 9.5 21.3 NA 68.5 0 0.0 67.32586 35.00
6 Bullock AL 10101 3.6 10.8 31.4 25.9 27.1 40.1 58.9 1 40.1 57.99000 71.95
Sublette WY 9831 4.4 13.4 8.4 2.2 5.4 17.5 81.7 3 19.5 59.65
Sweetwater WY 42343 3.9 12.0 12.0 5.8 7.7 16.1 82.4 5 95.1 63.30
Teton WY 23464 2.7 10.0 7.1 2.1 4.2 13.6 85.9 6 96.0 69.99
Uinta WY 20226 3.9 12.2 12.5 7.1 6.1 11.5 88.2 5 73.9 63.30
Washakie WY 7805 3.9 15.4 12.4 4.9 12.1 21.5 78.3 5 86.1 64.36
Weston WY 6927 2.9 13.3 17.4 4.7 13.8 26.1 73.3 2 52.0 66.67
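I have this data frame in R, and I want to replace the NA values with numeric values. The simple approach would be to take each column's mean and substitute it for the NAs, but I'd like to be more precise.
My data frame is organized by state (here I'm only using a subset with WY and AL), so I want to compute the mean for each state and apply it to the NA values in that state.
For example, row 1 has NA in no_comp and state_abr AL. If I take the mean of the whole no_comp column, it will also include the WY values, which I don't want. I only want the mean of no_comp for rows with state_abr "AL", applied to the corresponding NA values.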
We can group by "state_abr", loop over the numeric columns with across inside mutate, and use na.aggregate from zoo to replace each NA with the group mean. By default, na.aggregate uses FUN = mean.
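library(zoo)
library(dplyr)
# df is the data frame from the question
df <- df %>%
  group_by(state_abr) %>%
  # within each state, fill NAs in every numeric column with that state's mean
  mutate(across(where(is.numeric), na.aggregate)) %>%
  ungroup()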
Or, if we don't want to use an additional package:
df <- df %>%
  group_by(state_abr) %>%
  # replace each NA with the mean of the non-missing values in its state
  mutate(across(where(is.numeric),
                ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE)))) %>%
  ungroup()
Or you can use this base R one-liner:
df$no_comp[is.na(df$no_comp)] <- ave(df$no_comp, df$state_abr, FUN = function(x) mean(x, na.rm = TRUE))[is.na(df$no_comp)]
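The one-liner handles a single column. If every numeric column needs the same treatment without extra packages, the same ave idea can be wrapped in a loop. The following is only a sketch: the helper name impute_group_mean and the toy data are made up for illustration and are not part of the original answers.

```r
# Hypothetical helper (assumption, not from the original answers):
# fill NAs in every numeric column with that column's mean within each group.
impute_group_mean <- function(data, group_col) {
  is_num <- vapply(data, is.numeric, logical(1))
  for (col in names(data)[is_num]) {
    missing <- is.na(data[[col]])
    if (!any(missing)) next  # nothing to fill in this column
    # ave() returns a vector as long as the column, holding each row's group mean
    group_mean <- ave(data[[col]], data[[group_col]],
                      FUN = function(x) mean(x, na.rm = TRUE))
    data[[col]][missing] <- group_mean[missing]
  }
  data
}

# Small made-up example
toy <- data.frame(
  state_abr = c("AL", "AL", "AL", "WY", "WY"),
  no_comp   = c(NA, 20, 30, 5, NA),
  unemp     = c(2, NA, 4, 3, 3)
)
filled <- impute_group_mean(toy, "state_abr")
# filled$no_comp is 25, 20, 30, 5, 5 and filled$unemp is 2, 3, 4, 3, 3
```

Note that if a group has no observed values at all in some column, its mean is NaN, so those cells stay missing rather than borrowing a value from another state.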
#Data:
county <- c("Autauga","Baldwin","Barbour","Bibb","Blount","Bullock","Sublette","Teton","Uinta","Washakie","Weston")
state_abr <- c(rep("AL",6),rep("WY",5))
population <- c(55869,223234,24686,22394,57826,10101,9831,23464,20226,7805,6927)
unemp <- c(2.7,2.7,3.8,3.1,2.7,3.6,4.4,2.7,3.9,3.9,2.9)
no_comp <- c(NA,NA,NA,23.7,21.3,27.1,5.4,4.2,6.1,12.1,13.8)
df <- data.frame(county,state_abr,population,unemp,no_comp)
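As a quick sanity check on this sample data, the sketch below compares the overall mean of no_comp with the per-state means and then fills the three AL gaps. It uses only base R and rebuilds just the two columns it needs; the variable names overall_mean, state_means and missing are introduced here for illustration.

```r
# Two columns from the sample data, rebuilt so the block runs on its own
state_abr <- c(rep("AL", 6), rep("WY", 5))
no_comp <- c(NA, NA, NA, 23.7, 21.3, 27.1, 5.4, 4.2, 6.1, 12.1, 13.8)
df <- data.frame(state_abr, no_comp)

# Mean over the whole column mixes AL and WY (about 14.21)
overall_mean <- mean(df$no_comp, na.rm = TRUE)

# Mean per state (about 24.03 for AL, 8.32 for WY)
state_means <- tapply(df$no_comp, df$state_abr, mean, na.rm = TRUE)

# Fill each NA with the mean of its own state
missing <- is.na(df$no_comp)
df$no_comp[missing] <- state_means[as.character(df$state_abr[missing])]
```

All three missing values belong to AL, so they receive the AL mean rather than the lower overall mean that the WY rows would pull down.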