Calculating conditional means by factor level in R



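In "Using ave() in R to calculate percentiles by factor", I asked how to compute percentiles inside the ave() function. Now that that works, I am facing a harder task.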

Take the following data:

    DistrictName            Building Name   X2.Yr.AVG       Thirty          Seventy
    Ionia Public Schools    Emerson         -0.337464323    -0.196387489    -0.046524185
    Ionia Public Schools    Jefferson       -0.318673587    -0.196387489    -0.046524185
    Ionia Public Schools    Ionia Middle    -0.290854669    -0.196387489    -0.046524185
    Ionia Public Schools    Ionia Middle    -0.288202752    -0.196387489    -0.046524185
    Ionia Public Schools    Twin Rivers El  -0.23426755     -0.196387489    -0.046524185
    Ionia Public Schools    R.B. Boyce El   -0.202319963    -0.196387489    -0.046524185
    Ionia Public Schools    Twin Rivers El  -0.142995221    -0.196387489    -0.046524185
    Ionia Public Schools    Emerson         -0.141620372    -0.196387489    -0.046524185
    Ionia Public Schools    Jefferson       -0.141407078    -0.196387489    -0.046524185
    Ionia Public Schools    R.B. Boyce El   -0.115530249    -0.196387489    -0.046524185
    Ionia Public Schools    Ionia Middle    -0.111449269    -0.196387489    -0.046524185
    Ionia Public Schools    Twin Rivers El  -0.054918339    -0.196387489    -0.046524185
    Ionia Public Schools    Jefferson       -0.045591501    -0.196387489    -0.046524185
    Ionia Public Schools    A.A. Rather     0.002251298     -0.196387489    -0.046524185
    Ionia Public Schools    R.B. Boyce El   0.020669633     -0.196387489    -0.046524185
    Ionia Public Schools    Emerson         0.065064968     -0.196387489    -0.046524185
    Ionia Public Schools    A.A. Rather     0.182776319     -0.196387489    -0.046524185
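To make the examples reproducible, here is a minimal sketch that rebuilds the sample above as a data frame. The column name `Building.Name` is an assumption: it is what `read.table()` would produce from the header "Building Name" with the default `check.names = TRUE`.

```r
# Rebuild the 17-row sample as a data frame (single district).
# "Building Name" is stored as Building.Name (assumed, as read.table() would name it).
MATHgap <- data.frame(
  DistrictName = "Ionia Public Schools",  # recycled to all 17 rows
  Building.Name = c("Emerson", "Jefferson", "Ionia Middle", "Ionia Middle",
                    "Twin Rivers El", "R.B. Boyce El", "Twin Rivers El",
                    "Emerson", "Jefferson", "R.B. Boyce El", "Ionia Middle",
                    "Twin Rivers El", "Jefferson", "A.A. Rather",
                    "R.B. Boyce El", "Emerson", "A.A. Rather"),
  X2.Yr.AVG = c(-0.337464323, -0.318673587, -0.290854669, -0.288202752,
                -0.23426755, -0.202319963, -0.142995221, -0.141620372,
                -0.141407078, -0.115530249, -0.111449269, -0.054918339,
                -0.045591501, 0.002251298, 0.020669633, 0.065064968,
                0.182776319),
  Thirty = -0.196387489,   # recycled: same cut-off for every row of this district
  Seventy = -0.046524185
)
str(MATHgap)
```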

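What I want is something like Excel's AVERAGEIF function. In Excel I can write =AVERAGEIF(C2:C18, "<-.196387489"), which returns an average of -0.278630474. I need something in R that does the following: create new variables holding the mean of 1) the values of X2.Yr.AVG that are less than Thirty, and 2) the values of X2.Yr.AVG that are greater than Seventy.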
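For a single district, the AVERAGEIF call maps directly onto base R logical subsetting: index the vector with a comparison, then take `mean()`. A small sketch using the X2.Yr.AVG values and cut-offs from the sample above:

```r
# X2.Yr.AVG for the 17 sample rows (column C in the Excel version)
x <- c(-0.337464323, -0.318673587, -0.290854669, -0.288202752,
       -0.23426755, -0.202319963, -0.142995221, -0.141620372,
       -0.141407078, -0.115530249, -0.111449269, -0.054918339,
       -0.045591501, 0.002251298, 0.020669633, 0.065064968,
       0.182776319)
thirty  <- -0.196387489
seventy <- -0.046524185

# Equivalent of =AVERAGEIF(C2:C18, "<-.196387489"): keep values below the cut-off, then average
mean_lt30 <- mean(x[x < thirty])
# The "greater than Seventy" counterpart
mean_gt70 <- mean(x[x > seventy])

mean_lt30  # -0.2786305, matching the Excel result
```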

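The problem is that I need to do this on a large data frame in which the factor DistrictName has 722 levels. In the step where I calculated the percentiles, I used ave() to compute them by the desired factor, like this: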

    MATHgap$Thirty<-ave(MATHgap$X2.Yr.AVG, MATHgap$DistrictName, 
       FUN= function(x) quantile(x, 0.3))

    MATHgap$Seventy<-ave(MATHgap$X2.Yr.AVG, MATHgap$DistrictName, 
       FUN= function(x) quantile(x, 0.7))

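Is there any way to do something like AVERAGEIF inside ave(), so that the calculation is repeated for each value of DistrictName independently of the others? I.e., for Ionia Public Schools I want the mean of the X2.Yr.AVG values less than -0.196387489 and the mean of the X2.Yr.AVG values greater than -0.046524185, and I want to do the same for every district, using each district's own values of X2.Yr.AVG, Thirty, and Seventy.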
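One base R way to stay with ave() is a sketch, not the accepted answer: ave() hands FUN only one vector per group, so pass it the row indices and let FUN look up both X2.Yr.AVG and the cut-off columns for those rows. The two-district data frame below is hypothetical toy data, chosen so the per-district results are easy to check by hand.

```r
# Hypothetical toy data: two districts with very different score ranges
df <- data.frame(
  DistrictName = rep(c("A", "B"), each = 5),
  X2.Yr.AVG = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)
# Per-district cut-offs, computed the same way as in the question
df$Thirty  <- ave(df$X2.Yr.AVG, df$DistrictName, FUN = function(x) quantile(x, 0.3))
df$Seventy <- ave(df$X2.Yr.AVG, df$DistrictName, FUN = function(x) quantile(x, 0.7))

# ave() splits its first argument by group, so pass row indices and
# use them inside FUN to fetch the matching X2.Yr.AVG and cut-off values
idx <- seq_len(nrow(df))
df$MeanLT30 <- ave(idx, df$DistrictName, FUN = function(i) {
  x <- df$X2.Yr.AVG[i]
  mean(x[x < df$Thirty[i]])    # AVERAGEIF-style: mean of values below this district's Thirty
})
df$MeanGT70 <- ave(idx, df$DistrictName, FUN = function(i) {
  x <- df$X2.Yr.AVG[i]
  mean(x[x > df$Seventy[i]])   # mean of values above this district's Seventy
})
df
```

If a district has no values below its cut-off, `mean()` of an empty vector returns `NaN` for that district's rows.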

Apologies if this is confusing.

Here is a solution using dplyr:

    library(dplyr)

    # Within each district, average only the rows below Thirty / above Seventy
    MATHgap %>%
      group_by(DistrictName) %>%
      mutate(MeanLT30 = mean(X2.Yr.AVG[X2.Yr.AVG < Thirty]),
             MeanGT70 = mean(X2.Yr.AVG[X2.Yr.AVG > Seventy]))
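The dplyr code adds the means as columns on every row. If one row per district is enough, a base R alternative (a sketch, not part of the answer above) is to `split()` X2.Yr.AVG by district and compute the cut-offs and conditional means together with `lapply()`. It recomputes Thirty and Seventy with `quantile()`'s default method, as the question's ave() code does. The data frame is hypothetical toy data.

```r
# Hypothetical toy data (same column names as MATHgap, two districts)
df <- data.frame(
  DistrictName = rep(c("A", "B"), each = 5),
  X2.Yr.AVG = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)

# For each district: 30th/70th percentiles, then the means below/above them
by_district <- lapply(split(df$X2.Yr.AVG, df$DistrictName), function(x) {
  lo <- quantile(x, 0.3, names = FALSE)
  hi <- quantile(x, 0.7, names = FALSE)
  c(Thirty = lo, Seventy = hi,
    MeanLT30 = mean(x[x < lo]), MeanGT70 = mean(x[x > hi]))
})

# Stack the per-district vectors into one summary row per district
summary_df <- data.frame(DistrictName = names(by_district),
                         do.call(rbind, by_district), row.names = NULL)
summary_df
```

The summary can be joined back to the full data with `merge(df, summary_df, by = "DistrictName")` if per-row columns are needed after all.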
