r - How to calculate the average of a data field separately for different stations



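I am trying to calculate the average of RAIN by hour. The data contains rainfall recorded over 24 hours at more than 1000 stations. Each hour normally has 4 readings, but some stations have only 1, 2 or 3. I have to average the rainfall for each station for each hour. Sample data is shown below: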

STN,     HOBLINAME,   LATI,      LONG_,    RAINDATE, HOUR,  RAIN
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    3.5
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    3
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    3
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    2.5
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  1,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  1,    1
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  1,    2
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  0,   7.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  1,   7
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  1,   6.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   6
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   6
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   5.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   2
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   2.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   3

I tried:

copy14   <- read.csv("/home/14copy.csv")
aggregate( RAIN ~ HOUR, copy14, FUN = mean )

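But this gives the average for each hour across all stations (for example, the average of every hour-0 reading from all stations). What I want is the average per station per hour, i.e. RAIN must be averaged separately for station 4471 and separately for station 804. Finally, how should I write out this final average together with all of its related fields?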

Using data.table:

require(data.table); setDT(copy14)
copy14[, .(MeanRain = mean(RAIN)), .(STN, HOUR)]

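To follow up on your first attempt with aggregate, here is this solution. The by argument of aggregate expects a list (a data frame also works) of grouping columns, and FUN is then applied to the given data within each group. In my opinion, group by plus summarise is a smoother solution. However, this solution should also be shown here.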

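library(dplyr)

copy14 <- read.csv("R/rain.csv")
# Pass only RAIN as the data to aggregate; taking the mean of text
# columns such as HOBLINAME would return NA with warnings.
data <- copy14 %>%
  select(RAIN) %>%
  aggregate(by = copy14 %>%
              select(STN, HOUR),   # grouping columns
            FUN = mean)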
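The formula call from the question can also be kept almost unchanged: adding STN to the right-hand side of the formula makes aggregate group by station and hour together. The following is only a minimal sketch; it builds a small inline data frame from a few of the sample rows in the question instead of reading the CSV, and the name hourly is just an illustrative choice.

```r
# A few rows copied from the question's sample data
# (station 4471 and station 804, hours 0 and 1)
copy14 <- data.frame(
  STN  = c(4471, 4471, 4471, 4471, 4471, 4471, 4471, 804, 804, 804),
  HOUR = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 1),
  RAIN = c(3.5, 3, 3, 2.5, 0, 1, 2, 7.5, 7, 6.5)
)

# STN + HOUR on the right-hand side: one mean per station/hour pair
hourly <- aggregate(RAIN ~ STN + HOUR, data = copy14, FUN = mean)
print(hourly)
```

Groups with fewer than four readings are handled without extra work, because mean simply averages whatever rows fall into each station/hour pair.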

Using the dplyr library, we simply group and summarise as follows:

library(dplyr)
copy14 <- read.csv("rain.csv")
copy14 %>%
  group_by(HOUR, STN) %>%
  summarise(RAIN = mean(RAIN))
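The last part of the question, writing the averages out together with the related fields, could be handled by adding those fields to the grouping and saving the result with write.csv. This is a rough sketch under two assumptions: HOBLINAME, LATI, LONG_ and RAINDATE are constant within each station (as they are in the sample data), and the output file name is purely hypothetical. The .groups argument needs dplyr 1.0.0 or later.

```r
library(dplyr)

# A few rows copied from the question's sample data (hour 1 for both stations)
copy14 <- data.frame(
  STN       = c(4471, 4471, 4471, 804, 804),
  HOBLINAME = c("Adagal (GP)", "Adagal (GP)", "Adagal (GP)", "BADAMI", "BADAMI"),
  LATI      = c(15.952089, 15.952089, 15.952089, 15.919473, 15.919473),
  LONG_     = c(75.673282, 75.673282, 75.673282, 75.683335, 75.683335),
  RAINDATE  = "14-08-17",
  HOUR      = c(1, 1, 1, 1, 1),
  RAIN      = c(0, 1, 2, 7, 6.5)
)

# Grouping by the descriptive columns as well keeps them in the result
hourly <- copy14 %>%
  group_by(STN, HOBLINAME, LATI, LONG_, RAINDATE, HOUR) %>%
  summarise(RAIN = mean(RAIN), .groups = "drop")

# Hypothetical output location; replace with the path you actually want
out_file <- file.path(tempdir(), "hourly_mean_rain.csv")
write.csv(hourly, out_file, row.names = FALSE)
```

If a descriptive field did vary within a station, this grouping would split that station into several rows, so it is worth checking the row count against the number of station/hour pairs.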
