I want to compute the mean of every 5 rows in my df. Here is my df:
time                 value
03/06/2021 06:15:00  NA
03/06/2021 06:16:00  NA
03/06/2021 06:17:00  20
03/06/2021 06:18:00  22
03/06/2021 06:19:00  25
03/06/2021 06:20:00  NA
03/06/2021 06:21:00  31
03/06/2021 06:22:00  23
03/06/2021 06:23:00  19
03/06/2021 06:24:00  25
03/06/2021 06:25:00  34
03/06/2021 06:26:00  42
03/06/2021 06:27:00  NA
03/06/2021 06:28:00  19
03/06/2021 06:29:00  17
03/06/2021 06:30:00  25
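You can solve this by introducing a dummy variable that splits your observations into groups of five, and then computing the mean within each group. Here is an MWE based on the tidyverse; it assumes your data is in a data frame named df.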
library(tidyverse)
df %>%
  mutate(Group = 1 + floor((row_number() - 1) / 5)) %>%
  group_by(Group) %>%
  summarise(Mean = mean(value, na.rm = TRUE), .groups = "drop")
# A tibble: 4 × 2
  Group  Mean
  <dbl> <dbl>
1     1  22.3
2     2  24.5
3     3  28
4     4  25
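To make the grouping index concrete, here is a minimal base R sketch of the same idea. It uses only the 16 values from the question; the names `value`, `group`, and `block_means` are illustrative and are not part of the answer above.

```r
# The 16 values from the question, in row order
value <- c(NA, NA, 20, 22, 25, NA, 31, 23, 19, 25, 34, 42, NA, 19, 17, 25)

# Rows 1-5 map to group 1, rows 6-10 to group 2, and so on
group <- 1 + floor((seq_along(value) - 1) / 5)
print(group)

# Mean within each group, ignoring NA values
block_means <- tapply(value, group, mean, na.rm = TRUE)
print(block_means)
```

Because 16 is not a multiple of 5, the last group contains a single row, so its mean is just that row's value.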
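A solution based on purrr::map_dfr. Note that it computes a rolling mean over overlapping windows rather than over separate blocks: window .x covers rows .x through .x + 5, which is six rows.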
library(purrr)
df <- data.frame(
  stringsAsFactors = FALSE,
  time = c("03/06/2021 06:15:00", "03/06/2021 06:16:00",
           "03/06/2021 06:17:00",
           "03/06/2021 06:18:00", "03/06/2021 06:19:00",
           "03/06/2021 06:20:00", "03/06/2021 06:21:00",
           "03/06/2021 06:22:00", "03/06/2021 06:23:00",
           "03/06/2021 06:24:00", "03/06/2021 06:25:00",
           "03/06/2021 06:26:00",
           "03/06/2021 06:27:00", "03/06/2021 06:28:00",
           "03/06/2021 06:29:00", "03/06/2021 06:30:00"),
  value = c(NA, NA, 20L, 22L,
            25L, NA, 31L, 23L, 19L, 25L, 34L, 42L, NA, 19L, 17L,
            25L)
)
map_dfr(1:(nrow(df) - 5),
        ~ data.frame(Group = .x, Mean = mean(df$value[.x:(.x + 5)], na.rm = TRUE)))
#>    Group     Mean
#> 1      1 22.33333
#> 2      2 24.50000
#> 3      3 24.20000
#> 4      4 24.00000
#> 5      5 24.60000
#> 6      6 26.40000
#> 7      7 29.00000
#> 8      8 28.60000
#> 9      9 27.80000
#> 10    10 27.40000
#> 11    11 27.40000
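The window arithmetic in that call can be made explicit. The following is a base R sketch, not the answer's own code; `roll_mean` and its width argument `k` are hypothetical names introduced here for illustration. A window that starts at row i and has width k covers rows i through i + k - 1, so a vector of length n has n - k + 1 such windows.

```r
# The 16 values from the question, in row order
value <- c(NA, NA, 20, 22, 25, NA, 31, 23, 19, 25, 34, 42, NA, 19, 17, 25)

# Hypothetical helper: mean over every window of k consecutive rows.
# Assumes length(x) >= k.
roll_mean <- function(x, k) {
  starts <- seq_len(length(x) - k + 1)
  vapply(starts, function(i) mean(x[i:(i + k - 1)], na.rm = TRUE), numeric(1))
}

six_row  <- roll_mean(value, 6)  # same windows as .x:(.x + 5) in the call above
five_row <- roll_mean(value, 5)  # windows of exactly five rows

print(six_row)
print(five_row)
```

With k = 6 this gives the 11 means printed above; with k = 5 there are 12 windows, each spanning exactly five rows.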
If you want the mean over each 5-minute interval instead, you can use lubridate's floor_date/ceiling_date functions to round the timestamps into bins.
library(dplyr)
library(lubridate)
df %>%
  mutate(time = mdy_hms(time),
         time = floor_date(time, '5 mins')) %>%
  group_by(time) %>%
  summarise(value = mean(value, na.rm = TRUE))
#  time                value
#  <dttm>              <dbl>
#1 2021-03-06 06:15:00  22.3
#2 2021-03-06 06:20:00  24.5
#3 2021-03-06 06:25:00  28
#4 2021-03-06 06:30:00  25
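For readers without lubridate, the flooring step can be approximated in base R. This is only a sketch under stated assumptions: the timestamps are taken to be month/day/year and are parsed as UTC, and the names `parsed`, `floored`, and `bin_means` are illustrative. Each time is floored by integer division on its seconds since the epoch, with 300 seconds per bin.

```r
# Rebuild the question's data: one row per minute from 06:15 to 06:30
time  <- sprintf("03/06/2021 06:%02d:00", 15:30)
value <- c(NA, NA, 20, 22, 25, NA, 31, 23, 19, 25, 34, 42, NA, 19, 17, 25)

# Parse as month/day/year (an assumption about the format) in UTC
parsed <- as.POSIXct(time, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Floor each time to the previous 5-minute boundary (300 seconds per bin)
floored <- as.POSIXct(floor(as.numeric(parsed) / 300) * 300,
                      origin = "1970-01-01", tz = "UTC")

# Mean per bin, keyed by the bin's start time, ignoring NA values
bin_means <- tapply(value, format(floored, "%H:%M"), mean, na.rm = TRUE)
print(bin_means)
```

Because this data has exactly one row per minute with no gaps, binning by time gives the same four means as grouping by row position. With irregular timestamps the two approaches would differ.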