根据日期列每3天汇总一次值

  • 本文关键字:一次 日期 3天 r
  • 更新时间 :
  • 英文 :


DT的dput()输出如下:我想每3天计算一次,从DATE的最小值开始,按ID分组。

structure(list(ID = c("pqr", "abc", "ort", "kkg", "ssc", "ccv", 
"xyz", "xyz", "xyz"), DATE = c("2022-06-07", "2022-06-24", "2022-06-02", 
"2022-06-01", "2022-06-16", "2022-06-07", "2022-06-11", "2022-06-13", "2022-06-27"
), READING_IN = c(150, 2800, 600, 500, 1395.94, 500, 800, 179, 200
), READING_OUT = c(150, 2800, 600, 500, 1400, 501.4, 371.34, 
556.47, 462.75)), class = "data.frame", row.names = c(NA, -9L))

下面是一个不成功的尝试。

DT$DATE = as.Date(DT$DATE, format = "%Y-%m-%d")
DT1 = DT %>%
group_by(ID, group = cut(as.Date(DT$DATE, format = "%Y-%m-%d"), '3 days')) %>%
summarise(date_range = paste(min(DATE), min(DATE) + 2, sep = ' to '), 
sum_in =  sum(READING_IN),
sum_out = sum(READING_OUT), .groups = 'drop') %>%
select(-group)

结果:

structure(list(ID = c("abc", "ccv", "kkg", "ort", "pqr", "ssc", 
"xyz", "xyz", "xyz"), date_range = c("2022-06-24-2022-06-26", 
"2022-06-07-2022-06-09", "2022-06-01-2022-06-03", "2022-06-02-2022-06-04", 
"2022-06-07-2022-06-09", "2022-06-16-2022-06-18", "2022-06-11-2022-06-13", 
"2022-06-13-2022-06-15", "2022-06-27-2022-06-29"), sum_in = c(2800, 
500, 500, 600, 150, 1395.94, 800, 179, 200), sum_out = c(2800, 
501.4, 500, 600, 150, 1400, 371.34, 556.47, 462.75)), row.names = c(NA, 
-9L), class = c("tbl_df", "tbl", "data.frame"))

ID = xyz:

期望输出
<表类>ID日期READING_INREADING_OUTtbody><<tr>xyz2022-06-11到2022-06-13979927.81xyz2022-06-27到2022-06-29200462.75

我相信你被group_by()的一些细节欺骗了:

计算总是在未分组的数据帧上完成。如果要对分组数据进行计算,需要在group_by()之前使用单独的mutate()步骤。

加上额外的mutate()+group_by()步骤,其行为似乎如下所述:

library(tibble)  
library(dplyr)
DT %>%
mutate(DATE = as.Date(DATE, format = "%Y-%m-%d")) %>% 
group_by(ID) %>%
mutate(date_group = cut(DATE, '3 days')) %>% 
group_by(ID, date_group) %>%
summarise(date_range = paste(min(DATE), min(DATE) + 2, sep = ' to '), 
sum_in =  num(sum(READING_IN), digits = 2),
sum_out = num(sum(READING_OUT),digits = 2), .groups = 'drop') %>%
select(-date_group)
#> # A tibble: 8 × 4
#>   ID    date_range                  sum_in   sum_out
#>   <chr> <chr>                    <num:.2!> <num:.2!>
#> 1 abc   2022-06-24 to 2022-06-26   2800.00   2800.00
#> 2 ccv   2022-06-07 to 2022-06-09    500.00    501.40
#> 3 kkg   2022-06-01 to 2022-06-03    500.00    500.00
#> 4 ort   2022-06-02 to 2022-06-04    600.00    600.00
#> 5 pqr   2022-06-07 to 2022-06-09    150.00    150.00
#> 6 ssc   2022-06-16 to 2022-06-18   1395.94   1400.00
#> 7 xyz   2022-06-11 to 2022-06-13    979.00    927.81
#> 8 xyz   2022-06-27 to 2022-06-29    200.00    462.75

输入:

DT <- structure(list(
ID = c(
"pqr", "abc", "ort", "kkg", "ssc", "ccv", "xyz", "xyz", "xyz"
),
DATE = c(
"2022-06-07", "2022-06-24", "2022-06-02", "2022-06-01", "2022-06-16",
"2022-06-07", "2022-06-11", "2022-06-13", "2022-06-27"
),
READING_IN = c(150, 2800, 600, 500, 1395.94, 500, 800, 179, 200),
READING_OUT = c(150, 2800, 600, 500, 1400, 501.4, 371.34, 556.47, 462.75)
), class = "data.frame", row.names = c(NA, -9L))

创建于2023-01-18与reprex v2.0.2

最新更新