DT的dput()
输出如下:我想每3天计算一次,从DATE的最小值开始,按ID分组。
structure(list(ID = c("pqr", "abc", "ort", "kkg", "ssc", "ccv",
"xyz", "xyz", "xyz"), DATE = c("2022-06-07", "2022-06-24", "2022-06-02",
"2022-06-01", "2022-06-16", "2022-06-07", "2022-06-11", "2022-06-13", "2022-06-27"
), READING_IN = c(150, 2800, 600, 500, 1395.94, 500, 800, 179, 200
), READING_OUT = c(150, 2800, 600, 500, 1400, 501.4, 371.34,
556.47, 462.75)), class = "data.frame", row.names = c(NA, -9L))
下面是一个不成功的尝试。
DT$DATE = as.Date(DT$DATE, format = "%Y-%m-%d")
DT1 = DT %>%
group_by(ID, group = cut(as.Date(DT$DATE, format = "%Y-%m-%d"), '3 days')) %>%
summarise(date_range = paste(min(DATE), min(DATE) + 2, sep = ' to '),
sum_in = sum(READING_IN),
sum_out = sum(READING_OUT), .groups = 'drop') %>%
select(-group)
结果:
structure(list(ID = c("abc", "ccv", "kkg", "ort", "pqr", "ssc",
"xyz", "xyz", "xyz"), date_range = c("2022-06-24-2022-06-26",
"2022-06-07-2022-06-09", "2022-06-01-2022-06-03", "2022-06-02-2022-06-04",
"2022-06-07-2022-06-09", "2022-06-16-2022-06-18", "2022-06-11-2022-06-13",
"2022-06-13-2022-06-15", "2022-06-27-2022-06-29"), sum_in = c(2800,
500, 500, 600, 150, 1395.94, 800, 179, 200), sum_out = c(2800,
501.4, 500, 600, 150, 1400, 371.34, 556.47, 462.75)), row.names = c(NA,
-9L), class = c("tbl_df", "tbl", "data.frame"))
ID = xyz:
期望输出<表类>ID 日期 READING_IN READING_OUT tbody><<tr>xyz 2022-06-11到2022-06-13 979 927.81 xyz2022-06-27到2022-06-29 200 462.75 表类>
我相信你被group_by()
的一些细节欺骗了:
计算总是在未分组的数据帧上完成。如果要对分组数据进行计算,需要在
group_by()
之前使用单独的mutate()
步骤。
加上额外的mutate()
+group_by()
步骤,其行为似乎如下所述:
library(tibble)
library(dplyr)
DT %>%
mutate(DATE = as.Date(DATE, format = "%Y-%m-%d")) %>%
group_by(ID) %>%
mutate(date_group = cut(DATE, '3 days')) %>%
group_by(ID, date_group) %>%
summarise(date_range = paste(min(DATE), min(DATE) + 2, sep = ' to '),
sum_in = num(sum(READING_IN), digits = 2),
sum_out = num(sum(READING_OUT),digits = 2), .groups = 'drop') %>%
select(-date_group)
#> # A tibble: 8 × 4
#> ID date_range sum_in sum_out
#> <chr> <chr> <num:.2!> <num:.2!>
#> 1 abc 2022-06-24 to 2022-06-26 2800.00 2800.00
#> 2 ccv 2022-06-07 to 2022-06-09 500.00 501.40
#> 3 kkg 2022-06-01 to 2022-06-03 500.00 500.00
#> 4 ort 2022-06-02 to 2022-06-04 600.00 600.00
#> 5 pqr 2022-06-07 to 2022-06-09 150.00 150.00
#> 6 ssc 2022-06-16 to 2022-06-18 1395.94 1400.00
#> 7 xyz 2022-06-11 to 2022-06-13 979.00 927.81
#> 8 xyz 2022-06-27 to 2022-06-29 200.00 462.75
输入:
DT <- structure(list(
ID = c(
"pqr", "abc", "ort", "kkg", "ssc", "ccv", "xyz", "xyz", "xyz"
),
DATE = c(
"2022-06-07", "2022-06-24", "2022-06-02", "2022-06-01", "2022-06-16",
"2022-06-07", "2022-06-11", "2022-06-13", "2022-06-27"
),
READING_IN = c(150, 2800, 600, 500, 1395.94, 500, 800, 179, 200),
READING_OUT = c(150, 2800, 600, 500, 1400, 501.4, 371.34, 556.47, 462.75)
), class = "data.frame", row.names = c(NA, -9L))
创建于2023-01-18与reprex v2.0.2