How to measure delivery rate on a weekly basis in R



I have the data shown below in R.

ID    DateTime1            DateTime2            attribute  Type   Status   batch    delivery_date 
P-1   2021-02-01 10:15:20  2021-02-01 10:15:20  New        CCR1   Done     XA1      2021-02-01 12:15:20
P-2   2021-02-01 10:15:20  2021-02-02 15:45:28  New        CCR1   Hold     XA1      
P-3   2021-02-01 10:15:20  2021-02-01 10:15:20  Old        CDR1   Done     RR1      2021-02-01 12:15:20
P-4   2021-02-01 10:15:20  2021-02-01 10:15:20  Old        DYR1   Done     XY1      2021-02-01 12:15:20
P-5   2021-02-01 10:15:20  2021-02-01 19:45:52  New        CDR1   WIP      XA1   

dput(head(df, 10))

structure(list(ID = c("P-1", "P-2", 
"P-3", "P-4", "P-5", "P-6", 
"P-7", "P-8", "P-9", "P-10"
), DateTime1= c("2021-03-01 03:19:03.364", "2021-03-01 03:28:52.871", 
"2021-03-01 03:52:01.734", "2021-03-01 04:11:28.083", "2021-03-01 04:16:36.579", 
"2021-03-01 04:23:07.379", "2021-03-01 04:24:05.035", "2021-03-01 04:26:07.860", 
"2021-03-01 04:29:42.571", "2021-03-01 04:34:37.919"), DateTime2 = c("2021-03-01 05:03:02.510", 
"2021-03-01 06:15:28.954", "2021-03-02 06:28:19.316", "2021-03-04 13:25:45.871", 
"2021-03-02 12:27:07.154", "2021-03-01 05:01:58.497", "2021-03-01 05:12:54.269", 
"2021-03-03 07:39:05.244", "2021-03-01 05:03:49.996", "2021-03-01 06:01:16.509"
), attribute = c("New", "New", "New", "New", "New", 
"New", "New", "New", "New", "New"), TYPE = c("CCR1", 
" CCR1", " CCR1", " CCR1", " CCR1", 
" CCR1", " CDR1", " CDR1", " CCR1", 
" CCR1"), Status = c("Hold", "Done", 
"Done", "Done", "Done", 
"Hold", "Hold", "Done", 
"WIP", "Done"), Batch = c("XA1", 
"XA1", "XA1", "XA1", "XA1", "RR1", "RR1", "XA1", "XY1", "XA1"),
delivery_date = c(NA, "2021-03-01 06:15:29", 
"2021-03-01 03:52:03", "2021-03-01 04:11:29", "2021-03-01 04:16:38", 
NA, NA, "2021-03-01 04:26:09", NA, "2021-03-01 06:01:16")), row.names = c(NA, 
10L), class = "data.frame")

Using the data frame above, I need to derive the time difference, in hours, for each date, subject to the following conditions.

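The conditions are: DateTime1 is used to derive the base date (e.g. 2021-02-01), and the time difference should be considered only for those IDs where Type equals CCR1 and Batch equals XA1, broken down by week for that particular month.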
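
The filter and week label described above could be sketched in base R as follows. This is only an illustration: `df_demo` and its rows are hypothetical, `trimws()` is used because some `TYPE` values in the `dput` output carry a leading space, and the week-of-month rule (days 1-7 are W1, days 8-14 are W2, and so on) is an assumption, since the question does not define how weeks are counted.

```r
# Hypothetical rows modeled on the dput() output; not the real data.
df_demo <- data.frame(
  ID        = c("P-1", "P-2", "P-7", "P-9"),
  DateTime1 = c("2021-03-01 03:19:03", "2021-03-09 03:28:52",
                "2021-03-01 04:24:05", "2021-03-01 04:29:42"),
  TYPE      = c("CCR1", " CCR1", " CDR1", " CCR1"),
  Batch     = c("XA1", "XA1", "RR1", "XY1"),
  stringsAsFactors = FALSE
)

# Keep only Type CCR1 and Batch XA1; trimws() drops the stray leading space.
keep <- trimws(df_demo$TYPE) == "CCR1" & df_demo$Batch == "XA1"
sel  <- df_demo[keep, ]

# Assumed week-of-month rule: days 1-7 -> W1, 8-14 -> W2, and so on.
day_of_month <- as.integer(format(as.Date(sel$DateTime1), "%d"))
sel$week <- (day_of_month - 1) %/% 7 + 1

# Month label such as "Mar-21" (the month abbreviation depends on the locale).
sel$label <- paste0(format(as.Date(sel$DateTime1), "%b-%y"),
                    " (W", sel$week, ")")
```

With this rule a month can have a fifth week (days 29-31), which the required output does not show, so the rule may need adjusting.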

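The logic is: if there is no delivery_date and Status equals Hold, take the difference DateTime2 - DateTime1 (call it Diff1); if delivery_date is available, take delivery_date - DateTime1 (call it Diff2); if delivery_date is not available but Status is not Hold, treat the row as Other and take system datetime - DateTime1 (call it Diff3). Where the average is below 24 hours it should be shown in hours, otherwise in days.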
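
As a rough sketch of these three rules in base R: the helper name `classify_and_diff`, its `now` argument, and the three demo rows are hypothetical, and timestamps are parsed as UTC purely as an assumption. A fixed `now` is passed in the demo so the result is reproducible.

```r
# Classify each row as Diff1 / Diff2 / Diff3 and compute the gap in hours.
classify_and_diff <- function(df, now = Sys.time()) {
  t1  <- as.POSIXct(df$DateTime1, tz = "UTC")
  t2  <- as.POSIXct(df$DateTime2, tz = "UTC")
  del <- as.POSIXct(df$delivery_date, tz = "UTC")

  # Diff2 when a delivery date exists; otherwise Diff1 for Hold, Diff3 for the rest.
  kind <- ifelse(!is.na(del), "Diff2",
                 ifelse(df$Status == "Hold", "Diff1", "Diff3"))

  # End time: delivery_date (Diff2), DateTime2 (Diff1), or `now` (Diff3).
  end <- del
  end[which(kind == "Diff1")] <- t2[which(kind == "Diff1")]
  end[which(kind == "Diff3")] <- now

  df$kind  <- kind
  df$hours <- as.numeric(difftime(end, t1, units = "hours"))
  df
}

# Hypothetical demo rows, one per rule.
demo <- data.frame(
  DateTime1     = c("2021-03-01 03:00:00", "2021-03-01 04:00:00", "2021-03-01 04:00:00"),
  DateTime2     = c("2021-03-01 05:30:00", "2021-03-01 04:30:00", "2021-03-01 04:30:00"),
  Status        = c("Hold", "Done", "WIP"),
  delivery_date = c(NA, "2021-03-01 06:00:00", NA),
  stringsAsFactors = FALSE
)
res <- classify_and_diff(demo, now = as.POSIXct("2021-03-02 04:00:00", tz = "UTC"))
```

Note that a Hold row that does have a delivery_date falls into Diff2 here, which follows the wording of the rules above.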

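In addition, the Diff values can be grouped into hourly buckets, starting with <1 hour, then one bucket per additional hour, with the last bucket being 9+ hours. The Total row should also show the average of Diff1, Diff2 and Diff3.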
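
One possible way to build such buckets is base R's `cut()`. The helper name `bucket_hours` and the exact label text are assumptions; the intervals are closed on the left, so exactly 1 hour lands in the 1-2h bucket.

```r
# Bucket a numeric vector of hours into <1h, 1-2h, ..., 8-9h, 9+h.
bucket_hours <- function(hours) {
  breaks <- c(0:9, Inf)                                   # 0, 1, ..., 9, Inf
  labels <- c("<1h", paste0(1:8, "-", 2:9, "h"), "9+h")   # 10 labels for 10 intervals
  cut(hours, breaks = breaks, labels = labels, right = FALSE)
}

buckets <- bucket_hours(c(0.5, 1, 5.5, 9, 768))
table(buckets)  # counts per bucket, including empty ones
```

Negative differences fall outside the breaks and come back as `NA`.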

Required output format:

Month       Count   Count_1   Per_1  Diff1   Count_2  Per_2   Diff2  Count_3  Per_3    Diff3
Feb-21 (W1) 3       1         33.33% 5.5h    1        33.33%  2h     1        33.33%   32 Days
Feb-21 (W2) 0       0         0.00%  0       0        0.00%   0      0        0.00%    0
Feb-21 (W3) 0       0         0.00%  0       0        0.00%   0      0        0.00%    0
Feb-21 (W4) 0       0         0.00%  0       0        0.00%   0      0        0.00%    0
Total       3       1         33.33% 5.5h    1        33.33%  2h     1        33.33%   32 Days

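Here is a solution. For the time difference I left everything in hours, because I don't know how to mix hour and day differences in a single column. Once you have the data you can print it however you like, or create a separate dummy variable for display, but a consistent time-difference column has to be in either hours or days, not a mix of both.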
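
If a mixed hours/days display is still wanted, one option is to keep the numeric hours column for calculations and derive a separate character label from it. The sketch below assumes the rule from the question (under 24 hours shown in hours, otherwise in whole days); the helper name `format_duration` and the rounding choices are assumptions.

```r
# Turn numeric hours into a display label such as "5.5h" or "32 Days".
format_duration <- function(hours) {
  ifelse(is.na(hours), NA_character_,
         ifelse(hours < 24,
                paste0(round(hours, 1), "h"),
                paste0(round(hours / 24), " Days")))
}

duration_labels <- format_duration(c(5.5, 2, 768, NA))
```

The label column is text, so averages and sorting should still be done on the numeric hours.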

library(dplyr)
library(lubridate)
library(tidyr)

df %>%
  mutate(
    # Assign each row to a time system following the logic provided
    time_system = case_when(
      Status == "Hold" ~ "Diff_1",
      !is.na(delivery_date) ~ "Diff_2",
      TRUE ~ "Diff_3"
    ),
    # Based on the time system, calculate the time difference in hours
    calculate_diff = case_when(
      time_system == "Diff_1" ~ difftime(DateTime2, DateTime1, units = "hours"),
      time_system == "Diff_2" ~ difftime(delivery_date, DateTime1, units = "hours"),
      time_system == "Diff_3" ~ difftime(Sys.time(), DateTime1, units = "hours")
    ),
    # Create the formatted Month and Week labels
    month = format(as.Date(DateTime1), "%b-%y"),
    week = week(DateTime1) - week(floor_date(as.Date(DateTime1), unit = "months")) + 1,
    format_date = paste0(month, " (W", week, ")")
  ) %>%
  # Calculate the needed variables, grouped by month-week and time system
  group_by(format_date, time_system) %>%
  summarize(Count = n(),
            Average = mean(calculate_diff),
            .groups = "drop") %>%
  group_by(format_date) %>%
  mutate(Total_count = sum(Count),
         Per = Count / sum(Count)) %>%
  # Pivot the data wide to get the final expected format
  pivot_wider(names_from = time_system, values_from = c(Count, Average, Per),
              names_sort = FALSE, names_glue = "{.value}_{time_system}")

Output
# A tibble: 1 x 11
# Groups:   format_date [1]
format_date Total_count Count_Diff_1 Count_Diff_2 Count_Diff_3 Average_Diff_1 Average_Diff_2  Average_Diff_3 Per_Diff_1 Per_Diff_2 Per_Diff_3
<chr>             <int>        <int>        <int>        <int> <drtn>         <drtn>          <drtn>              <dbl>      <dbl>      <dbl>
1 Mar-21 (W1)          10            3            6            1 1.064768 hours 0.7036553 hours 155.1412 hours        0.3        0.6        0.1
