R: group and filter the last 3 months of data by start date with dplyr



structure(list(Order = c("100003378", "100003378", "100003378", 
"100003378", "100003378", "100003378", "100003378", "100003378", 
"100016566", "100016566"), Op = c(1016, 1017, 1018, 1019, 1020, 
1400, 1500, 9997, 1800, 1850), `Op Desc` = c("SOLDER REWORK IAW 200358984", 
"PP&C REWORK IAW 200358984", "INSPECT IAW 200358984", "QNOTE REVIEW IAW 200358984", 
"WI 1000   Program FPGA / Test CCA", "WI 1400   Vacuum Bake", 
"WI 1500   Quality Inspection", "PP&C Material Movement / Go To Stock", 
"WI1800Test,TempTest,Tune Puck by Sanding", "WI 1850   Bond SAT Wires, As Required"
), `Part No` = c("2355805G1", "2355805G1", "2355805G1", "2355805G1", 
"2355805G1", "2355805G1", "2355805G1", "2355805G1", "2353604G1", 
"2353604G1"), WBS = c("G-CUST-01", "G-CUST-01", "G-CUST-01", 
"G-CUST-01", "G-CUST-01", "G-CUST-01", "G-CUST-01", "G-CUST-01", 
"G-CUST-01", "G-CUST-01"), `Work Cntr` = c("CHRP0000", "CHRP0000", 
"CHRI0000", "CHRP0000", "26502122", "26303014", "26601012", "26801702", 
"26502132", "26203022"), `Actual Start` = structure(c(1576610787.297, 
1578489110.297, 1578493446.18, 1578600321, 1578617121.747, 1578943396.57, 
1580227782.307, 1580417882.567, 1548185774.11, 1580986391.243
), tzone = "UTC", class = c("POSIXct", "POSIXt")), `Actual Comp` = structure(c(1578443159.437, 
1578489164.8, 1578494073.52, 1578600334.077, 1578618039.147, 
1579611732.62, 1580413592.273, 1580417887.177, 1580986384.79, 
1580986425.4), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
Operation_Span = structure(c(21.2080108796308, 0.000630821759502093, 
0.00726087962863622, 0.000151354165540801, 0.0106180555566593, 
7.73537094907352, 2.15057831018611, 5.33564830267871e-05, 
379.636697685186, 0.000395335648898725), class = "difftime", units = "days")), row.names = c(NA, 
-10L), groups = structure(list(Order = c("100003378", "100016566"
), .rows = structure(list(1:8, 9:10), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

I am trying to use this, but it does not work:

df_Exciter <- df1 %>% 
filter(`Actual Start` > "2020-06-01" & Date <"2020-12-01")%>%
group_by(`Part No`,`Order`) %>%
arrange(Order,`Actual Start`)

Is there a way to write a function that lets me specify the last X months of data I want to see? I am new to R and am trying to learn how to use dplyr.

If `Actual Start` is the intended column for both date filters, this should work for you:

df1 %>% 
filter(`Actual Start` > "2020-06-01") %>%
filter(`Actual Start` < "2020-12-01") %>%
group_by(`Part No`,`Order`) %>%
arrange(Order,`Actual Start`)

However, the sample data you provided has no rows within the filtered period.
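One quick way to confirm this is to check the span of the start dates before filtering. The following is a minimal sketch, not part of the original answer: `demo` is a hypothetical stand-in for `df1` that holds only three of the `Actual Start` timestamps from the sample data, truncated to whole seconds.

```r
library(dplyr)

# Hypothetical stand-in for df1: three `Actual Start` values taken from
# the sample data (seconds since the epoch, truncated), in UTC.
demo <- tibble(
  `Actual Start` = as.POSIXct(
    c(1548185774, 1576610787, 1580986391),
    origin = "1970-01-01", tz = "UTC"
  )
)

# Earliest and latest start time; the latest falls in February 2020,
# so a window that opens on 2020-06-01 cannot match any row.
range(demo$`Actual Start`)

# Same window as in the filter above, written with explicit POSIXct bounds.
in_window <- demo %>%
  filter(`Actual Start` > as.POSIXct("2020-06-01", tz = "UTC"),
         `Actual Start` < as.POSIXct("2020-12-01", tz = "UTC"))

nrow(in_window)
```

Running `range()` on the real column first makes it easier to pick bounds that actually overlap the data.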

A few things:

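  1. I always recommend running clean_names from the janitor library first. It makes the column names easier to work with, because they no longer contain spaces.

  2. `Date` does not exist in the data you provided (you are trying to filter on both `Actual Start` and `Date`; below I assume you meant the second date column, `Actual Comp`).

  3. The way you currently subset the data will not keep any rows: your filter condition is `Actual Start` > 2020-06-01, and no row satisfies it. For illustration, I use 2019-06-01 below.

  4. You do not need group_by here (unless you need the grouping afterwards); you can simply pass all of the columns to arrange.

  5. If you want to learn more about what each tidyverse command does, consider installing the tidylog package and prefixing each function with tidylog:: (for example, tidylog::filter()).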

This gives you:

df_Exciter <- df1 %>% 
filter(`Actual Start` > "2019-06-01" & `Actual Comp` <"2020-12-01") %>% 
arrange(`Part No`,Order,`Actual Start`)

Or, including my suggestions:

library(janitor)
df_Exciter <- df1 %>% 
clean_names() %>% 
tidylog::filter(actual_start > "2019-06-01" & actual_comp <"2020-12-01") %>% 
arrange(part_no,order,actual_start)
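
The question also asks for a function that keeps only the last X months. Neither answer covers that, so here is one possible sketch under stated assumptions. The helper name `last_n_months`, its arguments, and the `demo` data are all hypothetical. By default it counts back from the latest date in the column rather than from today, so that older data such as the sample above still returns rows.

```r
library(dplyr)

# Hypothetical helper (not from the original answers): keep the rows whose
# `date_col` lies within `n_months` calendar months before `ref_date`.
# If `ref_date` is NULL, the latest value in `date_col` is used.
last_n_months <- function(data, date_col, n_months = 3, ref_date = NULL) {
  data <- ungroup(data)  # the sample data is grouped; grouping is not needed here
  if (is.null(ref_date)) {
    ref_date <- max(pull(data, {{ date_col }}), na.rm = TRUE)
  }
  # seq() steps back whole calendar months; note that day-of-month overflow
  # (e.g. stepping back from the 31st) can roll into the following month.
  cutoff <- seq(ref_date, by = paste0("-", n_months, " months"),
                length.out = 2)[2]
  data %>%
    filter({{ date_col }} >= cutoff, {{ date_col }} <= ref_date)
}

# Made-up demo data with dates in the same range as the sample.
demo <- tibble(
  Order = c("A", "A", "B", "B"),
  `Actual Start` = as.POSIXct(
    c("2019-01-22", "2019-12-17", "2020-01-08", "2020-02-06"), tz = "UTC"
  )
)

recent <- last_n_months(demo, `Actual Start`, n_months = 3) %>%
  arrange(Order, `Actual Start`)
```

To count back from the current time instead, pass `ref_date = Sys.time()`. With the sample data that would return no rows, for the same reason the 2020-06-01 filter does.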
