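I have a data frame of services. Now I need to add a column "Order" and group the rows according to the following rule:
Group services into orders: if a service value "A" occurs within the next 5 values after another service "A", fill all of those rows with the same order ID, including the rows that have no service value. If no service value occurs within the next 5 values, start the next order group.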
dput(data)
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
14, 15, 16), time = structure(1:15, .Label = c("13:20:01", "13:20:02",
"13:20:03", "13:20:04", "13:20:05", "13:20:06", "13:20:07", "13:20:08",
"13:20:09", "13:20:10", "13:20:11", "13:20:12", "13:20:13", "13:20:14",
"13:20:15"), class = "factor"), apples = c(2, 2, 2, 3, 3, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2), service = structure(c(NA, 1L, 1L,
NA, 1L, NA, 1L, NA, NA, NA, NA, NA, 1L, NA, 1L), .Label = "A", class = "factor")), class = "data.frame", row.names = c(NA,
-15L))
Overview
id time apples service
1 13:20:01 2
2 13:20:02 2 A
3 13:20:03 2 A
4 13:20:04 3
5 13:20:05 3 A
6 13:20:06 2
7 13:20:07 2 A
8 13:20:08 2
9 13:20:09 2
10 13:20:10 2
11 13:20:11 2
12 13:20:12 2
14 13:20:13 2 A
15 13:20:14 2
16 13:20:15 2 A
This is the format I am looking for. ID 2 to ID 8 is one order, and ID 14 to ID 16 is another.
id time apples service Order
1 13:20:01 2
2 13:20:02 2 A 1
3 13:20:03 2 A 1
4 13:20:04 3 1
5 13:20:05 3 A 1
6 13:20:06 2 1
7 13:20:07 2 A 1
8 13:20:08 2
9 13:20:09 2
10 13:20:10 2
11 13:20:11 2
12 13:20:12 2
14 13:20:13 2 A 2
15 13:20:14 2 2
16 13:20:15 2 A 2
I tried it with a for loop. I suspect there is a way to do this with mutate and some kind of "range" condition.
Thanks for your help!
This is my output, produced by tspano's code:
# A tibble: 15 x 11
id time apples service start end g0 g1 g2 g3 order
<dbl> <fct> <dbl> <fct> <dbl> <dbl> <chr> <int> <chr> <int> <int>
1 1 13:20:01 2 NA 0 3 NA 0 NA 0 NA
2 2 13:20:02 2 A 1 3 start 1 NA 0 NA
3 3 13:20:03 2 A 2 3 NA 1 NA 0 NA
4 4 13:20:04 3 NA 2 2 NA 1 NA 0 NA
5 5 13:20:05 3 A 3 2 NA 1 NA 0 NA
6 6 13:20:06 2 NA 3 1 NA 1 NA 0 NA
7 7 13:20:07 2 A 3 1 NA 1 NA 0 NA
8 8 13:20:08 2 NA 2 0 end 2 NA 0 NA
9 9 13:20:09 2 NA 2 1 NA 2 NA 0 NA
10 10 13:20:10 2 NA 1 1 NA 2 NA 0 NA
11 11 13:20:11 2 NA 1 2 NA 2 NA 0 NA
12 12 13:20:12 2 NA 0 2 NA 2 NA 0 NA
13 14 13:20:13 2 A 1 2 start 3 NA 0 NA
14 15 13:20:14 2 NA 1 1 NA 3 NA 0 NA
15 16 13:20:15 2 A 2 1 NA 3 NA 0 NA
Here is a solution using RcppRoll, which should be faster than an R for loop:
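library(dplyr)

data %>%
  mutate(
    # start: number of "A" rows in the 5-row window ending at the current row
    start = RcppRoll::roll_sum(c(rep(FALSE, 4), (service == "A") %in% TRUE), n = 5, align = "right"),
    # end: number of "A" rows in the 5-row window starting at the current row
    end = RcppRoll::roll_sum(c((service == "A") %in% TRUE, rep(FALSE, 4)), n = 5, align = "left"),
    # g0: "start" where the trailing window first becomes non-empty,
    #     "end" where the leading window contains no "A"
    g0 = case_when(start > 0 & (lag(start) == 0) %in% c(TRUE, NA) ~ "start",
                   end == 0 ~ "end",
                   TRUE ~ NA_character_)
  ) %>%
  # g1: blocks of rows delimited by the start/end markers
  group_by(g1 = cumsum(!is.na(g0))) %>%
  # g2: "order" for blocks that begin with "start", NA otherwise
  mutate(g2 = if_else(first(g0) == "end", NA_character_, "order")) %>%
  ungroup() %>%
  # g3: counter that increments on the first row of each order block
  group_by(g3 = cumsum(!is.na(g2) & is.na(lag(g2)))) %>%
  mutate(order = if_else(is.na(g2), NA_integer_, g3)) %>%
  ungroup() %>%
  select(id, time, apples, service, order)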
If you remove the final select, you can see several intermediate results, which should make the logic clear.
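For comparison, the same grouping rule can be sketched in base R without rolling sums. This is only a minimal sketch, not part of the answer above: the function name `assign_orders` and its `window` argument are hypothetical, and it assumes that two "A" rows belong to the same order when they are at most `window` rows apart.

```r
# Hypothetical helper (not from the original answer): assign an order ID to
# every row lying between the first and last "A" of a chain of "A" rows.
assign_orders <- function(service, window = 5L) {
  is_a <- !is.na(service) & service == "A"
  pos <- which(is_a)                          # row positions of all "A" values
  order <- rep(NA_integer_, length(service))
  if (length(pos) == 0L) return(order)

  # A new chain begins at the first "A" and wherever the gap to the
  # previous "A" is larger than `window` rows (assumed reading of the rule).
  chain <- cumsum(c(TRUE, diff(pos) > window))
  first <- tapply(pos, chain, min)            # first row of each chain
  last  <- tapply(pos, chain, max)            # last row of each chain

  # Fill every row of a chain, including rows without a service value.
  for (k in seq_along(first)) {
    order[first[k]:last[k]] <- k
  }
  order
}

# The service column from the question, rebuilt here so the block runs alone.
service <- factor(c(NA, "A", "A", NA, "A", NA, "A", NA, NA, NA, NA, NA,
                    "A", NA, "A"))
assign_orders(service)
# expected: NA 1 1 1 1 1 1 NA NA NA NA NA 2 2 2
```

With the question's data frame this could be applied as `data$Order <- assign_orders(data$service)`. The loop runs once per order rather than once per row, so it should stay cheap even for long data frames.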