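I'm trying to check whether each customer makes a purchase every week by creating a new column that indicates whether a purchase was made in the following week.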
Initial data
id timestamp week_no
b9968 2016-08-17 09:38:33 33
b9968 2016-08-18 17:33:23 33
b9968 2016-08-19 18:25:20 33
b9968 2016-08-23 17:46:44 34
4983f 2016-08-12 12:01:23 32
4983f 2016-08-13 17:30:47 32
Final data
id timestamp week_no diff1
b9968 2016-08-17 09:38:33 34 1
4983f 2016-08-13 17:30:47 32 0
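One option is to use dplyr for this.
Note that the expected output table in the question is slightly off: for b9968 it pairs the first timestamp (2016-08-17 09:38:33, which falls in week 33) with week_no 34. The code below instead keeps the most recent purchase for each id, so the timestamp and week_no come from the same row.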
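A quick way to see the mismatch is to recompute the week number from each timestamp. This is a minimal sketch that assumes week_no was derived as the week of the year with weeks starting on Sunday (the `%U` format code of base R's `format()`/`strftime()`); the question does not say how week_no was computed. With that assumption, every recomputed week matches week_no, and 2016-08-17 falls in week 33, not 34.

```r
# Timestamps copied from the question's data (seconds since the Unix epoch, UTC)
ts <- as.POSIXct(c(1471426713, 1471541603, 1471631120,
                   1471974404, 1471003283, 1471109447),
                 origin = "1970-01-01", tz = "UTC")
week_no <- c(33L, 33L, 33L, 34L, 32L, 32L)

# Assumption: %U = week of the year (00-53), with Sunday as the first day of the week
derived <- as.integer(format(ts, "%U", tz = "UTC"))
print(data.frame(timestamp = ts, week_no, derived))

# Week of the timestamp shown for b9968 in the question's expected output
as.integer(format(ts[1], "%U", tz = "UTC"))
```

For this sample, the ISO 8601 week (`%V`) gives the same numbers because none of the purchases fall on a Sunday.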
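library(dplyr)

df %>%
  group_by(id) %>%
  # change in week_no since the previous purchase of the same id;
  # lag() follows the existing row order, so rows must be sorted by timestamp
  mutate(diff1 = week_no - lag(week_no)) %>%
  # keep only each customer's most recent purchase
  filter(timestamp == max(timestamp))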
# A tibble: 2 x 4
# Groups: id [2]
id timestamp week_no diff1
<chr> <dttm> <int> <int>
1 b9968 2016-08-23 17:46:44 34 1
2 4983f 2016-08-13 17:30:47 32 0
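If dplyr is not available, the same steps can be sketched in base R. This swaps in `ave()` for `group_by()`/`mutate()`/`lag()` and `duplicated(fromLast = TRUE)` for the `max(timestamp)` filter, and it assumes no customer has two purchases with the identical latest timestamp. Sorting by id and timestamp first makes "previous row" mean "previous purchase", so the result no longer depends on the input row order.

```r
# Rebuild the question's data with plain base R types
df <- data.frame(
  id = c("b9968", "b9968", "b9968", "b9968", "4983f", "4983f"),
  timestamp = as.POSIXct(c(1471426713, 1471541603, 1471631120,
                           1471974404, 1471003283, 1471109447),
                         origin = "1970-01-01", tz = "UTC"),
  week_no = c(33L, 33L, 33L, 34L, 32L, 32L),
  stringsAsFactors = FALSE
)

# Sort so that the previous row within an id is the previous purchase
df <- df[order(df$id, df$timestamp), ]

# week_no minus the previous week_no of the same id (NA for the first purchase)
df$diff1 <- ave(df$week_no, df$id, FUN = function(w) c(NA, diff(w)))

# After sorting, the last row of each id is its most recent purchase
result <- df[!duplicated(df$id, fromLast = TRUE), ]
print(result)
```

The rows come out ordered by id (4983f first), unlike the dplyr output above, because of the explicit sort.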
Data:
df <- structure(list(id = c("b9968", "b9968", "b9968", "b9968", "4983f",
"4983f"),
timestamp = structure(c(1471426713, 1471541603, 1471631120,
1471974404, 1471003283, 1471109447),
tzone = "UTC", class = c("POSIXct","POSIXt")),
week_no = c(33L, 33L, 33L, 34L, 32L, 32L)),
.Names = c("id", "timestamp", "week_no"),
row.names = c(NA, -6L),
class = "data.frame")
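For the original goal of checking whether a customer buys every week, one possible sketch is to look at each customer's distinct purchase weeks and test that they form an unbroken run. The customer "c0001" below is hypothetical, added only to show a gap; the helper name `buys_every_week` is also hypothetical. The sketch assumes week numbers do not wrap around a year boundary, and it treats a customer with only one distinct week as having no gaps.

```r
# Week numbers and ids from the question's data, plus a hypothetical
# customer "c0001" who skipped week 31
id <- c("b9968", "b9968", "b9968", "b9968", "4983f", "4983f", "c0001", "c0001")
week_no <- c(33L, 33L, 33L, 34L, 32L, 32L, 30L, 32L)

# Hypothetical helper: TRUE if the distinct weeks are consecutive
buys_every_week <- function(weeks) {
  w <- sort(unique(weeks))
  all(diff(w) == 1L)  # a single distinct week yields all(logical(0)), i.e. TRUE
}

res <- vapply(split(week_no, id), buys_every_week, logical(1))
print(res)
```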