r语言 - group_by() 并取自上次购买特定品牌以来的周数之差

  • 本文关键字:品牌 周数之 group r语言 by r dplyr
  • 更新时间 :
  • 英文 :


我正在尝试按PANID分组,然后取给定brandID的两个日期之间的差异。我正在尝试计算该人购买特定品牌以来有多少周。

PANID对应于一个人WEEK到数周brandID特定品牌

我想回答的问题是,一个给定的人需要多少周才能再次购买同一品牌。零是他们没有再次购买该品牌等。

只是不知道该怎么做。

df <- structure(list(PANID = c(1100016L, 1100016L, 1100016L, 1100057L, 
1100057L, 1100057L, 1100057L, 1100057L, 1100057L, 1100057L, 1100057L, 
1100180L, 1100180L, 1100321L, 1100404L, 1100404L, 1100545L, 1100560L, 
1100560L, 1101162L, 1101162L, 1101238L, 1101279L, 1101279L, 1101279L, 
1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 
1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 
1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 
1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 
1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101758L, 1102012L, 
1102046L, 1102384L, 1102384L, 1103069L, 1103101L, 1103101L, 1103101L, 
1103101L, 1103101L, 1103101L, 1103101L, 1103325L, 1103614L, 1103671L, 
1103671L, 1103895L, 1103895L, 1103895L, 1104182L, 1104182L, 1104182L, 
1104182L, 1104182L, 1104182L, 1104182L, 1104182L, 1104182L, 1104182L, 
1104182L, 1104182L, 1104208L, 1104208L, 1104224L, 1104273L, 1104315L, 
1104315L, 1104315L, 1104315L, 1104364L, 1104372L, 1104372L, 1104406L, 
1104869L, 1104968L, 1105171L, 1105171L, 1105239L, 1105239L, 1105239L, 
1105965L, 1105999L, 1106146L, 1106567L, 1106567L, 1106567L, 1106575L, 
1107235L, 1107235L, 1107235L, 1107235L, 1107235L, 1107235L, 1107235L, 
1107425L, 1107425L, 1107581L, 1107581L, 1107847L, 1107847L, 1107862L, 
1107953L, 1108043L, 1108225L, 1108308L, 1108308L, 1108308L, 1108308L, 
1108308L, 1108308L, 1108308L, 1108456L, 1108456L, 1108456L, 1108456L, 
1108456L, 1108456L, 1108456L, 1108456L, 1108456L, 1108456L, 1108456L, 
1108456L, 1108456L, 1108456L, 1108456L, 1108456L, 1109165L, 1109165L, 
1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 
1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 
1109306L, 1109306L, 1109447L, 1109447L, 1109447L, 1109470L, 1109470L, 
1109470L, 1109470L, 1109470L, 1109470L, 1109470L, 1109488L, 1109595L, 
1109629L, 1109629L, 1109629L, 1109629L, 1109629L, 1109629L, 1109629L, 
1109629L, 1109629L, 1109652L, 1109652L, 1109652L, 1109652L, 1109652L
), WEEK = structure(c(11400, 11533, 11673, 11484, 11491, 11491, 
11519, 11561, 11568, 11582, 11603, 11344, 11351, 11673, 11673, 
11673, 11533, 11344, 11631, 11442, 11652, 11673, 11323, 11323, 
11344, 11344, 11344, 11372, 11372, 11372, 11393, 11400, 11400, 
11414, 11414, 11428, 11428, 11435, 11435, 11449, 11470, 11470, 
11470, 11526, 11526, 11533, 11533, 11547, 11561, 11568, 11568, 
11568, 11589, 11589, 11610, 11610, 11631, 11652, 11421, 11463, 
11449, 11680, 11680, 11659, 11323, 11351, 11379, 11540, 11610, 
11645, 11659, 11645, 11442, 11498, 11540, 11519, 11540, 11596, 
11337, 11407, 11435, 11456, 11470, 11491, 11526, 11561, 11589, 
11596, 11666, 11666, 11519, 11540, 11554, 11414, 11673, 11673, 
11673, 11673, 11673, 11442, 11673, 11673, 11386, 11673, 11540, 
11673, 11484, 11491, 11596, 11680, 11673, 11505, 11323, 11344, 
11554, 11610, 11463, 11463, 11484, 11498, 11498, 11645, 11659, 
11379, 11484, 11533, 11673, 11414, 11428, 11533, 11631, 11680, 
11547, 11344, 11344, 11344, 11344, 11372, 11442, 11561, 11372, 
11379, 11393, 11421, 11442, 11456, 11463, 11498, 11498, 11554, 
11554, 11575, 11582, 11596, 11610, 11673, 11575, 11666, 11428, 
11449, 11463, 11477, 11498, 11512, 11526, 11540, 11554, 11561, 
11575, 11575, 11589, 11624, 11638, 11666, 11463, 11519, 11589, 
11351, 11365, 11400, 11547, 11547, 11617, 11645, 11498, 11547, 
11337, 11344, 11372, 11414, 11463, 11484, 11498, 11505, 11673, 
11337, 11351, 11358, 11365, 11393), class = "Date"), brandID = c(48, 
48, 48, 209, 207, 217, 29, 29, 90, 90, 90, 131, 215, 28, 50, 
208, 136, 216, 207, 48, 48, 28, 29, 28, 29, 28, 140, 29, 28, 
140, 29, 28, 140, 29, 28, 29, 28, 29, 28, 28, 29, 28, 140, 29, 
28, 28, 140, 33, 28, 29, 28, 140, 29, 28, 28, 140, 28, 28, 131, 
140, 29, 209, 214, 29, 29, 29, 29, 153, 153, 153, 153, 29, 53, 
28, 33, 135, 135, 209, 211, 128, 225, 211, 211, 62, 211, 156, 
62, 211, 211, 62, 217, 217, 32, 50, 209, 210, 211, 156, 214, 
48, 48, 48, 50, 173, 138, 140, 207, 217, 214, 127, 69, 29, 100, 
100, 214, 48, 51, 50, 51, 131, 51, 51, 51, 230, 246, 32, 32, 
50, 50, 28, 100, 131, 135, 82, 1, 6, 69, 82, 7, 1, 48, 48, 79, 
48, 48, 29, 29, 29, 48, 79, 48, 29, 173, 29, 29, 29, 173, 99, 
140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 
140, 140, 140, 250, 28, 131, 34, 158, 34, 34, 157, 157, 157, 
48, 50, 29, 29, 29, 29, 29, 29, 29, 29, 29, 99, 99, 99, 99, 99
)), row.names = c(NA, 200L), class = "data.frame")

编辑:

预期输出如下所示

PANID       WEEK brandID  diff
1 1100016 2001-03-19      48  first purchase of this brand
2 1100016 2001-07-30      48  ~20 week difference
3 1100016 2001-12-17      48  ~18 week difference
4 1100057 2001-06-11     209  first purchase
5 1100057 2001-06-18     207  no difference since this person bought a different brand
6 1100057 2001-06-18     217  no difference since this person bought a different brand

我们可以按"PANID"、"brandID"分组,然后应用difftime来获取"WEEK"列中日期的每周差异

library(dplyr)
df %>% 
group_by(PANID, brandID) %>%
mutate(nweeks = c(0, difftime(WEEK[-1], WEEK[-length(WEEK)], unit = "week")))

使用data.table的解决方案:

library(data.table)
setDT(df)
df <- df[order(PANID, WEEK)] # to garantee that the dates are ordered for every client
mWeeks <- df[, .(mean.weeks = mean(c(0, diff(week(WEEK))), na.rm = TRUE)), by = .(PANID, brandID)]
mWeeks[mean.weeks == 0, mean.weeks := NA]
> mWeeks
PANID brandID mean.weeks
1: 1100016      48  13.000000
2: 1100057     209         NA
3: 1100057     207         NA
4: 1100057     217         NA
5: 1100057      29   3.000000
6: 1100057      90   1.666667
7: 1100180     131         NA
8: 1100180     215         NA
...

最新更新