mydata<-structure(list(lead_create = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("10.11.2017 4:47:26",
"10.11.2017 4:48:26", "10.11.2017 4:49:26"), class = "factor"),
lead_id = c(24799522L, 24799522L, 24799522L, 24799522L, 24799522L,
24799522L, 24799522L, 24799522L, 24799522L, 24799522L, 24799522L,
24799522L, 24799523L, 24799523L, 24799524L, 24799524L, 24799524L,
24799524L), webmaster_identifier = c(430L, 430L, 430L, 430L,
430L, 431L, 431L, 431L, 431L, 431L, 431L, 431L, 430L, 430L,
430L, 430L, 430L, 430L), product = structure(c(2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("gel", "Intoxic"), class = "factor"), lead_country = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "Indonesia", class = "factor")), .Names = c("lead_create",
"lead_id", "webmaster_identifier", "product", "lead_country"), class = "data.frame", row.names = c(NA,
-18L))
I don't know why, but in this example lead_create is a factor! It is a date variable.
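The factor type most likely comes from reading the file with `read.csv()` under R versions before 4.0, where `stringsAsFactors = TRUE` was the default. Below is a minimal sketch of turning the column back into a real date-time and then into a one-minute label. It assumes the timestamps are UTC (the time zone is not stated) and uses a small hypothetical vector `x` in place of the real column:

```r
# hypothetical sample mirroring the lead_create factor
x <- factor(c("10.11.2017 4:47:26", "10.11.2017 4:48:26", "10.11.2017 4:49:26"))

# parse the factor's text labels as day.month.year hour:minute:second (tz assumed UTC)
ts <- as.POSIXct(as.character(x), format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# keep only day, month, year, hour and minute: this string is the per-minute bucket
minute <- format(ts, "%d.%m.%Y %H:%M")
print(minute)
```

Note that `%H` prints a zero-padded hour (`04:47`), unlike the unpadded `4:47` in the original strings.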
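I need to group by webmaster_identifier, product and lead_country and count the number of unique lead_id values per minute. lead_create has the date format dd.mm.yyyy hh:mm:ss. I need the data in a data frame like this: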
lead_create lead_id webmaster_identifier product lead_country
1 10.11.2017 4:47 1 430 Intoxic Indonesia
2 10.11.2017 4:47 1 431 Intoxic Indonesia
3 10.11.2017 4:48 1 430 gel Indonesia
4 10.11.2017 4:49 1 430 gel Indonesia
For the period 10.11.2017 4:47:00 to 10.11.2017 4:47:59, with webmaster=430, product=Intoxic and lead_country=Indonesia, there is just one unique lead_id.
For the period 10.11.2017 4:47:00 to 10.11.2017 4:47:59, with webmaster=431, product=Intoxic and lead_country=Indonesia, there is also just one unique lead_id.
For the period 10.11.2017 4:48:00 to 10.11.2017 4:48:59, with webmaster=430, product=gel and lead_country=Indonesia, there is just one unique lead_id.
For the period 10.11.2017 4:49:00 to 10.11.2017 4:49:59, with webmaster=430, product=gel and lead_country=Indonesia, there is just one unique lead_id.
How can I create such a data frame?
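It looks like we need to remove the seconds suffix from "lead_create", take the distinct rows, and then count how many rows remain in each minute/webmaster/product/country group
library(dplyr)
library(stringr)
mydata %>%
  # drop the trailing ":ss" so each timestamp becomes its minute bucket
  mutate(lead_create = str_remove(as.character(lead_create), ":\\d+$")) %>%
  # one row per (minute, lead_id, webmaster, product, country)
  distinct() %>%
  # rows left per group = number of unique lead_id in that group
  count(lead_create, webmaster_identifier, product, lead_country, name = "lead_id") %>%
  select(lead_create, lead_id, everything()) %>%
  as.data.frame()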
# lead_create lead_id webmaster_identifier product lead_country
#1 10.11.2017 4:47 1 430 Intoxic Indonesia
#2 10.11.2017 4:47 1 431 Intoxic Indonesia
#3 10.11.2017 4:48 1 430 gel Indonesia
#4 10.11.2017 4:49 1 430 gel Indonesia
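For comparison, here is a base-R sketch of the same per-minute count without dplyr or stringr, using `aggregate()` with `length(unique(.))` as the summary function. It rebuilds a compact copy of the 18 example rows under the hypothetical name `d` so it runs on its own:

```r
# compact rebuild of mydata (same 18 rows; character columns instead of factors)
d <- data.frame(
  lead_create = rep(c("10.11.2017 4:47:26", "10.11.2017 4:48:26", "10.11.2017 4:49:26"),
                    times = c(12, 2, 4)),
  lead_id = rep(c(24799522L, 24799523L, 24799524L), times = c(12, 2, 4)),
  webmaster_identifier = rep(c(430L, 431L, 430L), times = c(5, 7, 6)),
  product = rep(c("Intoxic", "gel"), times = c(12, 6)),
  lead_country = "Indonesia",
  stringsAsFactors = FALSE
)

# drop the trailing ":ss" to get the one-minute bucket
d$lead_create <- sub(":[0-9]+$", "", d$lead_create)

# number of distinct lead_id per minute / webmaster / product / country
res <- aggregate(lead_id ~ lead_create + webmaster_identifier + product + lead_country,
                 data = d, FUN = function(v) length(unique(v)))
res <- res[order(res$lead_create, res$webmaster_identifier), ]
print(res)
```

`aggregate()` only returns group combinations that actually occur in the data, so the result has the same four rows as the dplyr version, with lead_id in the last column.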