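I have a list of customer payment dates, and I want to see whether the payments follow a 7/14-day or monthly pattern, which they usually do! The problem is that there can also be in-between payments of similar value, so just looking at the time between payments doesn't always work. Is there any simple way (using SQL or R) to classify customers as weekly or monthly payers?
It seems like a very simple signal-processing problem, but maybe I don't know the right words to Google, because I couldn't find anything. Any pointers in the right direction would be much appreciated!
Sample data: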
CustomerID Payment Date
Customer1 2017-01-05
Customer1 2017-01-06
Customer1 2017-01-12
Customer1 2017-01-17
Customer1 2017-01-19
Customer1 2017-01-19
Customer1 2017-01-26
Customer1 2017-02-02
Customer1 2017-02-03
Customer2 2017-06-04
Customer2 2017-06-06
Customer2 2017-07-04
Customer2 2017-07-06
Customer2 2017-07-22
Customer2 2017-07-28
Customer2 2017-08-06
Sample output:
CustomerID Classification
Customer1 Weekly
Customer2 Monthly
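Edit: To be clear, the data is usually much larger and potentially noisier than the above. I'm only looking for a general idea of an algorithm for finding the pattern, not trying to solve the problem for the small dataset I posted.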
payment_date <-
  as.Date(
    c(
      "2017-01-05",
      "2017-01-06",
      "2017-01-12",
      "2017-01-17",
      "2017-01-19",
      "2017-01-19",
      "2017-01-26",
      "2017-02-02",
      "2017-02-03",
      "2017-06-04",
      "2017-06-06",
      "2017-07-04",
      "2017-07-06",
      "2017-07-22",
      "2017-07-28",
      "2017-08-06"
    )
  )
df <- data.frame(payment_date,
                 customer_id = 0)
df$customer_id[1:9] <- 1
df$customer_id[10:16] <- 2
customer_information <- data.frame(customer_id = numeric(),
                                   payment = character())
# Loop over the customer IDs that actually occur in the data
for (i in unique(df$customer_id)) {
  dates <- df$payment_date[df$customer_id == i]
  # Days between the customer's first and last payment
  delta_t <- as.numeric(max(dates) - min(dates))
  nr_of_payments <- length(dates)
  # Average number of days per payment
  days_to_pay <- delta_t / nr_of_payments
  # More than 7 days per payment on average: monthly, otherwise weekly
  payment <- if (days_to_pay > 7) "monthly" else "weekly"
  customer_information <- rbind(customer_information,
                                data.frame(customer_id = i, payment = payment))
}
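The code uses the average number of days per payment for each customer (the span from first to last payment divided by the number of payments). If that average is greater than 7 days, the customer is classified as a monthly payer, otherwise as a weekly payer.
It works, but I guess it is not a satisfying solution. It looks like the monthly payers make two payments per month. If that is the case, you could take it into account to get a more accurate result.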
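If the in-between payments make the average gap unreliable, one alternative is to check how tightly the payments cluster at the same point of a cycle rather than how far apart they are. The sketch below is not part of the code above. It is a rough illustration that assumes only two candidate patterns (weekly and monthly), treats every month as 31 days long, and uses hypothetical names (`resultant_length`, `classify_payer`, `payments`). Each date is mapped to a phase within the week and within the month, and the mean resultant length of those phases (a standard circular-statistics measure, close to 1 when dates line up and close to 0 when they are spread out) is compared.

```r
# Sample data from the question (column names are illustrative)
payments <- data.frame(
  customer_id = c(rep("Customer1", 9), rep("Customer2", 7)),
  payment_date = as.Date(c(
    "2017-01-05", "2017-01-06", "2017-01-12", "2017-01-17", "2017-01-19",
    "2017-01-19", "2017-01-26", "2017-02-02", "2017-02-03",
    "2017-06-04", "2017-06-06", "2017-07-04", "2017-07-06",
    "2017-07-22", "2017-07-28", "2017-08-06"
  ))
)

# Mean resultant length of phases given as fractions of a cycle in [0, 1).
# Values near 1 mean the payments fall at the same point of every cycle.
resultant_length <- function(phase) {
  angle <- 2 * pi * phase
  sqrt(mean(cos(angle))^2 + mean(sin(angle))^2)
}

# Hypothetical helper: label one customer's dates as "Weekly" or "Monthly"
classify_payer <- function(dates) {
  # Position within a 7-day cycle (day number modulo 7)
  weekly_phase <- (as.numeric(dates) %% 7) / 7
  # Position within the month; dividing by 31 is an approximation
  day_of_month <- as.integer(format(dates, "%d"))
  monthly_phase <- (day_of_month - 1) / 31
  scores <- c(Weekly = resultant_length(weekly_phase),
              Monthly = resultant_length(monthly_phase))
  names(scores)[which.max(scores)]
}

# One label per customer
result <- vapply(split(payments$payment_date, payments$customer_id),
                 classify_payer, character(1))
print(result)
```

On the sample data this should label Customer1 as Weekly and Customer2 as Monthly. A 14-day pattern would need extra care, because a fortnightly payer also scores high on the 7-day cycle, so you would have to compare the 7-day and 14-day scores explicitly before trusting the label.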