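I am using PostgreSQL 9.3.9 and I have a procedure called list_all_upsells that takes the beginning and the end of a month as arguments (see sqlfiddle.com/# !). For example, the code below lists the number of upsell accounts for October: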
select COUNT(up.*) as "Total Upsell Accounts in October" from
list_all_upsells('2015-10-01 00:00:00'::timestamp, '2015-10-31 23:59:59'::timestamp) as up
where up.user_id not in
(select distinct user_id from paid_users_no_more
where concat(extract(month from payment_stop_date),'-',extract(year from payment_stop_date))<>
concat(extract(month from payment_start_date),'-',extract(year from payment_start_date)));
The list_all_upsells procedure looks like this:
DECLARE
payor_email_2 text;
BEGIN
FOR payor_email_2 in select distinct payor_email from paid_users LOOP
return query
execute
'select paid_users.* from paid_users,
(
select payment_start_date as first_time from paid_users
where payor_email = $3
order by payment_start_date limit 1
) as dummy
where payor_email = $3
and payment_start_date > first_time
and payment_start_date between $1 and $2
and first_time < $1'
using a, b, payor_email_2;
END LOOP;
return;
END
I would like to run this for every month we have records for and collect the results in a single table, like this:
Month | Total Upselled Accounts
---------------------------------
08/2014 | 23
09/2014 | 35
ETC...
10/2015 | 56
I have queries that return the first and the last day of every month we have been in business. First day of the month:
select distinct date_trunc('month', payment_start_date)::date as startmonth
from paid_users ORDER BY startmonth;
Last day of the month:
SELECT distinct (date_trunc('MONTH', payment_start_date) +
INTERVAL '1 MONTH - 1 day')::date as endmonth from paid_users
ORDER BY endmonth;
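Now, how do I create a function that loops over list_all_upsells and gets the count for each month? For example, the first query (startmonth) gives me 2014-03-01, 2014-04-01, … up to 2015-10-01, and the second query (endmonth) gives me 2014-03-31, 2014-04-30, … up to 2015-10-31. I want to run list_all_upsells for each of these months so that I get a summary count of how many upsell accounts we had in each month.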
My paid_users table looks like this:
CREATE TABLE paid_users
(
user_id integer,
user_email character varying(255),
payor_id integer,
payor_email character varying(255),
payment_start_date timestamp without time zone DEFAULT now()
)
And paid_users_no_more:
CREATE TABLE paid_users_no_more
(
user_id integer,
payment_stop_date timestamp without time zone DEFAULT now()
)
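Your function has a few problems, so let's start there. Its shortcomings are: (1) you need only a single parameter to indicate the month; passing both the start and the end of the month is setting yourself up for problems; (2) you do not need a dynamic query, because you are not varying any identifiers (table or column names); (3) no loop is needed; and (4) your logic is flawed. I could also mention that PostgreSQL uses functions, and that they all start with a line like CREATE FUNCTION list_all_upsells(...), but that would be nit-picking.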
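Starting with the logic: apparently a user, identified by their email address, takes out a subscription from some payment_start_date until some payment_stop_date, and can do so multiple times. What you are looking for are the users whose first subscription started before the month in question and who started a new subscription, not their first, during that month. In that case the filter payment_start_date > first_time is useless, because you already filter on the first subscription being before the month in question (first_time < $1) and on the new subscription falling within it (payment_start_date BETWEEN $1 AND $2).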
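To make that concrete, the logic can be sketched as a plain query for one month. This is only an illustration: the October 2015 boundaries are hard-coded, the aliases pu and f are my own names, and the first subscription is found with min() instead of a LIMIT 1 subquery.

```sql
-- Sketch only: upsells for October 2015, with the month boundaries hard-coded.
SELECT pu.*
FROM paid_users pu
JOIN (
    -- earliest subscription per payor (alias f and column first_time are illustrative)
    SELECT payor_email, min(payment_start_date) AS first_time
    FROM paid_users
    GROUP BY payor_email
) f USING (payor_email)
WHERE f.first_time < timestamp '2015-10-01'            -- first subscription before the month
  AND pu.payment_start_date >= timestamp '2015-10-01'  -- new subscription inside the month
  AND pu.payment_start_date <  timestamp '2015-11-01';
```

Any row that passes these filters has a payment_start_date later than first_time, which is why the extra comparison adds nothing.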
Points (1), (2) and (3) only become apparent once you rewrite the query inside the function:
CREATE FUNCTION list_all_upsells(timestamp) RETURNS SETOF paid_users AS $$
SELECT paid_users.*
FROM paid_users
JOIN ( -- This JOIN keeps only those rows where the payor_email has a prior subscription
SELECT DISTINCT payor_email,
first_value(payment_start_date) OVER (PARTITION BY payor_email ORDER BY payment_start_date) AS dummy
FROM paid_users
WHERE payment_start_date < date_trunc('month', $1)
) dummy USING (payor_email)
-- This filter keeps only those rows with new subscriptions in the month
WHERE date_trunc('month', payment_start_date) = date_trunc('month', $1)
$$ LANGUAGE sql STRICT;
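Since the function body reduces to a single SQL statement, the function is now an sql language function, which is typically more efficient than a plpgsql function. You now supply just a single parameter, which can be any moment in the month you want data for, so list_all_upsells(LOCALTIMESTAMP) gives you the results for the current month. In terms of the query you posted, it would be: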
SELECT count(up.*) AS "Total Upsell Accounts in October"
FROM list_all_upsells(LOCALTIMESTAMP) up
WHERE up.user_id NOT IN
(SELECT DISTINCT user_id FROM paid_users_no_more
WHERE date_trunc('month', payment_stop_date) <>
date_trunc('month', up.payment_start_date)
);
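For a month other than the current one, any timestamp inside that month should work as the argument. A minimal sketch, assuming the rewritten one-argument function is in place; the date is an arbitrary moment in October 2015 chosen for illustration, and the cancellation filter is left out for brevity:

```sql
-- Any moment in October 2015 selects that month (the date is illustrative).
SELECT count(*) AS "Total Upsell Accounts in October 2015"
FROM list_all_upsells(timestamp '2015-10-15 12:00:00') AS up;
```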
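Incidentally, this does beg the question of why you use a table paid_users_no_more at all. Why not simply add a column payment_stop_date to table paid_users? If that column is NULL, the user is still subscribed. But the whole query is rather odd anyway, because list_all_upsells() returns new subscriptions in the current month, so why bother with subscriptions that were cancelled at some other time?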
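A minimal sketch of that schema change, under the assumption that one stop date per paid_users row is what you need. How the existing rows of paid_users_no_more map onto individual subscriptions is not clear from the question, so no data migration is shown:

```sql
-- Sketch only: record the stop date on the subscription row itself.
-- Existing rows get NULL, which is read as "still subscribed".
ALTER TABLE paid_users
    ADD COLUMN payment_stop_date timestamp without time zone;

-- Subscriptions that are still active
SELECT user_id, payor_email, payment_start_date
FROM paid_users
WHERE payment_stop_date IS NULL;
```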
Now to your real question:
SELECT months.m "Month", coalesce(count(up.*), 0) "Total Upselled Accounts"
FROM generate_series('2014-08-01'::timestamp,
date_trunc('month', LOCALTIMESTAMP),
'1 month') AS months(m)
LEFT JOIN list_all_upsells(months.m) AS up ON date_trunc('month', payment_start_date) = m
GROUP BY 1
ORDER BY 1;
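This generates a series of months from some starting month up to the current month, and then counts the new subscriptions for each month, which may be 0.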
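If you want the output to match the MM/YYYY layout from your question and do not want to hard-code the first month, a variation along these lines might do. It is a sketch, not something tested against your data: it assumes the one-argument list_all_upsells() above, and it takes the starting month from the earliest payment_start_date in paid_users.

```sql
-- Sketch: months formatted as MM/YYYY, starting at the earliest recorded month.
SELECT to_char(months.m, 'MM/YYYY') AS "Month",
       count(up.*) AS "Total Upselled Accounts"
FROM generate_series(
         (SELECT date_trunc('month', min(payment_start_date)) FROM paid_users),
         date_trunc('month', LOCALTIMESTAMP),
         interval '1 month') AS months(m)
LEFT JOIN list_all_upsells(months.m) AS up
       ON date_trunc('month', up.payment_start_date) = months.m
GROUP BY months.m
ORDER BY months.m;  -- order by the timestamp, not the formatted text
```

Grouping and ordering by months.m instead of the formatted string keeps the rows in chronological order across year boundaries.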
SQLFiddle