我有一个订阅表,其中包含用户ID、订阅开始日期和预订结束日期。我还有一个带有日期戳字段的日历表,即从订阅表中的第一个订阅日期开始的每个日期。
我正试图写一些东西,给我一个带有日期列和三个数字的表格:(当天(总活跃用户数、新订阅用户数和取消订阅用户数。
(注意:我试图使用建议的GitHub Flavored Markdown插入示例表,但它只是放在一行中。(
目前,我正在玩一个查询,它在两个表之间创建多个联接,每个数字一个:
select a.datestamp
,count(distinct case when b_sub.UserID is not null then b_sub.UserID end) as total_w_subscription
,count(distinct case when b_in.UserID is not null then b_in.UserID end) as total_subscribed
,count(distinct case when b_out.UserID is not null then b_out.UserID end) as total_unsubscribed
from Calendar as a
left join Subscription as b_sub -- all those with subscription on given date
on b_sub.sub_dt <= a.datestamp
and (b_sub.unsub_dt > a.datestamp or b_sub.unsub_dt is null)
left join Subscription as b_in -- all those that subscribed on given date
on b_in.sub_dt = a.datestamp
left join Subscription as b_out -- all those that unsubscribed on given date
on b_out.unsub_dt = a.datestamp
where a.datestamp > '2021-06-10'
group by a.datestamp
order by datestamp asc
;
我已为两个表中的日期字段编制了索引。如果我只看一天,它会在3秒内运行。两天已经花了很长时间。该子表有超过260万张记录,理想情况下,我需要在2012年的某个时候开始我的时间表。
最节省时间的方法是什么?
你走在了正确的轨道上。我创建了一些表变量,并假设每个订阅都包含开始和结束日期的数据结构。
--Create @dates table variable for calendar
DECLARE @startDate DATETIME = '2018-01-01'
DECLARE @endDate DATETIME = '2021-06-18'
DECLARE @dates TABLE
(
reportingdate DATETIME
)
WHILE @startDate <= @endDate
BEGIN
INSERT INTO @dates SELECT @startDate
SET @startDate += 1
END
--Create @subscriptions table variable for subcriptions to join onto calendar
DECLARE @subscriptions TABLE
(
id INT
,startDate DATETIME
,endDate DATETIME
)
INSERT INTO @subscriptions
VALUES
(1,'2018-01-01 00:00:00.000','2019-10-07 00:00:00.000')
,(2,'2018-01-11 00:00:00.000','2019-12-21 00:00:00.000')
,(3,'2019-04-21 00:00:00.000','2020-03-19 00:00:00.000')
,(4,'2019-12-09 00:00:00.000','2020-05-14 00:00:00.000')
,(5,'2020-04-26 00:00:00.000','2020-07-06 00:00:00.000')
,(6,'2020-05-02 00:00:00.000',NULL)
,(7,'2020-08-31 00:00:00.000','2020-10-29 00:00:00.000')
,(8,'2020-12-13 00:00:00.000','2021-01-13 00:00:00.000')
,(9,'2021-02-12 00:00:00.000','2021-04-19 00:00:00.000')
,(10,'2021-06-10 00:00:00.000',NULL)
;
然后我将订阅加入到日历表中。
--CTE to join subscription onto calendar and use ROW_NUMBER functions
WITH cte AS (
SELECT
s.id AS SubID
,d.ReportingDate
,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate) AS asc_rn --used to identify 1st
,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate DESC) AS desc_rn --used to identify last
,CASE WHEN s.endDate IS NULL THEN 1 ELSE 0 END AS ActiveSub
FROM @subscriptions s
LEFT JOIN @dates d ON
d.reportingdate BETWEEN s.startDate AND ISNULL(s.endDate,'9999-12-31')
)
我使用ROW_NUMBER来识别订阅的第一个和最后一个日期行,并检查订阅endDate是否为NULL(仍处于活动状态(。然后,我查询CTE,以计数按天分组的订阅,以及按天分组对新订阅和终止订阅求和。
--Query CTE using asc_rn, desc_rn, and ActiveSub to identify new subscribers and unsubscribers.
SELECT
ReportingDate
,COUNT(*) AS TotalSubscribers
,SUM(CASE WHEN asc_rn = 1 THEN 1 ELSE 0 END) AS NewSubscribers
,SUM(CASE WHEN desc_rn = 1 AND ActiveSub = 0 THEN 1 ELSE 0 END) AS UnSubscribers
FROM cte
GROUP BY ReportingDate
ORDER BY ReportingDate