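I am writing a query to get the cumulative distinct count of uids for each day.
Example: say 2 uids (100, 200) appear on 2016-11-01. They appear again the next day together with a new uid 300, so (100, 200, 300) appear on 2016-11-02. At this point I want the cumulative count to be 3, not 5 (uids 100 and 200 already appeared on the previous day).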
Input table:
date uid
2016-11-01 100
2016-11-01 200
2016-11-01 300
2016-11-01 400
2016-11-02 100
2016-11-02 200
2016-11-03 300
2016-11-03 400
2016-11-03 500
2016-11-03 600
2016-11-04 700
Expected query result:
date daily_cumulative_count
2016-11-01 4
2016-11-02 4
2016-11-03 6
2016-11-04 7
So far I can get a cumulative sum of the daily distinct counts, but it counts again uids that already appeared on earlier days.
SELECT
date,
SUM(count) OVER (
ORDER BY date ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM (
SELECT
date,
COUNT(DISTINCT uid) AS count
FROM sample_table
GROUP by 1
)ORDER BY date DESC;
Any help would be greatly appreciated.
The simplest way:
SELECT *, count(*) over (order by fst_date ) cum_uids
FROM (
SELECT uid, min(date) fst_date FROM t GROUP BY uid
) t
Or something like this:
WITH firstseen AS (
SELECT uid, MIN(date) date
FROM sample_table
GROUP BY 1
)
SELECT DISTINCT date, COUNT(uid) OVER (ORDER BY date) daily_cumulative_count
FROM firstseen
ORDER BY 1
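SELECT DISTINCT is used because each (date, COUNT(uid)) pair would otherwise be repeated once per uid first seen on that date.
Explanation: for each date dt, it counts the uids first seen from the earliest date up to dt, because we specify ORDER BY date and the window frame then defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.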
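To check the first-seen approach against the question's sample data, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The harness (the `run` helper, the `ROWS` list, and the second query `ALL_DAYS_SQL`) is an assumption added for illustration and is not part of the answer; window functions need SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

# The answer's query: count uids by the date they were first seen.
FIRST_SEEN_SQL = """
WITH firstseen AS (
    SELECT uid, MIN(date) AS date
    FROM sample_table
    GROUP BY uid
)
SELECT DISTINCT date, COUNT(uid) OVER (ORDER BY date) AS daily_cumulative_count
FROM firstseen
ORDER BY 1
"""

# Hypothetical variant: one row per day present in the table,
# including days on which no new uid appears.
ALL_DAYS_SQL = """
WITH firstseen AS (
    SELECT uid, MIN(date) AS date
    FROM sample_table
    GROUP BY uid
),
days AS (
    SELECT DISTINCT date FROM sample_table
)
SELECT d.date,
       (SELECT COUNT(*) FROM firstseen f WHERE f.date <= d.date) AS daily_cumulative_count
FROM days d
ORDER BY d.date
"""

def run(sql):
    """Load the sample rows into a fresh in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
    conn.executemany("INSERT INTO sample_table VALUES (?, ?)", ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

first_seen_result = run(FIRST_SEEN_SQL)
all_days_result = run(ALL_DAYS_SQL)
print(first_seen_result)  # [('2016-11-01', 4), ('2016-11-03', 6), ('2016-11-04', 7)]
print(all_days_result)    # [('2016-11-01', 4), ('2016-11-02', 4), ('2016-11-03', 6), ('2016-11-04', 7)]
```

Note that the first query returns no row for 2016-11-02, because no uid is first seen on that day. If every day must appear, as in the expected result, something like the second query is needed.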
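Use EXISTS to check whether the uid already appears on any earlier date. Then take a running sum of the rows that have no earlier occurrence and pick the maximum per date; this gives the daily cumulative distinct count.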
select dt, max(col) as daily_cumulative_count
from (select t1.*,
sum(case when not exists (select 1 from t where t.dt < t1.dt and t.uid = t1.uid) then 1 else 0 end) over(order by dt) col
from t t1) x
group by dt
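The EXISTS approach can be tried on the question's sample data with the sketch below, again using in-memory SQLite from Python. It is a restructured variant rather than the answer's exact text: the "first occurrence" flag is computed in its own subquery (named `flagged` here, an assumed name) before the running sum is taken, and the table `t` with columns `dt` and `uid` follows the answer's naming.

```python
import sqlite3

# Sample rows from the question, loaded into a table named t(dt, uid).
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

EXISTS_SQL = """
SELECT dt, MAX(col) AS daily_cumulative_count
FROM (
    -- running sum of first occurrences, ordered by day
    SELECT dt, SUM(is_new) OVER (ORDER BY dt) AS col
    FROM (
        -- is_new = 1 when this uid has no row on an earlier day
        SELECT t1.dt,
               CASE WHEN NOT EXISTS (
                   SELECT 1 FROM t WHERE t.dt < t1.dt AND t.uid = t1.uid
               ) THEN 1 ELSE 0 END AS is_new
        FROM t t1
    ) flagged
) x
GROUP BY dt
ORDER BY dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT, uid INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
exists_result = conn.execute(EXISTS_SQL).fetchall()
conn.close()

print(exists_result)
# [('2016-11-01', 4), ('2016-11-02', 4), ('2016-11-03', 6), ('2016-11-04', 7)]
```

Unlike the first-seen queries, this keeps a row for 2016-11-02. One caveat: if the same uid occurs more than once on its first day, each of those rows is flagged as new and the uid is counted more than once, so the input should be deduplicated per (dt, uid) first.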
The following should work:
SELECT date AS date,
COUNT( uid ) AS daily_cumulative_count
FROM ( SELECT leftTable.date AS date,
rightTable.uid AS uid
FROM sample_table AS leftTable
JOIN sample_table AS rightTable ON leftTable.date >= rightTable.date
GROUP BY leftTable.date,
rightTable.uid
) AS allUIDSForDateFinder
GROUP BY date;
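This statement first joins one instance of sample_table to another, so that each record in leftTable is paired with a copy of every record in rightTable that has an earlier or equal date value. This effectively attaches to each date value the list of all uid values that occurred on or before that date.
The GROUP BY in the subquery then reduces the resulting dataset to unique combinations of date and uid.
Finally, the reduced dataset from the subquery allUIDSForDateFinder is grouped by date in the main query, and COUNT() is applied to the uid values associated with each group.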
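The three steps described above can be followed on the question's sample data with this minimal sketch, which runs the self-join in an in-memory SQLite database from Python. The two extra counting queries and the variable names are assumptions added only to make the intermediate sizes visible.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

SELF_JOIN_SQL = """
SELECT date AS date,
       COUNT(uid) AS daily_cumulative_count
FROM (
    SELECT leftTable.date AS date,
           rightTable.uid AS uid
    FROM sample_table AS leftTable
    JOIN sample_table AS rightTable ON leftTable.date >= rightTable.date
    GROUP BY leftTable.date, rightTable.uid
) AS allUIDSForDateFinder
GROUP BY date
ORDER BY date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
conn.executemany("INSERT INTO sample_table VALUES (?, ?)", ROWS)

# Step 1: rows produced by the join before any grouping.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM sample_table l JOIN sample_table r ON l.date >= r.date"
).fetchone()[0]

# Step 2: unique (date, uid) combinations left after the inner GROUP BY.
distinct_pairs = conn.execute(
    "SELECT COUNT(*) FROM ("
    "  SELECT l.date, r.uid FROM sample_table l"
    "  JOIN sample_table r ON l.date >= r.date"
    "  GROUP BY l.date, r.uid)"
).fetchone()[0]

# Step 3: the full query, one count per date.
self_join_result = conn.execute(SELF_JOIN_SQL).fetchall()
conn.close()

print(joined_rows, distinct_pairs)  # 79 21
print(self_join_result)
# [('2016-11-01', 4), ('2016-11-02', 4), ('2016-11-03', 6), ('2016-11-04', 7)]
```

With only 11 input rows the join already produces 79 rows, so the intermediate result grows quickly on large tables.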
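If you have any questions or comments, please feel free to post them.
A version similar to @stepan-blr's, but with the final result you are looking for.
Version using WITH: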
WITH t as (
SELECT uid
, min(dt) fst_date
FROM input_table
GROUP BY uid
)
SELECT DISTINCT fst_date
, count(uid) over (order by fst_date ) daily_cumulative_count
FROM t
Version using a select from a select (subquery in FROM):
SELECT DISTINCT fst_date
, count(uid) over (order by fst_date ) daily_cumulative_count
FROM (
SELECT uid
, min(dt) fst_date
FROM input_table
GROUP BY uid
) t
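Here is a simple solution that needs only a self-join and COUNT(DISTINCT), with no window functions, so it should work on most SQL engines: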
CREATE TABLE MyTable
(
fecha VARCHAR(512),
uid INT
);
INSERT INTO MyTable (fecha, uid ) VALUES ('1/11/2016', '100');
INSERT INTO MyTable (fecha, uid ) VALUES ('1/11/2016', '200');
INSERT INTO MyTable (fecha, uid ) VALUES ('1/11/2016', '300');
INSERT INTO MyTable (fecha, uid ) VALUES ('1/11/2016', '400');
INSERT INTO MyTable (fecha, uid ) VALUES ('2/11/2016', '100');
INSERT INTO MyTable (fecha, uid ) VALUES ('2/11/2016', '200');
INSERT INTO MyTable (fecha, uid ) VALUES ('3/11/2016', '300');
INSERT INTO MyTable (fecha, uid ) VALUES ('3/11/2016', '400');
INSERT INTO MyTable (fecha, uid ) VALUES ('3/11/2016', '500');
INSERT INTO MyTable (fecha, uid ) VALUES ('3/11/2016', '600');
INSERT INTO MyTable (fecha, uid ) VALUES ('4/11/2016', '700');
INSERT INTO MyTable (fecha, uid ) VALUES ('5/11/2016', '700');
INSERT INTO MyTable (fecha, uid ) VALUES ('6/11/2016', '700');
INSERT INTO MyTable (fecha, uid ) VALUES ('7/11/2016', '700');
INSERT INTO MyTable (fecha, uid ) VALUES ('8/11/2016', '700');
INSERT INTO MyTable (fecha, uid ) VALUES ('8/11/2016', '900');
Solution:
SELECT
t1.fecha,
COUNT(DISTINCT t2.uid) as daily_cumulative_count
FROM
MyTable t1
INNER JOIN
MyTable t2
ON
t1.fecha >= t2.fecha
GROUP BY
t1.fecha
ORDER BY
t1.fecha
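One thing to watch in this setup: fecha is declared as VARCHAR and holds day/month/year strings, so t1.fecha >= t2.fecha compares text, not dates. The sample orders correctly only because every day is a single digit. The sketch below, an illustrative assumption using in-memory SQLite from Python with two made-up rows, shows the same query on day/month/year strings and on ISO-formatted strings.

```python
import sqlite3

# The answer's query, unchanged apart from layout.
SOLUTION_SQL = """
SELECT t1.fecha,
       COUNT(DISTINCT t2.uid) AS daily_cumulative_count
FROM MyTable t1
INNER JOIN MyTable t2 ON t1.fecha >= t2.fecha
GROUP BY t1.fecha
ORDER BY t1.fecha
"""

def cumulative_counts(rows):
    """Load rows into a fresh MyTable and run the answer's query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MyTable (fecha VARCHAR(512), uid INT)")
    conn.executemany("INSERT INTO MyTable (fecha, uid) VALUES (?, ?)", rows)
    result = conn.execute(SOLUTION_SQL).fetchall()
    conn.close()
    return result

# Hypothetical rows: uid 1 on 2 November, uid 2 on 10 November.
dmy = cumulative_counts([("2/11/2016", 1), ("10/11/2016", 2)])
iso = cumulative_counts([("2016-11-02", 1), ("2016-11-10", 2)])

print(dmy)  # [('10/11/2016', 1), ('2/11/2016', 2)]  text order puts the 10th before the 2nd
print(iso)  # [('2016-11-02', 1), ('2016-11-10', 2)]
```

Storing the column as a real DATE type, or as ISO yyyy-mm-dd text, keeps the comparison in chronological order.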