Cumulative distinct count

  • Keywords: sql presto


I am working on a query to get the cumulative distinct count of uids on a daily basis.

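Example: say 2 uids (100, 200) appear on 2016-11-01, and they appear again the next day together with a new uid 300, so (100, 200, 300) appear on 2016-11-02. At that point I want the cumulative count to be 3, not 5, because user IDs 100 and 200 already appeared on a previous day.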

Input table:

date          uid
2016-11-01    100
2016-11-01    200
2016-11-01    300
2016-11-01    400
2016-11-02    100
2016-11-02    200
2016-11-03    300
2016-11-03    400
2016-11-03    500
2016-11-03    600
2016-11-04    700

Expected query result:

date          daily_cumulative_count
2016-11-01    4
2016-11-02    4
2016-11-03    6
2016-11-04    7

So far I can get a cumulative count per day, but it counts uids again even when they already appeared on a previous day.

SELECT 
  date, 
  SUM(count) OVER (
    ORDER BY date ASC 
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  )
FROM (
  SELECT 
    date, 
    COUNT(DISTINCT uid) AS count
  FROM sample_table
  GROUP BY 1
) ORDER BY date DESC;

Any kind of help would be greatly appreciated.
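
To make the overcount concrete, here is a minimal sketch that runs the query above on the sample data. It assumes an in-memory SQLite database (3.25 or later, for window functions) as a stand-in for Presto, and the alias `daily_count` is introduced only for this illustration.

```python
import sqlite3

# Sample rows copied from the input table in the question.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

# Assumption: SQLite stands in for Presto here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
conn.executemany("INSERT INTO sample_table VALUES (?, ?)", ROWS)

# Running sum of per-day distinct counts: a uid seen on two days is counted twice.
overcounted = conn.execute("""
    SELECT date,
           SUM(daily_count) OVER (
               ORDER BY date ASC
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM (
        SELECT date, COUNT(DISTINCT uid) AS daily_count
        FROM sample_table
        GROUP BY date
    )
    ORDER BY date
""").fetchall()

for row in overcounted:
    print(row)
```

The per-day distinct counts are 4, 2, 4 and 1, so the running total grows to 4, 6, 10, 11 rather than the expected 4, 4, 6, 7.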

The simplest way:

SELECT *, COUNT(*) OVER (ORDER BY fst_date) AS cum_uids
FROM (
  SELECT uid, MIN(date) AS fst_date
  FROM t
  GROUP BY uid
) t

Or something like this:

WITH firstseen AS (
  SELECT uid, MIN(date) date
  FROM sample_table
  GROUP BY 1
)
SELECT DISTINCT date, COUNT(uid) OVER (ORDER BY date) daily_cumulative_count 
FROM firstseen
ORDER BY 1

SELECT DISTINCT is used because the (date, COUNT(uid)) pair would otherwise be repeated once for every uid first seen on that date.

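Explanation: for each date dt, it counts the uids first seen between the earliest date and dt, because we specify ORDER BY date and the frame then defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.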
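
Here is a small sketch of the first-seen approach on the question's sample data. It assumes an in-memory SQLite database (3.25 or later) as a stand-in for Presto. The names `first_date` and `new_per_day` are made up for this illustration. One thing the run shows: a date on which no uid appears for the first time (2016-11-02 here) has no row in the first-seen result, so the second query is one possible way to bring such dates back.

```python
import sqlite3

# Sample rows copied from the input table in the question.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

# Assumption: SQLite stands in for Presto here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
conn.executemany("INSERT INTO sample_table VALUES (?, ?)", ROWS)

# Shared CTE: the first date on which each uid appears.
FIRSTSEEN = """
    WITH firstseen AS (
        SELECT uid, MIN(date) AS first_date
        FROM sample_table
        GROUP BY uid
    )
"""

# Same shape as the answer's query: only dates with at least one new uid appear.
first_seen_only = conn.execute(FIRSTSEEN + """
    SELECT DISTINCT first_date,
           COUNT(uid) OVER (ORDER BY first_date) AS daily_cumulative_count
    FROM firstseen
    ORDER BY first_date
""").fetchall()

# Variant: count new uids per calendar date (0 if none), then take a running sum.
all_dates = conn.execute(FIRSTSEEN + """
    , new_per_day AS (
        SELECT d.date, COUNT(f.uid) AS new_uids
        FROM (SELECT DISTINCT date FROM sample_table) AS d
        LEFT JOIN firstseen AS f ON f.first_date = d.date
        GROUP BY d.date
    )
    SELECT date, SUM(new_uids) OVER (ORDER BY date) AS daily_cumulative_count
    FROM new_per_day
    ORDER BY date
""").fetchall()

print(first_seen_only)
print(all_dates)
```

The first result has three rows (4, 6, 7) and skips 2016-11-02. The second has one row per date (4, 4, 6, 7), which matches the expected output in the question.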

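You can use EXISTS to check whether the uid appeared on any earlier date. Then take a running sum and pick the maximum for each date, which gives the daily cumulative distinct count.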

select dt, max(col) as daily_cumulative_count
from (select t1.*,
      sum(case when not exists (select 1 from t t2 where t2.dt < t1.dt and t2.uid = t1.uid) then 1 else 0 end) over(order by dt) col
      from t t1) x
group by dt
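
As a rough check of the EXISTS idea, the sketch below flags each row as new or not, takes a running sum of the flags, and keeps the maximum per date. It assumes an in-memory SQLite database (3.25 or later) in place of Presto and a table `t(dt, uid)` loaded with the question's sample rows. The column names `is_new` and `running_new` are hypothetical, and the flag is computed in its own subquery purely for readability.

```python
import sqlite3

# Sample rows copied from the input table in the question.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

# Assumption: SQLite stands in for Presto here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dt TEXT, uid INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)

result = conn.execute("""
    SELECT dt, MAX(running_new) AS daily_cumulative_count
    FROM (
        -- Running sum of the "first appearance" flags, ordered by date.
        SELECT dt, SUM(is_new) OVER (ORDER BY dt) AS running_new
        FROM (
            -- is_new = 1 when the uid has no row on any earlier date.
            SELECT t1.dt,
                   CASE WHEN NOT EXISTS (
                       SELECT 1 FROM t AS t2
                       WHERE t2.dt < t1.dt AND t2.uid = t1.uid
                   ) THEN 1 ELSE 0 END AS is_new
            FROM t AS t1
        )
    )
    GROUP BY dt
    ORDER BY dt
""").fetchall()

for row in result:
    print(row)
```

Note that this sketch relies on each (dt, uid) pair occurring at most once, as in the sample data; a uid repeated within its first day would be flagged as new more than once.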

Please try the following...

SELECT date AS date,
       COUNT( uid ) AS daily_cumulative_count
FROM ( SELECT leftTable.date AS date,
              rightTable.uid AS uid
       FROM sample_table AS leftTable
       JOIN sample_table AS rightTable ON leftTable.date >= rightTable.date
       GROUP BY leftTable.date,
                rightTable.uid
     ) AS allUIDSForDateFinder
GROUP BY date;

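This statement first joins one instance of sample_table to another, so that each record in leftTable is associated with a copy of every record in rightTable that has an earlier or equal date value. This effectively attaches to each date the list of all uid values that occurred on or before that date.

The GROUP BY then reduces the resulting dataset to unique combinations of date and uid.

The refined dataset from the subquery allUIDSForDateFinder is then grouped by date in the main query, and COUNT() is applied to the uid values associated with each group.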
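
The two steps described above can be inspected separately. The sketch below assumes an in-memory SQLite database as a stand-in for Presto, loaded with the question's sample rows. It first lists the deduplicated (date, uid) pairs produced by the self-join and then counts them per date. The short aliases `l`, `r` and `all_uids` are used only for this illustration.

```python
import sqlite3

# Sample rows copied from the input table in the question.
ROWS = [
    ("2016-11-01", 100), ("2016-11-01", 200), ("2016-11-01", 300), ("2016-11-01", 400),
    ("2016-11-02", 100), ("2016-11-02", 200),
    ("2016-11-03", 300), ("2016-11-03", 400), ("2016-11-03", 500), ("2016-11-03", 600),
    ("2016-11-04", 700),
]

# Assumption: SQLite stands in for Presto here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample_table (date TEXT, uid INTEGER)")
conn.executemany("INSERT INTO sample_table VALUES (?, ?)", ROWS)

# Step 1: every date paired with each uid seen on or before it, deduplicated.
PAIRS_SQL = """
    SELECT l.date AS date, r.uid AS uid
    FROM sample_table AS l
    JOIN sample_table AS r ON l.date >= r.date
    GROUP BY l.date, r.uid
"""
pairs = conn.execute(PAIRS_SQL + " ORDER BY date, uid").fetchall()

# Step 2: count the uids attached to each date.
counts = conn.execute(
    "SELECT date, COUNT(uid) FROM (" + PAIRS_SQL + ") AS all_uids "
    "GROUP BY date ORDER BY date"
).fetchall()

# uids attached to 2016-11-02: the two seen that day plus the two seen only earlier.
uids_on_second_day = [uid for date, uid in pairs if date == "2016-11-02"]
print(uids_on_second_day)
print(counts)
```

Because the join compares every row with every row on the same or an earlier date, the intermediate result can grow quickly on large tables, so this form may cost noticeably more than the first-seen approach.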

If you have any questions or comments, please feel free to post a comment.

A version similar to @stepan-blr's, but with the final result you are looking for.

Version with WITH:

WITH t as (
    SELECT uid
           , min(dt) fst_date 
    FROM input_table 
    GROUP BY uid
)
SELECT DISTINCT fst_date
                , count(uid) over (order by fst_date ) daily_cumulative_count
FROM t

Version with SELECT from SELECT:

SELECT DISTINCT fst_date
                , count(uid) over (order by fst_date ) daily_cumulative_count
FROM (
    SELECT uid
           , min(dt) fst_date 
    FROM input_table 
    GROUP BY uid
     ) t

Here is a simple, quick solution that can be used with any SQL version:

CREATE TABLE MyTable
(
    -- Dates are stored as ISO 8601 strings (YYYY-MM-DD) so that
    -- string comparison matches chronological order.
    fecha   VARCHAR(10),
    uid     INT
);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-01', 100);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-01', 200);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-01', 300);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-01', 400);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-02', 100);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-02', 200);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-03', 300);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-03', 400);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-03', 500);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-03', 600);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-04', 700);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-05', 700);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-06', 700);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-07', 700);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-08', 700);
INSERT INTO MyTable (fecha, uid) VALUES ('2016-11-08', 900);

Solution:

SELECT 
    t1.fecha, 
    COUNT(DISTINCT t2.uid) as daily_cumulative_count
FROM 
    MyTable t1
INNER JOIN 
    MyTable t2
ON 
    t1.fecha >= t2.fecha
GROUP BY 
    t1.fecha
ORDER BY 
    t1.fecha

You can quickly test this here
