Moving distinct count in BigQuery (SQL syntax)

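I have an odd one for all the SQL experts out there. I need to get the distinct count of items over a 14-day moving window. I tried dense_rank, but it doesn't let me specify (or I don't know how to specify) the 14-day moving window.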

For simplicity, my dataset has 3 columns:

Store (string)
Item code (string)
Date

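A quick example of my end goal:

Day 1 I scan items 1,2,3,4
Day 2 I scan items 2,3,4,5
Day 3 I scan items 1,6

So on day 1 my uniques are 4, on day 2 my uniques are 5, and on day 3 my uniques are 6 (1,2,3,4,5,6).

Once I reach day 15, I would ignore the values found on day 1 and only need those from days 2-15.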

Any help would be greatly appreciated.

Consider the approach below:

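select store, date,
  -- count the distinct items collected for this row's 14-day window
  ( select count(distinct item)
    from t.items item
  ) distinct_items_count
from (
  -- rows sharing a store and date carry equivalent window arrays, so keep one
  select store, date, any_value(items) items
  from (
    select store, date,
      -- 13 preceding days plus the current day = 14-day window
      array_agg(item_code) over(partition by store order by unix_date(date) range between 13 preceding and current row) items
    from your_table
  )
  group by store, date
) t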
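
To sanity-check the query's output on a small sample, the same window logic can be sketched outside BigQuery. The following is a minimal Python sketch, not part of the original answer; the function name `moving_distinct_counts`, the store name "A", and the start date are hypothetical and chosen only for illustration. It recomputes the distinct count by brute force for each store and date, using the current day plus the 13 preceding days.

```python
from datetime import date, timedelta


def moving_distinct_counts(rows, window_days=14):
    """Brute-force distinct item count per (store, date) over a moving window.

    rows: iterable of (store, item_code, date) tuples.
    The window covers the current day and the (window_days - 1) days before it,
    mirroring `range between 13 preceding and current row` for window_days=14.
    """
    rows = list(rows)
    result = {}
    for store, day in {(s, d) for s, _, d in rows}:
        start = day - timedelta(days=window_days - 1)
        items = {i for s, i, d in rows if s == store and start <= d <= day}
        result[(store, day)] = len(items)
    return result


# Sample data from the question; store "A" and the start date are assumptions.
d0 = date(2024, 1, 1)
scans = {0: ["1", "2", "3", "4"], 1: ["2", "3", "4", "5"], 2: ["1", "6"]}
rows = [
    ("A", item, d0 + timedelta(days=offset))
    for offset, items in scans.items()
    for item in items
]

counts = moving_distinct_counts(rows)
for key in sorted(counts):
    print(key, counts[key])
```

For the three days in the question this yields 4, 5 and 6, matching the expected uniques. It scans all rows for every date, so it is only suitable for checking small samples.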

Another option to consider is to use the HyperLogLog++ functions, which consume fewer resources and run faster:

select store, date,
  -- merge the daily sketches in the window into one approximate distinct count
  ( select hll_count.merge(sketch)
    from t.sketches_14days sketch
  ) distinct_items_count
from (
  select store, date,
    array_agg(daily_sketch) over(partition by store order by unix_date(date) range between 13 preceding and current row) sketches_14days
  from (
    -- one HLL++ sketch per store per day
    select store, date, hll_count.init(item_code) daily_sketch
    from your_table
    group by store, date
  )
) t
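
The two-step shape of this query (summarize each day once, then merge the summaries that fall in the window) can be illustrated with a small Python sketch. This is an analogy only, not the HLL++ algorithm: exact Python sets stand in for the sketches, so the result is exact rather than approximate. The function names, the store name "A", and the start date are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def daily_summaries(rows):
    """Step 1: one summary per (store, date), the analogue of hll_count.init
    with `group by store, date`. A set is used here instead of a sketch."""
    daily = defaultdict(set)
    for store, item_code, day in rows:
        daily[(store, day)].add(item_code)
    return daily


def merged_window_counts(daily, window_days=14):
    """Step 2: merge the daily summaries inside each window, the analogue of
    hll_count.merge over the windowed array of sketches."""
    counts = {}
    for store, day in daily:
        merged = set()
        for offset in range(window_days):
            # .get avoids creating empty entries for days without scans
            merged |= daily.get((store, day - timedelta(days=offset)), set())
        counts[(store, day)] = len(merged)
    return counts


# Sample data from the question; store "A" and the start date are assumptions.
start = date(2024, 1, 1)
sample = [
    ("A", item, start + timedelta(days=offset))
    for offset, items in {0: "1234", 1: "2345", 2: "16"}.items()
    for item in items
]

window_counts = merged_window_counts(daily_summaries(sample))
for key in sorted(window_counts):
    print(key, window_counts[key])
```

Because merging daily summaries gives the same answer as counting over the raw rows, each day only needs to be summarized once. With real HLL++ sketches the summaries have a bounded size, at the cost of an approximate count.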

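Note:

HLL++ functions are approximate aggregate functions. Approximate aggregation typically requires less memory than exact aggregation functions such as COUNT(DISTINCT), but it also introduces statistical uncertainty. This makes HLL++ functions appropriate for large data streams for which linear memory usage is impractical, as well as for data that is already approximate.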
