我的情况是我有如下记录。
+-----------+---------+---+--------+
| a | b | c | d |
+-----------+---------+---+--------+
| 537196605 | HZA-LOC | 0 | 201701 |
| 537196605 | HZA-LOC | 0 | 201702 |
| 537196605 | HZA-LOC | 0 | 201703 |
| 537196605 | HZA-LOC | 0 | 201704 |
| 537196605 | HZA-LOC | 0 | 201705 |
| 537196605 | HZA-LOC | 2 | 201706 |
| 537196605 | HZA-LOC | 0 | 201707 |
| 537196605 | HZA-LOC | 4 | 201708 |
| 537196605 | HZA-LOC | 0 | 201709 |
| 537196605 | HZA-LOC | 0 | 201710 |
| 537196605 | HZA-LOC | 0 | 201711 |
| 537196605 | HZA-LOC | 0 | 201712 |
+-----------+---------+---+--------+
现在我想在这里计算零的平均值。
我的意思是我需要计算为计数(零(/总零对,即 10/3 = 3.33
所有在一起的零都被视为一个值。前 5 个值将计为 1,第 7 个值计为 1,9-12 计为 1,总计为 3。请注意,C 列的值是根据 A 和 B 的分组生成的。期望的结果应如下所示:-
阿 b H537196605 HZA-LOC 3.33
请帮助我为此提供一个通用解决方案,因为我有数百万条这样的记录。
这是使用窗口函数的一种方法
SELECT [Avg] = (Count(*) * 1.0) / Sum(CASE WHEN prev_c = c THEN 0 ELSE 1 END)
FROM (SELECT *,
prev_c = Lag(c)OVER(ORDER BY d)
FROM yourtable) a
WHERE c = 0
一个ROW_NUMBER()
技巧来创建group_id
值。
每个组的成员数和组数,就完成了。
http://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=e10b07b6fd9f8bc7549a3b98aaff3af6
WITH
enumerated_example AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY d) AS row_id,
ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY d) AS zero_row_id,
*
FROM
example
),
zero_groups AS
(
SELECT
row_id - zero_row_id AS group_id,
COUNT(*) AS group_size
FROM
enumerated_example
WHERE
c = 0
GROUP BY
row_id - zero_row_id
)
SELECT
CAST(SUM(group_size) AS DECIMAL(13, 8)) / COUNT(*)
FROM
zero_groups
一种方法是使用运行总和设置组。然后计算每个a,b组合的组数。此后,您只需要group by
.
select a,b,1.0*grp_count/count(distinct grp) as result
from (select a,b,c,d,
sum(case when c<>0 then 1 else 0 end) over(partition by a,b order by d) as grp,
count(case when c=0 then 1 end) over(partition by a,b) as grp_count
from tbl
) t
group by a,b,grp_count
一个不同(但相似(的解决方案:
CREATE TABLE #Sample (a int,
b varchar(7),
c tinyint,
d int)
INSERT INTO #Sample
VALUES
(537196605,'HZA-LOC',0,201701),
(537196605,'HZA-LOC',0,201702),
(537196605,'HZA-LOC',0,201703),
(537196605,'HZA-LOC',0,201704),
(537196605,'HZA-LOC',0,201705),
(537196605,'HZA-LOC',2,201706),
(537196605,'HZA-LOC',0,201707),
(537196605,'HZA-LOC',4,201708),
(537196605,'HZA-LOC',0,201709),
(537196605,'HZA-LOC',0,201710),
(537196605,'HZA-LOC',0,201711),
(537196605,'HZA-LOC',0,201712);
GO
WITH Groups AS(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY d) -
ROW_NUMBER() OVER (PARTITION BY a, b, CASE c WHEN 0 THEN 0 ELSE 1 END ORDER BY d) AS Grp
FROM #Sample)
SELECT a, b,
CONVERT(decimal(6,2),COUNT(*)) / COUNT(DISTINCT Grp) AS h
FROM Groups
WHERE c = 0
GROUP BY a,b;
GO
DROP TABLE #Sample;
GO