在时间框窗口中为关系计算每一行的合计和百分比



好的,所以我有两个表:作业和作业运行。我在用Postgres。

我想看两节课。7天前到现在,以及14天前到7天前。

对于每个作业,我想要一个运行总数,以及每个周期成功和不成功运行的百分比。我编造了一个可怕的问题:

WITH results AS (
select
coalesce(count(case when succeeded = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_passes,
coalesce(count(case when succeeded = false AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_failures,
coalesce(count(case when timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_total_runs,
coalesce(count(case when infrastructure_failure = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_infrastructure_failures,

coalesce(count(case when succeeded = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_passes,
coalesce(count(case when succeeded = false AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_failures,        
coalesce(count(case when timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_total_runs,
coalesce(count(case when infrastructure_failure = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_infrastructure_failures
FROM
prow_job_runs JOIN prow_jobs ON prow_jobs.id = prow_job_runs.prow_job_id WHERE prow_jobs.name = 'promote-release-openshift-machine-os-content-e2e-aws-4.10'
)
SELECT *,
previous_passes * 100.0 / NULLIF(previous_total_runs, 0) AS previous_pass_percentage,
previous_failures * 100.0 / NULLIF(previous_total_runs, 0) AS previous_failure_percentage,
current_passes * 100.0 / NULLIF(current_total_runs, 0) AS current_pass_percentage,
current_failures * 100.0 / NULLIF(current_total_runs, 0) AS current_failure_percentage       
FROM results;

这让我得到了我想要的结果:

-[ RECORD 1 ]--------------------+-----------------------
previous_passes                  | 591
previous_failures                | 4
previous_total_runs              | 595
previous_infrastructure_failures | 1
current_passes                   | 67
current_failures                 | 0
current_total_runs               | 67
current_infrastructure_failures  | 0
previous_pass_percentage         | 99.3277310924369748
previous_failure_percentage      | 0.67226890756302521008
current_pass_percentage          | 100.0000000000000000
current_failure_percentage       | 0.00000000000000000000

以下是执行计划:

QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
Subquery Scan on results  (cost=661.12..661.19 rows=1 width=192)
->  Aggregate  (cost=661.12..661.13 rows=1 width=64)
->  Hash Join  (cost=8.30..650.89 rows=93 width=10)
Hash Cond: (prow_job_runs.prow_job_id = prow_jobs.id)
->  Seq Scan on prow_job_runs  (cost=0.00..603.60 rows=14460 width=18)
->  Hash  (cost=8.29..8.29 rows=1 width=8)
->  Index Scan using prow_jobs_name_key on prow_jobs  (cost=0.27..8.29 rows=1 width=8)
Index Cond: (name = 'promote-release-openshift-machine-os-content-e2e-aws-4.10'::text)
(8 rows)

但这只是针对一个作业,如果不在代码中执行for循环,如何获得每个作业的结果?

我还认为我的查询真的很慢,它是>一次作业运行8ms。

TY

您需要提供查询execution plan。但是你必须确保你有必要的索引,也许你在联接中限制你的行数,这会很有帮助:

WITH results AS (
select prow_jobs.name,
coalesce(count(case when succeeded = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_passes,
coalesce(count(case when succeeded = false AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_failures,
coalesce(count(case when timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_total_runs,
coalesce(count(case when infrastructure_failure = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_infrastructure_failures,
coalesce(count(case when succeeded = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_passes,
coalesce(count(case when succeeded = false AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_failures,        
coalesce(count(case when timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_total_runs,
coalesce(count(case when infrastructure_failure = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_infrastructure_failures
FROM prow_job_runs 
JOIN prow_jobs 
ON prow_jobs.id = prow_job_runs.prow_job_id                 
and timestamp BETWEEN NOW() and now() - INTERVAL '14 DAY' 
group by prow_jobs.name
)
SELECT *,
previous_passes * 100.0 / NULLIF(previous_total_runs, 0) AS previous_pass_percentage,
previous_failures * 100.0 / NULLIF(previous_total_runs, 0) AS previous_failure_percentage,
current_passes * 100.0 / NULLIF(current_total_runs, 0) AS current_pass_percentage,
current_failures * 100.0 / NULLIF(current_total_runs, 0) AS current_failure_percentage       
FROM results;

并且prow_job_runs表上似乎没有任何索引,请在该表上添加一个具有列(id、successed、infrastructure_failure、timestamp、prow_job_id(的索引

最新更新