How to average at most 5 time-related members per partition

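My goal is to select the average of exactly 5 records, but only for records that satisfy a left-join condition against another table. Suppose table one (left) has these records: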

RECNUM | ID  | DATE       | JOB
1      | cat | 2019.01.01 | meow
2      | dog | 2019.01.01 | bark

And table two (right) has these records:

RECNUM | ID  | Action_ID | DATE       | REWARD
1      | cat | 1         | 2019.01.02 | 20
2      | cat | 99        | 2018.12.30 | 1
3      | cat | 23        | 2019.12.28 | 20
4      | cat | 54        | 2018.01.01 | 20
5      | cat | 32        | 2018.01.02 | 20
6      | cat | 21        | 2018.01.03 | 20
7      | cat | 43        | 2018.12.28 | 1
8      | cat | 65        | 2018.12.29 | 1
9      | cat | 87        | 2018.09.12 | 1
10     | cat | 98        | 2018.10.11 | 1
11     | dog | 56        | 2018.09.01 | 99
12     | dog | 42        | 2019.09.02 | 99

The result should be:

ID  | AVG(Reward_from_latest_5_jobs)
cat | 1

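The criteria: for each job in the left table, find the 5 most recent right-table records with the same ID and a date before the job's date, and average their REWARD. A job with fewer than 5 such records is left out (which is why dog does not appear in the result).

What I tried: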

SELECT a."ID", COUNT(b."Action_ID"), AVG(b."REWARD")
FROM (
    SELECT "ID", "DATE"
    FROM :left_table
) a
LEFT JOIN (
    SELECT "ID", "Action_ID", "DATE", "REWARD"
    FROM :right_table
) b
    ON a."ID" = b."ID"
WHERE a."DATE" > b."DATE"
GROUP BY a."ID"
HAVING COUNT(b."Action_ID") >= 5;

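However, this counts every Action_ID that meets the criteria, not just the latest five. Can you tell me how to get the expected result? I can use intermediate tables, and it doesn't have to be done in a single SQL statement, but procedures are not allowed for this use case. Any input is greatly appreciated.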
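
To pin down the intended semantics, here is a minimal plain-Python reference sketch over the sample rows (no database involved). It assumes dates stay as 'YYYY.MM.DD' strings, which sort chronologically as text, and keys results by ID on the assumption of one job per ID, as in the sample. `average_latest` is the intended result; `average_all` mirrors what the query above computes. Both function names are hypothetical.

```python
# Sample rows from the question.
# LEFT:  (RECNUM, ID, DATE, JOB)
# RIGHT: (RECNUM, ID, Action_ID, DATE, REWARD)
# Dates are 'YYYY.MM.DD' strings, so string comparison is chronological.
LEFT = [
    (1, "cat", "2019.01.01", "meow"),
    (2, "dog", "2019.01.01", "bark"),
]
RIGHT = [
    (1, "cat", 1, "2019.01.02", 20),
    (2, "cat", 99, "2018.12.30", 1),
    (3, "cat", 23, "2019.12.28", 20),
    (4, "cat", 54, "2018.01.01", 20),
    (5, "cat", 32, "2018.01.02", 20),
    (6, "cat", 21, "2018.01.03", 20),
    (7, "cat", 43, "2018.12.28", 1),
    (8, "cat", 65, "2018.12.29", 1),
    (9, "cat", 87, "2018.09.12", 1),
    (10, "cat", 98, "2018.10.11", 1),
    (11, "dog", 56, "2018.09.01", 99),
    (12, "dog", 42, "2019.09.02", 99),
]


def earlier_rewards(job_id, job_date, right):
    """Rewards of right-table rows for job_id dated before job_date, newest first."""
    rows = [(date, reward) for _, rid, _, date, reward in right
            if rid == job_id and date < job_date]
    rows.sort(key=lambda row: row[0], reverse=True)
    return [reward for _, reward in rows]


def average_latest(left, right, n=5):
    """Intended result: average of the n most recent earlier rewards per job."""
    result = {}
    for _, job_id, job_date, _ in left:
        rewards = earlier_rewards(job_id, job_date, right)
        if len(rewards) >= n:  # jobs with fewer than n earlier rows are dropped
            result[job_id] = sum(rewards[:n]) / n
    return result


def average_all(left, right, n=5):
    """What the query in the question computes: average over ALL earlier rewards."""
    result = {}
    for _, job_id, job_date, _ in left:
        rewards = earlier_rewards(job_id, job_date, right)
        if len(rewards) >= n:
            result[job_id] = sum(rewards) / len(rewards)
    return result


print(average_latest(LEFT, RIGHT))  # {'cat': 1.0}
print(average_all(LEFT, RIGHT))     # {'cat': 8.125}
```

For cat, all 8 earlier rewards sum to 65, giving 8.125, while the 5 newest are all 1, giving the expected 1.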

Use a window function to pick the top 5:

select id, avg(reward)
from (select r.*,
             row_number() over (partition by l.id order by r.date desc) as seqnum
      from table1 l join
           table2 r
           on l.id = r.id and l.date > r.date
     ) r
where seqnum <= 5
group by id
having count(*) >= 5;

Then the having clause filters out ids that don't have five rows.
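
A minimal sketch that runs this answer's query on the question's sample data with Python's bundled `sqlite3` module. It assumes SQLite 3.25 or later (window-function support), lower-cased column names, and dates stored as 'YYYY.MM.DD' text; the question's own environment (with `:left_table` parameters) may behave differently.

```python
import sqlite3

# Sample data from the question; table names follow this answer.
LEFT = [(1, "cat", "2019.01.01", "meow"), (2, "dog", "2019.01.01", "bark")]
RIGHT = [
    (1, "cat", 1, "2019.01.02", 20), (2, "cat", 99, "2018.12.30", 1),
    (3, "cat", 23, "2019.12.28", 20), (4, "cat", 54, "2018.01.01", 20),
    (5, "cat", 32, "2018.01.02", 20), (6, "cat", 21, "2018.01.03", 20),
    (7, "cat", 43, "2018.12.28", 1), (8, "cat", 65, "2018.12.29", 1),
    (9, "cat", 87, "2018.09.12", 1), (10, "cat", 98, "2018.10.11", 1),
    (11, "dog", 56, "2018.09.01", 99), (12, "dog", 42, "2019.09.02", 99),
]

QUERY = """
select id, avg(reward)
from (select r.*,
             row_number() over (partition by l.id order by r.date desc) as seqnum
      from table1 l join
           table2 r
           on l.id = r.id and l.date > r.date
     ) r
where seqnum <= 5
group by id
having count(*) >= 5
"""


def run_query():
    """Load the sample rows into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table table1 (recnum integer, id text, date text, job text)")
        con.execute("create table table2 (recnum integer, id text, action_id integer,"
                    " date text, reward integer)")
        con.executemany("insert into table1 values (?, ?, ?, ?)", LEFT)
        con.executemany("insert into table2 values (?, ?, ?, ?, ?)", RIGHT)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(run_query())  # [('cat', 1.0)]
```

dog has only one earlier row, so `having count(*) >= 5` removes it.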

You can use window functions, followed by aggregation:

select
    id,
    avg(reward) avg_reward
from (
    select
        t1.id,
        t2.reward,
        count(*) over(partition by t1.id) cnt,
        -- row_number() rather than rank(): rank() can return more than 5 rows on date ties
        row_number() over(partition by t1.id order by t2.date desc) rn
    from lefttable t1
    -- keep only right-table rows dated before the job
    inner join righttable t2 on t1.id = t2.id and t2.date < t1.date
) t
where cnt >= 5 and rn <= 5
group by id

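The inner query joins the tables according to your requirement, uses a window count to get the total number of available records per id, and assigns each id's records a position by date, newest first.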

The outer query then keeps only ids with at least 5 records and averages the top 5 records for each id.
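
A minimal sketch that runs this approach on the sample data with Python's `sqlite3` (SQLite 3.25+ assumed for window functions). It uses `row_number()` so at most five rows per id survive even on date ties, and the join's date condition is a parameter only so both directions can be compared: `t2.date < t1.date` (rows before the job, as the question asks) gives the expected result, while `t2.date >= t1.date` matches only later rows and returns nothing for this data.

```python
import sqlite3

# Sample data from the question; table names follow this answer.
LEFT = [(1, "cat", "2019.01.01", "meow"), (2, "dog", "2019.01.01", "bark")]
RIGHT = [
    (1, "cat", 1, "2019.01.02", 20), (2, "cat", 99, "2018.12.30", 1),
    (3, "cat", 23, "2019.12.28", 20), (4, "cat", 54, "2018.01.01", 20),
    (5, "cat", 32, "2018.01.02", 20), (6, "cat", 21, "2018.01.03", 20),
    (7, "cat", 43, "2018.12.28", 1), (8, "cat", 65, "2018.12.29", 1),
    (9, "cat", 87, "2018.09.12", 1), (10, "cat", 98, "2018.10.11", 1),
    (11, "dog", 56, "2018.09.01", 99), (12, "dog", 42, "2019.09.02", 99),
]

# {date_condition} is filled from trusted constants below, never from user input.
QUERY_TEMPLATE = """
select id, avg(reward) avg_reward
from (
    select
        t1.id,
        t2.reward,
        count(*) over (partition by t1.id) cnt,
        row_number() over (partition by t1.id order by t2.date desc) rn
    from lefttable t1
    inner join righttable t2 on t1.id = t2.id and {date_condition}
) t
where cnt >= 5 and rn <= 5
group by id
"""


def run_query(date_condition):
    """Load the sample rows and run the query with the given join date condition."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table lefttable (recnum integer, id text, date text, job text)")
        con.execute("create table righttable (recnum integer, id text, action_id integer,"
                    " date text, reward integer)")
        con.executemany("insert into lefttable values (?, ?, ?, ?)", LEFT)
        con.executemany("insert into righttable values (?, ?, ?, ?, ?)", RIGHT)
        sql = QUERY_TEMPLATE.format(date_condition=date_condition)
        return con.execute(sql).fetchall()
    finally:
        con.close()


print(run_query("t2.date < t1.date"))   # [('cat', 1.0)]
print(run_query("t2.date >= t1.date"))  # []
```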

Here is how to do it with a join (if you need more joins, just repeat this approach for each one):

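SELECT ONE.ID,
       CASE WHEN MAX(J1.RN) < 5 THEN NULL ELSE AVG(J1.REWARD) END AS REWARD_AVG
       -- we could also use count
       -- CASE WHEN COUNT(*) = 5 THEN AVG(J1.REWARD) ELSE NULL END AS REWARD_AVG
FROM TABLE_ONE ONE
JOIN (
    -- A derived table cannot reference ONE.DATE, so the date filter and the
    -- numbering are done per job here by joining TABLE_ONE a second time.
    SELECT
        T1.ID,
        T1.DATE AS JOB_DATE,
        T2.REWARD,
        ROW_NUMBER() OVER (PARTITION BY T1.ID, T1.DATE ORDER BY T2.DATE DESC) AS RN
    FROM TABLE_ONE T1
    JOIN TABLE_TWO T2 ON T2.ID = T1.ID AND T2.DATE < T1.DATE
) AS J1 ON J1.ID = ONE.ID AND J1.JOB_DATE = ONE.DATE AND J1.RN <= 5 -- take first five only
GROUP BY ONE.ID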
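
A derived table in FROM cannot reference an outer alias such as `ONE.DATE` unless the engine supports `LATERAL` (or `CROSS APPLY`), so this sketch computes `RN` per job inside the derived table by joining `TABLE_ONE` a second time, then runs it on the sample data with Python's `sqlite3` (SQLite 3.25+ assumed). Unlike the other approaches, this query keeps ids with fewer than five rows and returns NULL for them, so dog comes back as `None`.

```python
import sqlite3

# Sample data from the question; table names follow this answer.
LEFT = [(1, "cat", "2019.01.01", "meow"), (2, "dog", "2019.01.01", "bark")]
RIGHT = [
    (1, "cat", 1, "2019.01.02", 20), (2, "cat", 99, "2018.12.30", 1),
    (3, "cat", 23, "2019.12.28", 20), (4, "cat", 54, "2018.01.01", 20),
    (5, "cat", 32, "2018.01.02", 20), (6, "cat", 21, "2018.01.03", 20),
    (7, "cat", 43, "2018.12.28", 1), (8, "cat", 65, "2018.12.29", 1),
    (9, "cat", 87, "2018.09.12", 1), (10, "cat", 98, "2018.10.11", 1),
    (11, "dog", 56, "2018.09.01", 99), (12, "dog", 42, "2019.09.02", 99),
]

QUERY = """
SELECT ONE.ID,
       CASE WHEN MAX(J1.RN) < 5 THEN NULL ELSE AVG(J1.REWARD) END AS REWARD_AVG
FROM TABLE_ONE ONE
JOIN (
    -- number each job's earlier right-table rows, newest first
    SELECT T1.ID,
           T1.DATE AS JOB_DATE,
           T2.REWARD,
           ROW_NUMBER() OVER (PARTITION BY T1.ID, T1.DATE ORDER BY T2.DATE DESC) AS RN
    FROM TABLE_ONE T1
    JOIN TABLE_TWO T2 ON T2.ID = T1.ID AND T2.DATE < T1.DATE
) AS J1 ON J1.ID = ONE.ID AND J1.JOB_DATE = ONE.DATE AND J1.RN <= 5
GROUP BY ONE.ID
ORDER BY ONE.ID
"""


def run_query():
    """Load the sample rows into an in-memory database and run QUERY."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE TABLE_ONE (RECNUM INTEGER, ID TEXT, DATE TEXT, JOB TEXT)")
        con.execute("CREATE TABLE TABLE_TWO (RECNUM INTEGER, ID TEXT, ACTION_ID INTEGER,"
                    " DATE TEXT, REWARD INTEGER)")
        con.executemany("INSERT INTO TABLE_ONE VALUES (?, ?, ?, ?)", LEFT)
        con.executemany("INSERT INTO TABLE_TWO VALUES (?, ?, ?, ?, ?)", RIGHT)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


print(run_query())  # [('cat', 1.0), ('dog', None)]
```

If ids with fewer than five rows should disappear instead of showing NULL, a `HAVING COUNT(*) = 5` clause would drop them.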
