我有以下按player_id和match_date排序的数据。我想找出连续运行次数最多的记录组(从2014-04-03到2014-04-12,连续3次运行4次(
player_id match_date runs
1 2014-04-01 5
1 2014-04-02 55
1 2014-04-03 4
1 2014-04-10 4
1 2014-04-12 4
1 2014-04-14 3
1 2014-04-19 4
1 2014-04-20 44
2 2014-04-01 23
2 2014-04-02 23
2 2014-04-03 23
2 2014-04-10 23
2 2014-04-12 4
2 2014-04-14 3
2 2014-04-19 23
2 2014-04-20 1
我提出了以下SQL:
select *,row_number() over (partition by ranked.player_id,ranked.runs
order by ranked.match_date) as R from (
select player_id ,match_date,runs from players order by 1,2 desc )
ranked order by ranked.player_id, match_date asc
但这延续了前几次连续运行的排名(2014-04-19,玩家1的4次运行预计将获得排名1,但由于同一分区已经出现了3次,因此获得排名4(。同样,2014-04-19赛季第2名球员的23分预计将排名第一,但排名第5,因为该球员已经出现了4次23分。
当运行次数的值从前一行更改时,我如何将秩重置回1?
架构、数据、SQL和输出可在SQLFiddle上获得。
您可以使用窗口函数来完成此操作。
select player_id, runs, count(*) as numruns
from (select p.*,
(row_number() over (partition by player_id order by match_date) -
row_number() over (partition by player_id, runs order by match_date)
) as grp
from players p
) pg
group by grp, player_id, runs
order by numruns desc
limit 1;
关键的观察结果是,"按顺序跑步"具有以下特性:如果你按日期枚举(每个球员的(行,并按日期枚举每个球员的行和跑步次数,那么当跑步次数相同且按顺序时,差异是恒定的。这形成了一个组,您可以使用该组进行聚合来识别您想要的玩家。
这是SQL Fiddle。
select p1.player_id, p1.match_date, p1.runs, count(p2.match_date) from players p1
join players p2 on p1.player_id = p2.player_id
and p1.match_date >= p2.match_date
and p1.runs = p2.runs
and not exists (
select 1 from players p3
where p3.runs <> p2.runs
and p3.player_id = p2.player_id
and p3.match_date < p1.match_date
and p3.match_date > p2.match_date
)
group by p1.player_id, p1.match_date, p1.runs
order by p1.player_id, p1.match_date
http://sqlfiddle.com/#!15/78a77/1