如何使用滞后功能跳过一行?PostgreSQL 9.3.



我目前正在尝试从列出用户冲浪行为的表中编写查询。 该表如下所示

**RecordID  RespondentID    DeviceID    UTCTimestamp    Domain**
1 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:21 goodreads.com 2 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:21 goodreads.com 3 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-11-06 10:21 gr-assets.com 4 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:21 gr-assets.com 5 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:23 itunes.apple.com 6 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:23 itunes.apple.com 7 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:51 samplicio.us 8 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:51 samplicio.us

感谢大家的帮助,我设法得到了这个。

RecordIDRespondentID UTCTimestamp Source Domain to Domain RecordID

2 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-06-11 10:21 goodreads.com gr-assets.com 3 4 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-06-11 10:21 gr-assets.com itunes.apple.com 5 6 01FACA75-1216-4A55-b43c-9d64ade852f7 2017-11-06 10:23 itunes.apple.com samplicio.us 7

"收件人域"是下一行中域名不同的值。

问题 虽然这看起来是正确的,但我们实际上跳过了整个第一条记录。这是因为,给定数据集,第一行"域"连接到第二行"域",我们跳过了它。第 2 行与第 3 行合并,因此第一个生成的记录显示 RecordID 2。我想对此进行进一步微调。我的结果应该从记录 ID 1 开始并跳过记录 ID 2,因为域是相同的,因此结果应该显示

记录 ID 受访者 ID UTCTimestamp 源域到域

1 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-11-06 10:21 goodreads.com gr-assets.com   3 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-11-06 10:21 gr-assets.com itunes.apple.com    5 01faca75-1216-4a55-b43c-9d64ade852f7 2017-11-06 10:23 itunes.apple.com samplicio.us

我尝试跳过 RecordID 2,但是,遇到 SQL 错误"prev_nane不存在"。

SELECT t1."RecordID", t1."RespondentID", t1."UTCTimestamp", t1."Domain" as "Source Domain", t2."Domain" as "To Domain" , t2."RecordID", lag(t1."Domain",1) over (order by t1."RecordID") as prev_name
from public."Traffic - Mobile" as t1
join public."Traffic - Mobile" as t2 on t2."RespondentID" = t1."RespondentID" AND t2."DeviceID"=t1."DeviceID" AND t2."RecordID"=t1."RecordID"+1  And t1."Domain"<>T2."Domain" AND t2."UTCTimestamp">=t1."UTCTimestamp" AND t2."Sequence"-t1."Sequence"=1 and t1."RecordID"<13 AND t1."Domain"<>prev_name;

我做错了什么?

另外,我想要实现的最终结果如下记录 ID 受访者 ID UTCTimestamp 源域到域最终目标

1 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-06-11 10:21 goodreads.com gr-assets.com samplicio.us 3 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-06-11 10:21 gr-assets.com itunes.apple.com samplicio.us 5 01FACA75-1216-4A55-b43c-9d64ade852f7 2017-11-06 10:23 itunes.apple.com samplicio.us samplicio.us

名为"最终目的地"的附加列。这是为了让我将 3 笔交易分组在一起,作为到达 samplicio.us 的路径。

提前谢谢。

试试这个:

with t1 as
(
select 
recordid,
respondentid,
deviceid,
utctimestamp,
domain,
row_number() over (partition by 
respondentid,
deviceid 
order by 
utctimestamp,
recordid) as user_seq,
row_number() over (partition by 
respondentid,
deviceid,
domain
order by 
utctimestamp,
recordid) as user_domain_seq
from traffic_mobile)
select *
from
(
select 
recordid,
respondentid,
deviceid,
utctimestamp,
domain,
lead(domain) over ( partition by 
respondentid,
deviceid order by 
user_seq) as next_domain,
last_value(domain) over( partition by
respondentid,
deviceid order by user_seq 
rows between unbounded preceding
and unbounded following ) 
as final_domain
from t1
where 
user_domain_seq = 1 ) t2
where t2.next_domain is not null

sqlfiddle: sqlfiddle.com/#!17/dc248/3

附言。对于表中只有 1 个条目的用户traffic_mobile查询不会返回一行。如果需要,则需要改进查询以包含它们。

最新更新