我目前正在尝试从列出用户冲浪行为的表中编写查询。 该表如下所示
**RecordID RespondentID DeviceID UTCTimestamp Domain**
1 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:21 goodreads.com 2 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:21 goodreads.com 3 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-11-06 10:21 gr-assets.com 4 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:21 gr-assets.com 5 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:23 itunes.apple.com 6 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:23 itunes.apple.com 7 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:51 samplicio.us 8 01faca75-1216-4a55-b43c-9d64ade852f7 4DF57C06-F0BD-4779-8983-37A8B02E5EDF 2017-06-11 10:51 samplicio.us
感谢大家的帮助,我设法得到了这个。
RecordIDRespondentID UTCTimestamp Source Domain to Domain RecordID
2 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-06-11 10:21 goodreads.com gr-assets.com 3 4 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-06-11 10:21 gr-assets.com itunes.apple.com 5 6 01FACA75-1216-4A55-b43c-9d64ade852f7 2017-11-06 10:23 itunes.apple.com samplicio.us 7
"收件人域"是下一行中域名不同的值。
问题 虽然这看起来是正确的,但我们实际上跳过了整个第一条记录。这是因为,给定数据集,第一行"域"连接到第二行"域",我们跳过了它。第 2 行与第 3 行合并,因此第一个生成的记录显示 RecordID 2。我想对此进行进一步微调。我的结果应该从记录 ID 1 开始并跳过记录 ID 2,因为域是相同的,因此结果应该显示
记录 ID 受访者 ID UTCTimestamp 源域到域
1 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-11-06 10:21 goodreads.com gr-assets.com 3 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-11-06 10:21 gr-assets.com itunes.apple.com 5 01faca75-1216-4a55-b43c-9d64ade852f7 2017-11-06 10:23 itunes.apple.com samplicio.us
我尝试跳过 RecordID 2,但是,遇到 SQL 错误"prev_nane不存在"。
SELECT t1."RecordID", t1."RespondentID", t1."UTCTimestamp", t1."Domain" as "Source Domain", t2."Domain" as "To Domain" , t2."RecordID", lag(t1."Domain",1) over (order by t1."RecordID") as prev_name
from public."Traffic - Mobile" as t1
join public."Traffic - Mobile" as t2 on t2."RespondentID" = t1."RespondentID" AND t2."DeviceID"=t1."DeviceID" AND t2."RecordID"=t1."RecordID"+1 And t1."Domain"<>T2."Domain" AND t2."UTCTimestamp">=t1."UTCTimestamp" AND t2."Sequence"-t1."Sequence"=1 and t1."RecordID"<13 AND t1."Domain"<>prev_name;
我做错了什么?
另外,我想要实现的最终结果如下记录 ID 受访者 ID UTCTimestamp 源域到域最终目标
1 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-06-11 10:21 goodreads.com gr-assets.com samplicio.us 3 01FACA75-1216-4A55-B43C-9D64ade852f7 2017-06-11 10:21 gr-assets.com itunes.apple.com samplicio.us 5 01FACA75-1216-4A55-b43c-9d64ade852f7 2017-11-06 10:23 itunes.apple.com samplicio.us samplicio.us
名为"最终目的地"的附加列。这是为了让我将 3 笔交易分组在一起,作为到达 samplicio.us 的路径。
提前谢谢。
试试这个:
with t1 as
(
select
recordid,
respondentid,
deviceid,
utctimestamp,
domain,
row_number() over (partition by
respondentid,
deviceid
order by
utctimestamp,
recordid) as user_seq,
row_number() over (partition by
respondentid,
deviceid,
domain
order by
utctimestamp,
recordid) as user_domain_seq
from traffic_mobile)
select *
from
(
select
recordid,
respondentid,
deviceid,
utctimestamp,
domain,
lead(domain) over ( partition by
respondentid,
deviceid order by
user_seq) as next_domain,
last_value(domain) over( partition by
respondentid,
deviceid order by user_seq
rows between unbounded preceding
and unbounded following )
as final_domain
from t1
where
user_domain_seq = 1 ) t2
where t2.next_domain is not null
sqlfiddle: sqlfiddle.com/#!17/dc248/3
附言。对于表中只有 1 个条目的用户traffic_mobile查询不会返回一行。如果需要,则需要改进查询以包含它们。