>想象一个包含以下行的Web服务器日志:
<timestamp> <ip> <user-agent> <product page>
我想要一份报告
- 使用以下条件计算 24 小时内每个用户会话对产品页面的请求数:
- 唯一用户定义为多个列的组合 ( ( 24小时
- 窗口从第一次请求产品页面的时间戳开始(24小时窗口可以在任何时间开始(
- 如果请求时间戳之间相隔 24 小时,则将被视为新用户会话
对于以下日志:
2019-1-1 01:00 1.2.3.4 Netscape product 5
2019-1-1 01:01 1.2.3.4 Netscape product 5
2019-1-1 01:00 1.2.3.5 Chrome product 5
2019-1-1 01:01 1.2.3.5 Chrome product 5
2019-1-1 01:59 1.2.3.4 Netscape product 5
2019-1-1 02:00 1.2.3.4 Netscape product 4
2019-1-1 02:01 1.2.3.4 Netscape product 4
2019-1-1 02:02 1.2.3.4 Netscape product 4
2019-1-1 07:43 1.2.3.5 Chrome product 5
2019-1-2 2:01 1.2.3.4 Netscape product 5
将产生:
1.2.3.4/Netscape, product 4, 1
1.2.3.4/Netscape, product 5, 2
1.2.3.5/Chrome: product 5, 1
也许第二个查询会输出:
1.2.3.4/Netscape, 6
1.2.3.4/Netscape, 1
1.2.3.5/Chrome, 3
(每个用户 24 小时窗口的请求数,因此 1.2.3.4/Netscape 列出了两次(
将提供上述两个结果集的示例查询是什么?
奖励/可选:如果 24 小时内的请求间隔超过 30m,将被视为另一个新会话
这里有一些东西可以给你一个方向(虽然不一定太高性能/效率,这取决于输入数据集的大小(。
datatable(timestamp:datetime, ip:string, user_agent:string, product_page:string)
[
datetime(2019-01-01 01:00), '1.2.3.4', 'Netscape', 'product 5',
datetime(2019-01-01 01:01), '1.2.3.4', 'Netscape', 'product 5',
datetime(2019-01-01 01:00), '1.2.3.5', 'Chrome', 'product 5',
datetime(2019-01-01 01:01), '1.2.3.5', 'Chrome', 'product 5',
datetime(2019-01-01 01:59), '1.2.3.4', 'Netscape', 'product 5',
datetime(2019-01-01 02:00), '1.2.3.4', 'Netscape', 'product 4',
datetime(2019-01-01 02:01), '1.2.3.4', 'Netscape', 'product 4',
datetime(2019-01-01 02:02), '1.2.3.4', 'Netscape', 'product 4',
datetime(2019-01-01 07:43), '1.2.3.5', 'Chrome', 'product 5',
datetime(2019-01-02 02:01), '1.2.3.4', 'Netscape', 'product 5',
]
| extend user = strcat(ip, "/", user_agent)
| order by user asc, timestamp asc
| extend session_start = row_window_session(timestamp, 24h, 24h, user_agent != prev(user_agent) or product_page != prev(product_page) or ip != prev(ip))
| summarize session_count = dcount(session_start) by user, product_page
-->
| user | product_page | session_count |
|------------------|--------------|---------------|
| 1.2.3.4/Netscape | product 5 | 2 |
| 1.2.3.4/Netscape | product 4 | 1 |
| 1.2.3.5/Chrome | product 5 | 1 |
对于第二个查询,以下内容可能有效:
datatable(timestamp:datetime, ip:string, user_agent:string, product_page:string)
[
datetime(2019-01-01 01:00), '1.2.3.4', 'Netscape', 'product 5',
datetime(2019-01-01 01:01), '1.2.3.4', 'Netscape', 'product 5',
datetime(2019-01-01 01:00), '1.2.3.5', 'Chrome', 'product 5',
datetime(2019-01-01 01:01), '1.2.3.5', 'Chrome', 'product 5',
datetime(2019-01-01 01:59), '1.2.3.4', 'Netscape', 'product 5',
datetime(2019-01-01 02:00), '1.2.3.4', 'Netscape', 'product 4',
datetime(2019-01-01 02:01), '1.2.3.4', 'Netscape', 'product 4',
datetime(2019-01-01 02:02), '1.2.3.4', 'Netscape', 'product 4',
datetime(2019-01-01 07:43), '1.2.3.5', 'Chrome', 'product 5',
datetime(2019-01-02 02:01), '1.2.3.4', 'Netscape', 'product 5',
]
| extend user = strcat(ip, "/", user_agent)
| summarize count() by user, startofday(timestamp)
| project-away timestamp
-->
| user | count_ |
|------------------|--------|
| 1.2.3.4/Netscape | 6 |
| 1.2.3.5/Chrome | 3 |
| 1.2.3.4/Netscape | 1 |