我在Athena中有这样的访问日志。
time hostname user_agent http_status col_0 col_1 col_2 col_3
2018-06-01T10:00:00 host01 Mozilla/5.0 200 2018 6 1 10
2019-06-01T10:00:00 host01 Mozilla/5.0 200 2019 6 1 10
2020-06-01T10:00:00 host01 Mozilla/5.0 404 2020 6 1 10
2021-06-08T10:00:00 host01 Mozilla/5.0 404 2021 6 8 10
2021-06-09T10:00:00 host01 Mozilla/5.0 200 2021 6 9 10
2021-06-10T10:00:00 host01 Mozilla/5.0 404 2021 6 10 10
2021-06-10T11:00:00 host01 Mozilla/5.0 503 2021 6 10 11
2021-06-11T10:00:00 host01 Mozilla/5.0 200 2021 6 11 10
2021-06-12T10:00:00 host01 Mozilla/5.0 503 2021 6 12 10
2021-06-13T10:00:00 host01 Mozilla/5.0 200 2021 6 13 10
2021-06-14T10:00:00 host01 Mozilla/5.0 404 2021 6 14 10
2021-06-15T10:00:00 host01 Mozilla/5.0 200 2021 6 15 10
2021-06-15T11:00:00 host01 Mozilla/5.0 503 2021 6 15 11
我想按日期统计http状态。届时,有必要显示一周后的数据。
date status_40x status_50x
2021-06-09 0 0
2021-06-10 1 1
2021-06-11 0 0
2021-06-12 0 1
2021-06-13 0 0
2021-06-14 1 0
2021-06-15 0 1
如何进行sql查询?
类型是。。。
time (string)
hostname (string)
user_agent (string)
http_status (string)
col_0 (string)(Partitioned)
col_1 (string)(Partitioned)
col_2 (string)(Partitioned)
col_3 (string)(Partitioned)
查询
select d.dte, coalesce(status_404, 0), coalesce(status_503, 0)
from (select date '2021-06-09' + n.n * interval '1' day as dte
from unnest(sequence(0, 6, 1)) as n(n)
) d left join
(select DATE_TRUNC('DAY', cast(from_iso8601_timestamp(time) as date)) as dte,
count_if(http_status = '404') AS status_404,
count_if(http_status = '503') AS status_503
from access_logs al
group by dte
) al
on al.dte = d.dte
错误
Error running query: SYNTAX_ERROR: line 9:16: Column 'dte' cannot be resolved
您可以按日期分组(例如,将时间戳转换为日期(并使用count_if
:
WITH dataset AS
(SELECT *
FROM (VALUES (TIMESTAMP '2018-06-01 10:00:00', 401),
(TIMESTAMP '2018-06-01 11:00:00', 400),
(TIMESTAMP '2018-06-01 10:00:00', 500),
(TIMESTAMP '2018-06-02 11:00:00', 400)) AS t (TIME, http_status))
SELECT cast(TIME AS date),
count_if(http_status BETWEEN 400 AND 499) AS "40x",
count_if(http_status BETWEEN 500 AND 599) AS "50x"
FROM dataset
GROUP BY cast(TIME AS date)
以下结果:
_col0 | 40x | 50x |
---|---|---|
2018-06-01 | 2 | 1|
2018-06-02 | 1 | 0 |
如果要填写日期,则需要生成日期。我想你可以用sequence()
来做这个:
select d.dte, coalesce(status_40x, 0), coalesce(status_50x, 0)
from (select date '2021-06-09' + n.n * interval '1 day') as dte
from unnest(sequence(0, 6, 1)) n(n)
) d left join
(select cast(time as date) as dte,
count_if(http_status BETWEEN 400 AND 499) AS status_40x,
count_if(http_status BETWEEN 500 AND 599) AS status_50x
from access_logs al
group by dte
) al
on al.dte = d.dte;