我想用亚马逊athena按日期总结http状态



我在Athena中有这样的访问日志。

time                  hostname    user_agent      http_status  col_0    col_1     col_2    col_3
2018-06-01T10:00:00   host01      Mozilla/5.0     200          2018     6         1        10
2019-06-01T10:00:00   host01      Mozilla/5.0     200          2019     6         1        10
2020-06-01T10:00:00   host01      Mozilla/5.0     404          2020     6         1        10
2021-06-08T10:00:00   host01      Mozilla/5.0     404          2021     6         8        10
2021-06-09T10:00:00   host01      Mozilla/5.0     200          2021     6         9        10
2021-06-10T10:00:00   host01      Mozilla/5.0     404          2021     6         10        10
2021-06-10T11:00:00   host01      Mozilla/5.0     503          2021     6         10        11
2021-06-11T10:00:00   host01      Mozilla/5.0     200          2021     6         11        10
2021-06-12T10:00:00   host01      Mozilla/5.0     503          2021     6         12        10
2021-06-13T10:00:00   host01      Mozilla/5.0     200          2021     6         13        10
2021-06-14T10:00:00   host01      Mozilla/5.0     404          2021     6         14        10
2021-06-15T10:00:00   host01      Mozilla/5.0     200          2021     6         15        10
2021-06-15T11:00:00   host01      Mozilla/5.0     503          2021     6         15        11

我想按日期统计http状态。届时,有必要显示一周后的数据。

date        status_40x  status_50x
2021-06-09  0           0
2021-06-10  1           1
2021-06-11  0           0
2021-06-12  0           1
2021-06-13  0           0
2021-06-14  1           0
2021-06-15  0           1

如何进行sql查询?

类型是。。。

time (string) 
hostname (string) 
user_agent (string) 
http_status (string) 
col_0 (string)(Partitioned) 
col_1 (string)(Partitioned) 
col_2 (string)(Partitioned) 
col_3 (string)(Partitioned)

查询

select d.dte, coalesce(status_404, 0), coalesce(status_503, 0)
from (select date '2021-06-09' + n.n * interval '1' day as dte
from unnest(sequence(0, 6, 1)) as n(n)
) d left join
(select DATE_TRUNC('DAY', cast(from_iso8601_timestamp(time) as date)) as dte, 
count_if(http_status = '404') AS status_404,
count_if(http_status = '503') AS status_503
from access_logs al
group by dte
) al
on al.dte = d.dte

错误

Error running query: SYNTAX_ERROR: line 9:16: Column 'dte' cannot be resolved

您可以按日期分组(例如,将时间戳转换为日期(并使用count_if:

WITH dataset AS
(SELECT *
FROM (VALUES (TIMESTAMP '2018-06-01 10:00:00', 401), 
(TIMESTAMP '2018-06-01 11:00:00', 400),
(TIMESTAMP '2018-06-01 10:00:00', 500), 
(TIMESTAMP '2018-06-02 11:00:00', 400)) AS t (TIME, http_status))
SELECT cast(TIME AS date),
count_if(http_status BETWEEN 400 AND 499) AS "40x",
count_if(http_status BETWEEN 500 AND 599) AS "50x"
FROM dataset
GROUP BY cast(TIME AS date)

以下结果:

1
_col040x50x
2018-06-012
2018-06-0210

如果要填写日期,则需要生成日期。我想你可以用sequence()来做这个:

select d.dte, coalesce(status_40x, 0), coalesce(status_50x, 0)
from (select date '2021-06-09' + n.n * interval '1 day') as dte
from unnest(sequence(0, 6, 1)) n(n)
) d left join
(select cast(time as date) as dte, 
count_if(http_status BETWEEN 400 AND 499) AS status_40x,
count_if(http_status BETWEEN 500 AND 599) AS status_50x
from access_logs al
group by dte
) al
on al.dte = d.dte;

相关内容

最新更新