我想创建一个VIEW,它总是从最近的两个Athena分区返回最后1小时的数据。
我正在使用下面的AmazonAthena DDL和一个varchar类型的名为"datehour"的分区列。
CREATE EXTERNAL TABLE mydb.table_foo (
`account_id` string,
`account_email_address` string,
`record_timestamp` timestamp)
PARTITIONED BY (
`datehour` string)
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://data-here-9191919191/table_foo'
TBLPROPERTIES (
'projection.datehour.format'='yyyy/MM/dd/HH',
'projection.datehour.interval'='1',
'projection.datehour.interval.unit'='HOURS',
'projection.datehour.range'='2020/11/01/00,NOW',
'projection.datehour.type'='date',
'projection.enabled'='true',
'storage.location.template'='s3://data-here-9191919191/table_foo/${datehour}',
'transient_lastDdlTime'='1604013447')
以下是我想要运行的查询,以创建一个VIEW,它总是返回最近2个分区中最后1小时的数据。
select *
from mydb.table_foo
where
(datehour = CONCAT(
CAST( year(current_timestamp) AS varchar) , '/',
CAST( month(current_timestamp) AS varchar), '/',
CAST( day(current_timestamp) AS varchar), '/',
CAST( hour(current_timestamp) AS varchar))
OR
datehour = CONCAT(
CAST( year(current_timestamp) AS varchar) , '/',
CAST( month(current_timestamp) AS varchar), '/',
CAST( day(current_timestamp) AS varchar), '/',
CAST( ( hour(current_timestamp - interval '1' hour) ) AS varchar)))
AND
record_timestamp BETWEEN (current_timestamp - interval '1' hour) AND current_timestamp
我想让SQL动态化的示例是这样的:
select *
from table_foo
where datehour = '2020/11/17/23' or datehour = '2020/11/18/00' AND
record_timestamp BETWEEN (current_timestamp - interval '1' hour) AND current_timestamp
动态WHERE逻辑会围绕所有字段(月、日和年(的更改带来问题。我在第一天,也就是今天午夜左右遇到了这个问题。如何动态生成datehour分区?
解决方案应该是以正确的格式将current_timestamp - interval '1' hour
强制转换为varchar
。假设数据被划分为yyyy/MM/dd/HH
,如表定义中所述:
select *
from mydb.table_foo
WHERE datehour BETWEEN date_format(date_trunc('hour',current_timestamp - interval '1' hour),'%Y/%m/%d/%H') AND date_format(date_trunc('hour',current_timestamp),'%Y/%m/%d/%H')
AND record_timestamp BETWEEN (current_timestamp - interval '1' hour) AND current_timestamp