Amazon Athena S3 prefix



I am running the following query in Athena:

CREATE EXTERNAL TABLE IF NOT EXISTS elb_logs (
 request_timestamp string,
 elb_name string,
 request_ip string,
 request_port int,
 backend_ip string,
 backend_port int,
 request_processing_time double,
 backend_processing_time double,
 client_response_time double,
 elb_response_code string,
 backend_response_code string,
 received_bytes bigint,
 sent_bytes bigint,
 request_verb string,
 url string,
 protocol string,
 user_agent string,
 ssl_cipher string,
 ssl_protocol string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
 'serialization.format' = '1',
 'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" ("[^"]*") ([A-Z0-9-]+) ([A-Za-z0-9.-]*)$' )
LOCATION 's3://your_log_bucket/prefix/AWSLogs/AWS_account_ID/elasticloadbalancing/';

In this query, the S3 location has to be specified as follows:

s3://your_log_bucket/prefix/AWSLogs/AWS_account_ID/elasticloadbalancing/

What is the prefix mentioned in s3://your_log_bucket/prefix/AWSLogs/AWS_account_ID/elasticloadbalancing/? The actual S3 location of my logs is s3://your_log_bucket/AWSLogs/AWS_account_ID/elasticloadbalancing/

Am I missing something?

If your logs are located at s3://your_log_bucket/AWSLogs/AWS_account_ID/elasticloadbalancing/, then you don't need to define a prefix value. Just use this S3 location as the LOCATION of the Athena table.

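FYI, if load balancers for multiple APIs write their log data to the same S3 bucket, there will be additional S3 paths such as s3://your_log_bucket/api-v1, s3://your_log_bucket/api-v2, and so on. The logs then end up under, for example, api-v1/AWSLogs/AWS_account_ID/elasticloadbalancing/, where api-v1 is the prefix.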
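To make the two cases concrete, here is a minimal sketch of how the table location might be set in each one. It assumes the elb_logs table from the question already exists; the table name elb_logs_api_v1 and the api-v1 prefix are hypothetical and only illustrate the multi-API layout described above. Athena runs one statement per query, so each statement would be submitted separately.

```sql
-- Case 1: logs are written at the bucket root (no prefix configured).
-- The "prefix/" segment of the placeholder path is simply dropped.
ALTER TABLE elb_logs
SET LOCATION 's3://your_log_bucket/AWSLogs/AWS_account_ID/elasticloadbalancing/';

-- Case 2 (hypothetical): the load balancer writes under the prefix "api-v1".
-- "elb_logs_api_v1" is an assumed table created with the same columns and SerDe.
ALTER TABLE elb_logs_api_v1
SET LOCATION 's3://your_log_bucket/api-v1/AWSLogs/AWS_account_ID/elasticloadbalancing/';
```

The same paths could equally be written straight into the LOCATION clause of the CREATE EXTERNAL TABLE statement instead of altering the table afterwards.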
