我在Logstash中有以下过滤器,用于解析AWS ELB访问日志:
filter {
grok {
match => [ "message", '%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:loadbalancer} %{IP:client_ip}:%{NUMBER:client_port:int} (?:%{IP:backend_ip}:%{NUMBER:backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{NUMBER:elb_status_code:int}|-) (?:%{NUMBER:backend_status_code:int}|-) %{NUMBER:received_bytes:int} %{NUMBER:sent_bytes:int} "(?:%{WORD:verb}|-) (?:%{GREEDYDATA:request}|-) (?:HTTP/%{NUMBER:httpversion}|-( )?)" "%{DATA:userAgent}"( %{NOTSPACE:ssl_cipher} %{NOTSPACE:ssl_protocol})?' ]
}
}
这导致了Elasticsearch中的各个字段,其中一个字段是可能值为的请求
https://api.example.net:443/v2/domain.com/actions?somefield=somevalue
有没有办法添加第二个grok过滤器,使用正则表达式对该字段进行操作,然后将其索引到ES,从而将domain.com和v2提取并索引到各自独立的字段中?
正如leandropjmp所建议的,两个独立的grok块实现了我想要的。以下是我一直在寻找的完整解决方案:
filter {
grok {
match => [ "message", '%{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:loadbalancer} %{IP:client_ip}:%{NUMBER:client_port:int} (?:%{IP:backend_ip}:%{NUMBER:backend_port:int}|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:%{NUMBER:elb_status_code:int}|-) (?:%{NUMBER:backend_status_code:int}|-) %{NUMBER:received_bytes:int} %{NUMBER:sent_bytes:int} "(?:%{WORD:verb}|-) (?:%{GREEDYDATA:request}|-) (?:HTTP/%{NUMBER:httpversion}|-( )?)" "%{DATA:userAgent}"( %{NOTSPACE:ssl_cipher} %{NOTSPACE:ssl_protocol})?' ]
}
grok {
match => [ "request", '(/(?<request_endpoint>[^/]+)+/(?<request_version>[^/]+)+/(?<request_domain>[^/]+)/(?<request_api>[^/!?]+))' ]
}
}