我很好奇对于我的模式来说,什么是最优化的解决方案,因为我猜这个解决方案仍然不是最快的。
这些是我的日志行:
2021-07-09T11:48:32.328+0700 7fed98b56700 1 beast: 0x7fedfbac36b0: 10.111.111.111 - - [2021-07-09T11:48:32.328210+0700] "GET /glab-reg/docker/registry/v2/blobs/sha256/3f/3fe01ae49e6c42751859c7a8f8a0b5ab4362b215d07d8a0beaa802113dd8d9b8/data HTTP/1.1" 206 4339 - "docker-distribution/v3.0.0-gitlab (go1.14.7) aws-sdk-go/1.27.0 (go1.14.7; linux; amd64)" bytes=0-
2021-07-09T12:11:45.252+0700 7f36b0dd8700 1 beast: 0x7f374adb36b0: 10.111.111.111 - - [2021-07-09T12:11:45.252941+0700] "GET /glab-reg?list-type=2&max-keys=1&prefix= HTTP/1.1" 200 723 - "docker-distribution/v3.0.0-gitlab (go1.14.7) aws-sdk-go/1.27.0 (go1.14.7; linux; amd64)" -
2021-07-09T12:11:45.431+0700 7f360fc96700 1 beast: 0x7f374ad326b0: 10.111.111.111 - - [2021-07-09T12:11:45.431942+0700] "GET /streag/?list-type=2&delimiter=%2F&max-keys=5000&prefix=logs%2F&fetch-owner=false HTTP/1.1" 200 497 - "Hadoop 3.2.2, aws-sdk-java/1.11.563 Linux/5.4.0-70-generic OpenJDK_64-Bit_Server_VM/25.252-b09 java/1.8.0_252 scala/2.12.10 vendor/Oracle_Corporation" -
2021-07-09T12:12:00.738+0700 7fafc968d700 1 beast: 0x7fb0b5f0d6b0: 10.111.111.111 - - [2021-07-09T12:12:00.738060+0700] "GET /csder-prd-cae?list-type=2&max-keys=1000 HTTP/1.1" 200 279469 - "aws-sdk-java/2.16.50 Linux/3.10.0-1160.31.1.el7.x86_64 OpenJDK_64-Bit_Server_VM/25.292-b10 Java/1.8.0_292 scala/2.11.10 vendor/Red_Hat__Inc. io/async http/NettyNio cfg/retry-mode/legacy" -
2021-07-09T12:55:43.573+0700 7fa5329e3700 1 beast: 0x7fa4499846b0: 10.111.111.111 - - [2021-07-09T12:55:43.573351+0700] "PUT /s..prr//WHITELABEL-1/PAGETPYE-7/DEVICE-1/LANGUAGE-18/SUBTYPE-0/10236929 HTTP/1.1" 200 34982 - "aws-sdk-dotnet-coreclr/3.5.10.1 aws-sdk-dotnet-core/3.5.3.8 .NET_Core/4.6.26328.01 OS/Microsoft_Windows_6.3.9600 ClientAsync" -
2021-07-09T12:55:43.587+0700 7fa4e9951700 1 beast: 0x7fa4490f36b0: 10.111.111.111 - - [2021-07-09T12:55:43.587351+0700] "GET /admin/log/?type=data&id=22&marker=1_1625810142.071426_1063846896.1&extra-info=true&rgwx-zonegroup=31a5ea05-c87a-436d-9ca0-ccfcbad481e3 HTTP/1.1" 200 44 - - -
这是我的过滤器:
%{TIMESTAMP_ISO8601:LogTimestamp}] "%{WORD:request_method} (?<swift_v1>(/swift/v1){0,1})/(?<bucketname>(([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]).{1,})*([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]))(?|/?|/)(?<list_type=2>(list-type=2){0,1})%{GREEDYDATA}%{SPACE}HTTP/1.1" %{NUMBER:httprespcode:int}
我刚刚在这里读到一些令人兴奋的文章来改进grok。以下是要点:
根据消息来源,检查一行不匹配的时间可能比常规(成功(匹配慢6倍,尤其是在无匹配开始和没有匹配结束的情况下。所以你可以通过破解失败的比赛来改进grok模式。您可能希望使用锚^
和$
来帮助grok更快地根据开始或结束做出决定。
样品
%{IPORHOST:clientip} %{USER:ident} %{USER:auth} [%{HTTPDATE:timestamp}] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}
那么
^%{IPORHOST:clientip} %{USER:ident} %{USER:auth} [%{HTTPDATE:timestamp}] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}$
结果
使初始匹配失败检测速度提高了约10倍
所有权利属于各自的作者 。令人惊叹的文章。你必须查看链接。