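I have an ELK deployment that collects logs. I now need to extract all log entries that contain a specific string, but I have run into an interesting problem: the same query gives different results in Kibana's Dev Tools and in the elasticsearch Python client.
Here is the query in Kibana: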
GET app_web_log-20180827/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "message": "Failed to call Billing API Server" }}
      ],
      "filter": [
        { "term": { "deployment": "app_instance1" }},
        { "term": { "module": "test_module" }},
        { "range": { "@timestamp": { "gte": 1535266800000, "lt": 1535353200000 }}}
      ]
    }
  },
  "size": 5
}
Here is the output from Dev Tools:
{
  "took": 556,
  "timed_out": false,
  "_shards": {
    "total": 175,
    "successful": 175,
    "skipped": 165,
    "failed": 0
  },
  "hits": {
    "total": 400,
    "max_score": 34.769733,
    "hits": [
      {
        "_index": "app_web_log-20180827",
        "_type": "doc",
        "_id": "FMkHeWUB_hBu7Tio4Llg",
        "_score": 34.769733,
        "_source": {
          "beat": {
            "version": "6.2.4",
            "name": "app-web001",
            "hostname": "app-web001"
          },
          "offset": 349461,
          "@timestamp": "2018-08-27T01:38:03.049Z",
          "source": "/apphome/app_instance1/logs/test_module.log",
          "message": "2018-08-27 01:37:59,661 [http-bio-8168-exec-8] ERROR [Billing APIClientImpl] Failed to call Billing API Server. Billing API Billing server response error, tranId:c95cede3a011d97fd9f3d661eb961cb8",
          "module": "test_module",
          "@version": "1",
          "deployment": "app_instance1"
        }
      },
      ....
But when I run the same query with the elasticsearch Python client, it returns nothing:
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'esserver', 'port': 9200, 'username': 'appuser', 'password': 'elastic'}])

body = {
    "query": {
        "bool": {
            "must": [
                { "match_phrase": { "message": "Failed to call Billing API Server" }}
            ],
            "filter": [
                { "term": { "deployment": "app_instance1" }},
                { "term": { "module": "test_module" }},
                { "range": { "@timestamp": { "gte": 1535266800000, "lt": 1535353200000 }}}
            ]
        }
    }
}
print(body)

page = es.search(index='app_web_log-20180827', doc_type='doc', body=body,
                 scroll='2m', size=100)
sid = page['_scroll_id']
scroll_size = page['hits']['total']

while (scroll_size > 0):
    print("Scrolling...")
    page = es.scroll(scroll_id = sid, scroll = '2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results that we returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    for m in page['hits']['hits']:
        msg = m['_source']['message']
        print(msg)
All I get is:
{'query': {'bool': {'filter': [{'term': {'deployment': 'app_instance1'}}, {'term': {'module': 'test_module'}}, {'range': {'@timestamp': {'lt': 1535353200000, 'gte': 1535266800000}}}], 'must': [{'match_phrase': {'message': 'Failed to call Billing API Server'}}]}}}
Scrolling...
Is there something wrong with my code? Please help. Thanks.
I suggest you look at the scan helper [0], which implements this logic for you.
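As a rough sketch of what that could look like, assuming the same index and query as in the question: the names build_query and iter_messages are made up for illustration, and creating and authenticating the client is left to the caller.

```python
def build_query():
    """Return the same bool query used in the question."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"message": "Failed to call Billing API Server"}}
                ],
                "filter": [
                    {"term": {"deployment": "app_instance1"}},
                    {"term": {"module": "test_module"}},
                    {"range": {"@timestamp": {"gte": 1535266800000, "lt": 1535353200000}}},
                ],
            }
        }
    }


def iter_messages(es, index="app_web_log-20180827"):
    """Yield the message field of every matching document."""
    # Imported here so build_query() stays usable without the client library.
    from elasticsearch.helpers import scan

    # scan() issues the initial search and all follow-up scroll calls,
    # and yields every hit, including those from the first page.
    for hit in scan(es, query=build_query(), index=index, scroll="2m", size=100):
        yield hit["_source"]["message"]
```

With a client es created as in the question, looping over iter_messages(es) and printing each item should list every matching message.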
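I assume that, because you only iterate over page after calling scroll and not before, you never process the hits returned by the search API call. You also set size to 100, so most likely all of your hits are in the first value of the page variable, which you ignore.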
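If you would rather keep the manual loop, the fix is to process each page before requesting the next one. A minimal sketch, assuming a client with the same search and scroll methods used in the question; iter_hits is a hypothetical name, and doc_type is left out here.

```python
def iter_hits(es, index, body, scroll="2m", size=100):
    """Yield every hit, starting with those returned by the initial search."""
    # Call signatures are copied from the question's code.
    page = es.search(index=index, body=body, scroll=scroll, size=size)
    while page["hits"]["hits"]:
        # Process the current page first ...
        for hit in page["hits"]["hits"]:
            yield hit
        # ... and only then fetch the next one with the latest scroll ID.
        page = es.scroll(scroll_id=page["_scroll_id"], scroll=scroll)
```

The loop ends when a page comes back with no hits, so the first page is never skipped and no separate counter is needed.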
[0] https://elasticsearch-py.readthedocs.io/en/master/helpers.html#scan