Query and sort on a date field that may not exist



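Requirement: I want to run a query that filters and sorts on a date field that may not exist. All records without the date field should be included, followed by the records whose date value is less than 1600230168278. The order should be: records without the date field first, then the rest by date ascending.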
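The requirement can be stated as a small in-memory model. This is only a sketch in plain Python, not Elasticsearch. It uses the sample documents and the cutoff from this question, keeps documents with no date or with a date below the cutoff, and puts documents without a date first.

```python
# In-memory model of the requirement (not Elasticsearch itself).
CUTOFF = 1600230168278

docs = [
    {"_id": "1", "date": 1546300800000},
    {"_id": "2", "date": 1577836800000},
    {"_id": "3", "date": 1609459200000},
    {"_id": "4", "name": "Arif Mahmud Rana"},
]

def matches(doc):
    # (date exists AND date < CUTOFF) OR (date does not exist)
    return "date" not in doc or doc["date"] < CUTOFF

def sort_key(doc):
    # False sorts before True, so docs without a date come first;
    # docs that have a date are then ordered by the date value.
    has_date = "date" in doc
    return (has_date, doc.get("date", 0))

result = sorted(filter(matches, docs), key=sort_key)
print([d["_id"] for d in result])  # ['4', '1', '2']
```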

Mapping & sample data:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "date": {
          "type": "date"
        },
        "name": {
          "type": "text"
        }
      }
    }
  }
}
PUT my_index/_doc/1
{
  "date": 1546300800000
}
PUT my_index/_doc/2
{
  "date": 1577836800000
}
PUT my_index/_doc/3
{
  "date": 1609459200000
}
PUT my_index/_doc/4
{
  "name": "Arif Mahmud Rana"
}
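If you want to load the same four sample documents in one request, you can send them as a single _bulk body. Below is a minimal sketch that builds that NDJSON body. The helper name build_bulk_body is hypothetical. The "_type": "_doc" entry assumes the 6.x mapping type used above.

```python
import json

# Hypothetical helper: build an NDJSON body for POST _bulk holding the
# same sample documents as the four PUT requests above.
SAMPLE_DOCS = {
    "1": {"date": 1546300800000},
    "2": {"date": 1577836800000},
    "3": {"date": 1609459200000},
    "4": {"name": "Arif Mahmud Rana"},
}

def build_bulk_body(index, docs):
    lines = []
    for doc_id, source in docs.items():
        # One action/metadata line, then the document source line
        lines.append(json.dumps({"index": {"_index": index, "_type": "_doc", "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"

body = build_bulk_body("my_index", SAMPLE_DOCS)
print(body)
```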

My query:

GET my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "function_score": {
          "functions": [
            {
              "filter": {
                "exists": {
                  "field": "date"
                }
              },
              "weight": 0.5
            }
          ],
          "query": {
            "match_all": {}
          }
        }
      },
      "filter": {
        "bool": {
          "minimum_should_match": 1,
          "should": [
            {
              "bool": {
                "must": [
                  {
                    "exists": {
                      "field": "date"
                    }
                  },
                  {
                    "range": {
                      "date": {
                        "lt": 1600230168278
                      }
                    }
                  }
                ]
              }
            },
            {
              "bool": {
                "must_not": {
                  "exists": {
                    "field": "date"
                  }
                }
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    {
      "_score": "desc"
    },
    {
      "date": "asc"
    }
  ],
  "size": 100
}
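The function_score query above gives documents without a date a higher score than documents with one. Below is a rough model of that scoring. It assumes the default score_mode and boost_mode, which are both "multiply", and it only illustrates the arithmetic. It is not Elasticsearch's scoring code.

```python
# Rough model of the function_score above, assuming the default
# score_mode="multiply" and boost_mode="multiply".
QUERY_SCORE = 1.0  # match_all gives every document a constant score of 1.0
WEIGHT = 0.5       # applied only when the "exists: date" filter matches

def function_score(doc):
    if "date" in doc:
        # The single matching function's weight multiplies the query score
        return QUERY_SCORE * WEIGHT
    # No function matched, so the query score is left unchanged
    return QUERY_SCORE

print(function_score({"name": "Arif Mahmud Rana"}))  # 1.0
print(function_score({"date": 1546300800000}))       # 0.5
```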

Query result:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "name" : "Arif Mahmud Rana"
        },
        "sort" : [
          1.0,
          9223372036854775807
        ]
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5,
        "_source" : {
          "date" : 1546300800000
        },
        "sort" : [
          0.5,
          1546300800000
        ]
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.5,
        "_source" : {
          "date" : 1577836800000
        },
        "sort" : [
          0.5,
          1577836800000
        ]
      }
    ]
  }
}
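The sort arrays in this response can be reproduced in a few lines. The sketch assumes the default missing value for an ascending date sort (missing: "_last"), which appears as Long.MAX_VALUE (9223372036854775807) in the response. It shows that doc 4 comes first only because of its higher _score. On the date key alone it would sort last.

```python
# Reproduce the "sort" arrays above, assuming the default missing="_last"
# for an ascending date sort, which substitutes Long.MAX_VALUE.
LONG_MAX = 2**63 - 1  # 9223372036854775807

hits = [
    {"_id": "1", "_score": 0.5, "date": 1546300800000},
    {"_id": "2", "_score": 0.5, "date": 1577836800000},
    {"_id": "4", "_score": 1.0},
]

def sort_values(hit):
    # [_score, date]; a missing date is replaced with LONG_MAX
    return [hit["_score"], hit.get("date", LONG_MAX)]

def sort_key(hit):
    score, date = sort_values(hit)
    return (-score, date)  # negate the score for "_score": "desc"

ordered = sorted(hits, key=sort_key)
for h in ordered:
    print(h["_id"], sort_values(h))
```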

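This works fine for this simple index with little data, but on a large index my Elasticsearch node goes down.

Elasticsearch version: 6.8.5

Actual index: 3048140 (docs.count), 1073559 (docs.deleted), 1.3gb (store.size) & 1.3gb (pri.store.size)

Any help or ideas would be great. TIA.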

I think the custom scoring of every document that lacks the date field in a large index is what causes the problem.

Here is one way to achieve your use case: use the missing parameter of sort to define where documents that lack the sort field should be placed.

GET test/_search
{"query":{"match_all":{}}}
PUT /test?include_type_name=false
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "age": { "type": "integer" }
    }
  }
}
POST test/_doc
{
  "name": "shahin",
  "age": 234
}

POST test/_doc
{
  "name": "karim",
  "age": 235
}

POST test/_doc
{
  "name": "rahim"
}
POST test/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": {
              "range": {
                "age": {
                  "lt": 250
                }
              }
            }
          }
        },
        {
          "bool": {
            "must_not": {
              "exists": {
                "field": "age"
              }
            }
          }
        }
      ]
    }
  },
  "sort": [
    { "age" : {"missing" : "_first", "order": "asc"}}
  ],
  "size": 100
}
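The sort clause {"age": {"missing": "_first", "order": "asc"}} places documents without an age ahead of all others. No score trick is needed for this. Below is a plain-Python model of that ordering on the three documents indexed into test. It is an illustration of the semantics, not the server's implementation.

```python
# Model of: "sort": [{"age": {"missing": "_first", "order": "asc"}}]
docs = [
    {"name": "shahin", "age": 234},
    {"name": "karim", "age": 235},
    {"name": "rahim"},
]

def missing_first_asc(doc):
    # (0, 0) for docs without "age" sorts ahead of (1, age) for the rest
    return (0, 0) if "age" not in doc else (1, doc["age"])

ordered_names = [d["name"] for d in sorted(docs, key=missing_first_asc)]
print(ordered_names)  # ['rahim', 'shahin', 'karim']
```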

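I added some optimizations that may help others. I am populating my production index from this index: I search/query it, loop over the results and index them into my production index. Here is my production query: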
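The search, loop and index step can be done page by page with the search_after search parameter. Below is a minimal sketch. The search and index_doc callables are hypothetical stand-ins for your client calls. A few assumptions apply. Each hit must carry its "sort" values, so a filter_path like the one in the query below would also need hits.hits.sort. The sort should also end with a field that is unique per document, so that pages do not skip ties.

```python
import copy

def copy_all(search, index_doc, base_body, page_size=100):
    """Page through all hits with search_after, passing each to index_doc.

    search(body) and index_doc(hit) are hypothetical callables supplied by
    the caller. Every hit must include its "sort" values.
    """
    body = copy.deepcopy(base_body)  # leave the caller's body untouched
    body["size"] = page_size
    copied = 0
    while True:
        response = search(body)
        # A filter_path response with no hits may come back as {}
        hits = response.get("hits", {}).get("hits", [])
        if not hits:
            break
        for hit in hits:
            index_doc(hit)
            copied += 1
        # Request the next page, starting after the last hit of this one
        body["search_after"] = hits[-1]["sort"]
    return copied


# Usage with an in-memory stand-in for the search endpoint
DATA = [{"_id": str(i), "sort": [i]} for i in range(5)]

def fake_search(body):
    after = body.get("search_after", [-1])[0]
    page = [h for h in DATA if h["sort"][0] > after][: body["size"]]
    return {"hits": {"hits": page}} if page else {}

copied_ids = []
total = copy_all(fake_search, lambda hit: copied_ids.append(hit["_id"]),
                 {"query": {"match_all": {}}}, page_size=2)
print(total, copied_ids)  # 5 ['0', '1', '2', '3', '4']
```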

GET /my_index/_search?filter_path=hits.hits._id,hits.hits._source
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "minimum_should_match": 1,
          "should": [
            {
              "range": {
                "lastModified": {
                  "lte": 1600314822988
                }
              }
            },
            {
              "bool": {
                "must_not": {
                  "exists": {
                    "field": "lastModified"
                  }
                }
              }
            }
          ]
        }
      }
    }
  },
  "sort": [
    {
      "indexed": {
        "order": "asc",
        "missing": "_first"
      }
    },
    {
      "lastModified": {
        "order": "asc",
        "missing": "_first"
      }
    }
  ],
  "size": 100
}
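If the cutoff changes on every run, the body above can be generated instead of hard-coded. Below is a minimal sketch. The helper name build_production_query is hypothetical. Using the current time as the cutoff is an assumption, since the original uses the fixed value 1600314822988. A range query only matches documents that have the field, so unlike the first query it needs no separate exists check.

```python
import json
import time

def build_production_query(cutoff_ms, size=100):
    """Hypothetical helper: the search body above for a given cutoff (epoch ms)."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "bool": {
                        "minimum_should_match": 1,
                        "should": [
                            # range only matches docs that have lastModified
                            {"range": {"lastModified": {"lte": cutoff_ms}}},
                            # ...so docs without the field are added explicitly
                            {"bool": {"must_not": {"exists": {"field": "lastModified"}}}},
                        ],
                    }
                }
            }
        },
        "sort": [
            {"indexed": {"order": "asc", "missing": "_first"}},
            {"lastModified": {"order": "asc", "missing": "_first"}},
        ],
        "size": size,
    }

# Assumed cutoff: "now" in epoch milliseconds
body = build_production_query(int(time.time() * 1000))
print(json.dumps(body, indent=2))
```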

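I wrap the should clauses in a filter instead of a scoring query, because my query does not need relevance scores for the matched documents. I also use filter_path to return only the fields I need. After adding these optimizations, my query is at least 4 seconds faster.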
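The effect of filter_path=hits.hits._id,hits.hits._source on a response can be pictured with a simplified model. The sketch below is plain Python, not the server's implementation. It keeps only the listed dot-separated paths and walks into arrays element by element. Wildcards (*, **) and "-" exclusions, which the real parameter supports, are not modelled.

```python
def apply_filter_path(response, filter_paths):
    """Simplified model of filter_path: keep only the given dotted paths."""
    def prune(node, paths):
        # A fully consumed path keeps this whole subtree
        if any(len(p) == 0 for p in paths):
            return node
        if isinstance(node, list):
            kept = [prune(item, paths) for item in node]
            kept = [k for k in kept if k is not None]
            return kept or None
        if isinstance(node, dict):
            out = {}
            for key, value in node.items():
                sub = [p[1:] for p in paths if p[0] == key]
                if sub:
                    child = prune(value, sub)
                    if child is not None:
                        out[key] = child
            return out or None
        return None  # scalar reached before the path ended

    return prune(response, [p.split(".") for p in filter_paths]) or {}


# A trimmed copy of the first query's response
response = {
    "took": 4,
    "timed_out": False,
    "hits": {
        "total": 3,
        "max_score": None,
        "hits": [
            {"_index": "my_index", "_type": "_doc", "_id": "4", "_score": 1.0,
             "_source": {"name": "Arif Mahmud Rana"}, "sort": [1.0, 9223372036854775807]},
            {"_index": "my_index", "_type": "_doc", "_id": "1", "_score": 0.5,
             "_source": {"date": 1546300800000}, "sort": [0.5, 1546300800000]},
        ],
    },
}

filtered = apply_filter_path(response, ["hits.hits._id", "hits.hits._source"])
print(filtered)
```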

Latest update