Elasticsearch - Sort the results of a Terms aggregation by the length of the key string

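I am querying ES with a Terms aggregation to find the first N unique values of the string field foo, where the field contains the substring bar and the document matches some other constraints.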

Currently, I can sort the results alphabetically by the key string:

{
  "query": {other constraints},
  "aggs": {
    "my_values": {
      "terms": {
        "field": "foo.raw",
        "include": ".*bar.*",
        "order": {"_key": "asc"},
        "size": N
      }
    }
  }
}

This produces results like

{
  ...
  "aggregations": {
    "my_values": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 145,
      "buckets": [
        {
          "key": "aa_bar_aa",
          "doc_count": 1
        },
        {
          "key": "iii_bar_iii",
          "doc_count": 1
        },
        {
          "key": "z_bar_z",
          "doc_count": 1
        }
      ]
    }
  }
}
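To pin down what the query above asks ES to do, here is a rough in-memory model in Python. It is an illustration only, not how Elasticsearch implements the aggregation. It assumes one foo.raw value per document, ASCII keys and a single shard, and it uses Python's re.fullmatch to stand in for the terms include pattern, since Lucene regular expressions must match the whole term; for a simple pattern like .*bar.* the two regex syntaxes agree. The function name emulate_terms_agg and the sample values are hypothetical.

```python
import re
from collections import Counter


def emulate_terms_agg(values, include, size):
    """Rough single-shard model of a terms aggregation ordered by "_key": "asc".

    Hypothetical helper for illustration only. `values` holds one field value
    per matching document. `include` must match the WHOLE term (Lucene regexes
    are anchored), which re.fullmatch reproduces for simple patterns.
    """
    pattern = re.compile(include)
    counts = Counter(v for v in values if pattern.fullmatch(v))
    # Alphabetical key order (assumes ASCII keys), then keep the first `size`.
    top = sorted(counts)[:size]
    buckets = [{"key": k, "doc_count": counts[k]} for k in top]
    # Documents that matched `include` but fell outside the top `size` buckets.
    other = sum(counts.values()) - sum(b["doc_count"] for b in buckets)
    return {"buckets": buckets, "sum_other_doc_count": other}


# Hypothetical sample data: one foo.raw value per document.
values = ["aa_bar_aa", "z_bar_z", "iii_bar_iii", "foo", "z_bar_z"]
print(emulate_terms_agg(values, ".*bar.*", size=2))
# {'buckets': [{'key': 'aa_bar_aa', 'doc_count': 1}, {'key': 'iii_bar_iii', 'doc_count': 1}], 'sum_other_doc_count': 2}
```

Because the include pattern is anchored, writing just bar would match only the exact term "bar"; the .* on both sides is what turns it into a substring test.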

How can I change the order option so that the buckets are sorted by the length of the string in the foo keyword field, giving a result like

{
  ...
  "aggregations": {
    "my_values": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 145,
      "buckets": [
        {
          "key": "z_bar_z",
          "doc_count": 1
        },
        {
          "key": "aa_bar_aa",
          "doc_count": 1
        },
        {
          "key": "iii_bar_iii",
          "doc_count": 1
        }
      ]
    }
  }
}

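I need this because shorter strings are closer to the search substring and are therefore considered "better" matches, so they should appear earlier in the results than longer strings. Any alternative way of ordering the buckets by how similar they are to the original substring would also be helpful.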

The sorting has to happen in ES, so that I only need to load the top N results from ES.

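I came up with a way to do this. I use a sub-aggregation on each bucket to compute the length of the key string as an additional metric. I can then order by this length metric first and by the actual key second, so that keys of the same length are sorted alphabetically. (This relies on foo.raw holding a single value per document: for a multi-valued field, doc['foo.raw'].value returns only the first value, which is not necessarily the bucket's key.)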

{
  "query": {other constraints},
  "aggs": {
    "my_values": {
      "terms": {
        "field": "foo.raw",
        "include": ".*bar.*",
        "order": [
          {"key_length": "asc"},
          {"_key": "asc"}
        ],
        "size": N
      },
      "aggs": {
        "key_length": {
          "max": {"script": "doc['foo.raw'].value.length()"}
        }
      }
    }
  }
}

This gives me results like

{
  ...
  "aggregations": {
    "my_values": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 145,
      "buckets": [
        {
          "key": "z_bar_z",
          "doc_count": 1
        },
        {
          "key": "aa_bar_aa",
          "doc_count": 1
        },
        {
          "key": "dd_bar_dd",
          "doc_count": 1
        },
        {
          "key": "bbb_bar_bbb",
          "doc_count": 1
        }
      ]
    }
  }
}

This is exactly what I wanted.
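The two-level order [{"key_length": "asc"}, {"_key": "asc"}] behaves like sorting on a (length, key) tuple. The short Python sketch below reproduces the bucket order of the response above from the same keys, which can be handy for checking expectations in a client-side test. The helper name is hypothetical, and ASCII keys are assumed: Painless String.length() counts UTF-16 code units, while Python's len counts code points, so the two can differ for non-ASCII strings.

```python
def order_by_length_then_key(buckets):
    """Hypothetical helper mimicking
    "order": [{"key_length": "asc"}, {"_key": "asc"}].

    Tuples compare element by element, so buckets are ordered by key length
    first and alphabetically only among keys of equal length.
    """
    return sorted(buckets, key=lambda b: (len(b["key"]), b["key"]))


keys = ["bbb_bar_bbb", "dd_bar_dd", "z_bar_z", "aa_bar_aa"]
buckets = [{"key": k, "doc_count": 1} for k in keys]
print([b["key"] for b in order_by_length_then_key(buckets)])
# ['z_bar_z', 'aa_bar_aa', 'dd_bar_dd', 'bbb_bar_bbb']
```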
