What is the best way to get the average number of documents per bucket in Elasticsearch?



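Suppose we are hat makers and have an Elasticsearch index in which each document corresponds to the sale of one hat. Each sales record includes the name of the store that sold the hat. I want to find the number of hats sold by each store, and the average number of hats sold across all stores. The best approach I have found is this search: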

GET hat_sales/_search
{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "stores": {
      "terms": {
        "field": "storename",
        "size": 65536
      },
      "aggs": {
        "sales_count": {
          "cardinality": {
            "field": "_id"
          }
        }
      }
    },
    "average_sales_count": {
      "avg_bucket": {
        "buckets_path": "stores>sales_count"
      }
    }
  }
}

(Aside: I set size to 65536 because that is the default maximum number of buckets.)

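The problem with this query is that the sales_count aggregation performs a redundant computation: each stores bucket already has a doc_count property. But how can I access that doc_count in a buckets_path?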

I think this is what you are looking for. First, a small test index with some sample data:

PUT hat_sales
{
  "mappings": {
    "properties": {
      "storename": {
        "type": "keyword"
      }
    }
  }
}
POST hat_sales/_bulk?refresh=true
{"index": {}}
{"storename": "foo"}
{"index": {}}
{"storename": "foo"}
{"index": {}}
{"storename": "bar"}
{"index": {}}
{"storename": "baz"}
{"index": {}}
{"storename": "baz"}
{"index": {}}
{"storename": "baz"}

GET hat_sales/_search
{
  "size": 0,
  "query": {"match_all": {}},
  "aggs": {
    "stores": {
      "terms": {
        "field": "storename",
        "size": 65536
      }
    },
    "average_sales_count": {
      "avg_bucket": {
        "buckets_path": "stores>_count"
      }
    }
  }
}

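The path to each bucket's document count is stores>_count. Here _count is a special path that refers to the bucket's doc_count, so no sub-aggregation is needed.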

The result looks like this:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "stores" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "baz",
          "doc_count" : 3
        },
        {
          "key" : "foo",
          "doc_count" : 2
        },
        {
          "key" : "bar",
          "doc_count" : 1
        }
      ]
    },
    "average_sales_count" : {
      "value" : 2.0
    }
  }
}
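
As a sanity check on the arithmetic, here is a minimal Python sketch (not part of the original answer) of what the two aggregations compute for the six sample documents. It does not talk to Elasticsearch at all. It only mirrors the grouping and averaging in plain Python, and the variable names are chosen for illustration.

```python
from collections import Counter

# The same six storename values that were indexed with the _bulk request.
storenames = ["foo", "foo", "bar", "baz", "baz", "baz"]

# A terms aggregation on "storename" groups documents by value; each
# bucket's doc_count is the number of documents that fell into that group.
doc_counts = Counter(storenames)

# avg_bucket with buckets_path "stores>_count" averages those doc_counts
# across the buckets (3 + 2 + 1 documents over 3 stores).
average_sales_count = sum(doc_counts.values()) / len(doc_counts)

print(dict(doc_counts))
print(average_sales_count)
```

The per-store counts match the doc_count values in the buckets, and the average matches average_sales_count in the response.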