Unique list of values from a string array



I want a unique list of the values in a string array field across all documents.

Example documents:

{
"_index": "li",
"_type": "profile",
"_id": "tqvatGQBhAqGE7-_7pdF",
"nonarrayfield":"person A",
"attributes": [
"blah blah 123",
"112358",
"quick brown fox"
]
},
{
"_index": "li",
"_type": "profile",
"_id": "hqvatGQBhAqGE7-_7pRE",
"nonarrayfield":"person B",
"attributes": [
"blah blah 123",
"00000",
"California"
]
}

What I want is a unique list of the attributes:

  • "blah blah 123"
  • "112358"
  • "quick brown fox"
  • "00000"
  • "California"

When I try a basic aggregation query, I get "Error: 400 - all shards failed":

'{
"aggs":{
"aggregation_name":{
"terms":{"field":"attributes"}
}
}
}'

When I do the same thing on a non-array field, the query succeeds:

'{
"aggs":{
"aggregation_name":{
"terms":{"field":"nonarrayfield"}
}
}
}'

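Use the keyword sub-field for array fields like this one (assuming the default dynamic mapping, which adds a `.keyword` sub-field to string fields):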

{
"size":0,
"aggs":{
"aggregation_name":{
"terms":{"field":"attributes.keyword"}
}
}
}
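
Note that a terms aggregation returns only the top 10 buckets by default, so a field with more distinct values would come back truncated. As a minimal sketch, the same request body could be built in Python with a larger bucket `size`. The helper name `build_unique_values_query` and the cap of 1000 are illustrative assumptions, not part of the original answer.

```python
import json


def build_unique_values_query(field, max_buckets=1000):
    """Build a terms-aggregation body listing distinct values of `field`.

    `max_buckets` is a hypothetical cap chosen for illustration; pick a
    value that comfortably exceeds the expected number of distinct values.
    """
    return {
        "size": 0,  # skip the search hits, only the aggregation is needed
        "aggs": {
            "aggregation_name": {
                "terms": {"field": field, "size": max_buckets}
            }
        },
    }


body = build_unique_values_query("attributes.keyword")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request against the index.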

Your result will look like this:

{
"took": 9,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"aggregation_name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "blah blah 123",
"doc_count": 2
},
{
"key": "00000",
"doc_count": 1
},
{
"key": "112358",
"doc_count": 1
},
{
"key": "California",
"doc_count": 1
},
{
"key": "quick brown fox",
"doc_count": 1
}
]
}
}
}
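
The unique list itself is just the `key` of each bucket. A small sketch of pulling it out of the response in Python, assuming the response has already been parsed into a dict; the helper name `unique_values` is hypothetical, and the response below is a trimmed copy of the one shown above.

```python
def unique_values(response, agg_name="aggregation_name"):
    """Return the bucket keys of a terms aggregation as a list."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Trimmed copy of the response above (only the aggregation part is kept).
response = {
    "aggregations": {
        "aggregation_name": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "blah blah 123", "doc_count": 2},
                {"key": "00000", "doc_count": 1},
                {"key": "112358", "doc_count": 1},
                {"key": "California", "doc_count": 1},
                {"key": "quick brown fox", "doc_count": 1},
            ],
        }
    }
}

values = unique_values(response)
print(values)
```

The keys come back in bucket order, which is descending `doc_count` by default.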
