Elasticsearch: get a summary of the last entry of aggregated data



I have an Elasticsearch index with documents like this:

entity_id  operation  timestamp
a1         X          2021-01-01
a1         Y          2021-01-02
a1         Z          2021-01-10
b1         Z          2021-01-03
b1         Z          2021-01-05
b1         Y          2021-01-20
c1         Z          2021-01-03
c1         X          2021-01-05
c1         Y          2021-01-20

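I have been looking for a way to solve this kind of task. I found many similar questions, but none of the suggestions fit. Here is the solution I ended up with; maybe it will be useful to someone with a similar task. The idea is to use a scripted_metric aggregation and compute the required summary data in its scripts: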

{
  "size": 0,
  "aggs": {
    "total": {
      "scripted_metric": {
        "init_script": "state.operations=new Hashtable();",
        "map_script": <Add to state.operations every doc using entity_id as key. When another doc for the same entity_id is found check its timestamp and replace the existing doc if the new found doc is newer>,
        "combine_script": "return state.operations",
        "reduce_script": <Here you have "states" variable which contains hashtables returned by the combine script per each shard. You can iterate states, merge all hashtables together and return the resulting hashtable or just calculate needed summary values>
      }
    }
  },
  "sort": [
    {
      "timestamp": {
        "order": "desc"
      }
    }
  ]
}

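This is only an outline of the algorithm: I put short descriptions in place of map_script and reduce_script, because my real case is much more complicated than the simplified example posted here.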
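To make the data flow easier to follow, here is a minimal Python sketch that simulates the scripted_metric phases outside Elasticsearch. It is not the Painless code from my real case. The split of documents into two shards, the function names map_phase and reduce_phase, and the final summary (a count of entities by their latest operation) are assumptions made only for illustration.

```python
from collections import Counter

# Sample documents from the question: (entity_id, operation, timestamp).
DOCS = [
    ("a1", "X", "2021-01-01"),
    ("a1", "Y", "2021-01-02"),
    ("a1", "Z", "2021-01-10"),
    ("b1", "Z", "2021-01-03"),
    ("b1", "Z", "2021-01-05"),
    ("b1", "Y", "2021-01-20"),
    ("c1", "Z", "2021-01-03"),
    ("c1", "X", "2021-01-05"),
    ("c1", "Y", "2021-01-20"),
]


def keep_newest(target, entity_id, operation, timestamp):
    """Store the entry unless a newer one for the same entity_id is already kept."""
    current = target.get(entity_id)
    # ISO-8601 date strings sort chronologically, so string comparison is enough.
    if current is None or timestamp > current[1]:
        target[entity_id] = (operation, timestamp)


def map_phase(shard_docs):
    """Mimics init_script + map_script + combine_script for one shard."""
    state = {}  # plays the role of state.operations
    for entity_id, operation, timestamp in shard_docs:
        keep_newest(state, entity_id, operation, timestamp)
    return state  # combine_script simply returns the per-shard table


def reduce_phase(states):
    """Mimics reduce_script: merge the per-shard tables, newest entry wins."""
    merged = {}
    for state in states:
        for entity_id, (operation, timestamp) in state.items():
            keep_newest(merged, entity_id, operation, timestamp)
    return merged


# Hypothetical distribution of the documents over two shards.
shards = [DOCS[0::2], DOCS[1::2]]

latest = reduce_phase([map_phase(shard) for shard in shards])

# Example summary (an assumption): how many entities ended on each operation.
summary = Counter(operation for operation, _ in latest.values())

print(latest)
print(dict(summary))
```

The merge in the reduce step matters because documents for one entity_id can live on different shards, so each shard only knows its own local "latest" entry.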
