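I'm working on a project where I need to aggregate results by the "created" and "labels" fields. I wrote the following two queries, and both return the results I expect. However, I'd like to know which one runs faster.
My first query: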
{
  "size": 0,
  "aggs": {
    "HEATMAP": {
      "date_histogram": {
        "field": "created",
        "interval": "day"
      },
      "aggs": {
        "BEHAVIOUR_CHANGE": {
          "terms": {
            "field": "labels",
            "include": "behavior-change"
          }
        },
        "FIRST_OCCURRENCE": {
          "terms": {
            "field": "labels",
            "include": "first-occurrence"
          }
        }
      }
    }
  }
}
My second query:
{
  "size": 0,
  "aggs": {
    "HEATMAP": {
      "date_histogram": {
        "field": "created",
        "interval": "day"
      },
      "aggs": {
        "BEHAVIOUR_CHANGE": {
          "filter": {
            "regexp": {
              "labels": "behavior-change"
            }
          }
        },
        "FIRST_OCCURRENCE": {
          "filter": {
            "regexp": {
              "labels": "first-occurrence"
            }
          }
        }
      }
    }
  }
}
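Since the field is a keyword field, and you don't need anything special from the regular expression (exact matches only), I would do it as shown below. You'll also notice that I added a terms filter in the query section to narrow down the results before they reach the aggregations (in theory, the aggregations then have less work to do). I also see no reason to use regexp here, so I used the terms aggregation. If you're really interested in a performance comparison, I suggest setting up a load test with more documents and more terms in that field and running some tests. Elastic has its own benchmarking tool that you can use: Rally.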
{
  "size": 0,
  "query": {
    "terms": {
      "labels": [
        "behavior-change",
        "first-occurrence"
      ]
    }
  },
  "aggs": {
    "HEATMAP": {
      "date_histogram": {
        "field": "created",
        "interval": "day"
      },
      "aggs": {
        "BEHAVIOUR_CHANGE": {
          "terms": {
            "field": "labels",
            "include": "behavior-change"
          }
        },
        "FIRST_OCCURRENCE": {
          "terms": {
            "field": "labels",
            "include": "first-occurrence"
          }
        }
      }
    }
  }
}
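For a quick first look before setting up a full load test, one option is to run each query body several times and compare the server-side "took" value (in milliseconds) that Elasticsearch returns with every search response. The following is only a rough sketch under stated assumptions: the cluster URL, the index name my-index, and the helper names http_search and compare_took are all hypothetical, and the numbers will only mean something on an index with a realistic amount of data.

```python
import json
import statistics
import urllib.request
from typing import Callable, Dict

# Hypothetical endpoint and index name; adjust for your own cluster.
ES_URL = "http://localhost:9200"
INDEX = "my-index"


def http_search(body: dict) -> dict:
    """POST a search body to the _search endpoint and return the parsed response."""
    # request_cache=false asks Elasticsearch not to serve this size-0 request
    # from the shard request cache, which would otherwise hide the real cost.
    req = urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search?request_cache=false",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def compare_took(
    queries: Dict[str, dict],
    search: Callable[[dict], dict] = http_search,
    runs: int = 20,
    warmup: int = 3,
) -> Dict[str, float]:
    """Return the median server-side `took` (ms) for each named query body."""
    medians = {}
    for name, body in queries.items():
        # Warm-up runs are discarded so one-off loading costs do not skew the result.
        for _ in range(warmup):
            search(body)
        samples = [search(body)["took"] for _ in range(runs)]
        medians[name] = statistics.median(samples)
    return medians
```

To use it, load the three query bodies above into dicts and call something like compare_took({"terms_agg": q1, "regexp_filter": q2, "terms_query_plus_agg": q3}). Because "took" excludes network time and this loop sends one request at a time, it only gives a rough ordering; Rally remains the better choice for a comparison under realistic load.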