Equivalent of a SQL correlated query for an Elasticsearch aggregation



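I have a use case for an aggregation that, in SQL, I would implement with a correlated query.

I have an index named listings with the following attributes/columns: ListDate, ListPrice, SoldDate, SoldPrice, OffMarketDate.

ListDate is not nullable, but SoldDate, SoldPrice and OffMarketDate can be null.

I want to aggregate statistics over this index according to the following requirements.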

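  • I want per-month statistics, which I think can be done with a date histogram aggregation.
  • For each bucket of the date histogram aggregation, I want to find listings as follows. Example: for January 2019, get all listings with (ListDate < 1 February 2019) AND (SoldDate is null OR SoldDate < 1 January 2019) AND (OffMarketDate is null OR OffMarketDate < …).
  • Then run aggregation functions over those listings for each month.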
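To make the per-month rule concrete, here is a minimal plain-Python sketch of the selection for one month. It is only an illustration of the requirement, not an Elasticsearch query. The helper names and the three sample listings are hypothetical, and the OffMarketDate cutoff is assumed to be the same as the SoldDate cutoff (the first day of the month).

```python
from datetime import date


def month_bounds(year, month):
    """Return (first day of the month, first day of the following month)."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def matches_month(listing, year, month):
    """Apply the three conditions from the list above to one listing."""
    start, end = month_bounds(year, month)
    sold = listing.get("SoldDate")
    off_market = listing.get("OffMarketDate")
    return (
        listing["ListDate"] < end
        and (sold is None or sold < start)
        # Assumption: OffMarketDate uses the same cutoff as SoldDate.
        and (off_market is None or off_market < start)
    )


# Hypothetical sample data, for illustration only.
listings = [
    {"ListDate": date(2018, 11, 5), "SoldDate": None, "OffMarketDate": None},
    {"ListDate": date(2018, 12, 1), "SoldDate": date(2019, 1, 15), "OffMarketDate": None},
    {"ListDate": date(2019, 2, 3), "SoldDate": None, "OffMarketDate": None},
]

january = [item for item in listings if matches_month(item, 2019, 1)]
print(len(january))
```

Only the first listing qualifies for January 2019: the second was sold after 1 January 2019, and the third was listed after 1 February 2019.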

I would really appreciate any suggestions for implementing this use case. Thanks in advance for your help.

See the details below for one way to approach this:

Mapping:

PUT listings
{
"mappings": {
"properties": {
"listDate":{
"type": "date"
},
"listPrice":{
"type": "long"
},
"soldDate":{
"type": "date"
},
"soldPrice": {
"type": "long"
},
"offMarketDate": {
"type": "date"
}
}
}
}

Note that I have constructed the mapping above based on your question.

Sample documents:

POST listings/_doc/1
{
"listDate": "2020-01-01",
"listPrice": "100.00",
"soldDate": "2019-12-25",
"soldPrice": "120.00",
"offMarketDate": "2019-12-20"
}
POST listings/_doc/2
{
"listDate": "2020-01-01",
"listPrice": "100.00",
"soldDate": "2019-12-24",
"soldPrice": "122.00",
"offMarketDate": "2019-12-20"
}
POST listings/_doc/3
{
"listDate": "2020-01-25",
"listPrice": "120.00",
"soldDate": "2020-01-30",
"soldPrice": "140.00",
"offMarketDate": "2020-01-26"
}
POST listings/_doc/4
{
"listDate": "2020-01-25",
"listPrice": "120.00",
"soldDate": "2020-02-02",
"soldPrice": "135.00",
"offMarketDate": "2020-01-26"
}
POST listings/_doc/5
{
"listDate": "2020-01-25",
"listPrice": "120.00"
}
POST listings/_doc/6
{
"listDate": "2020-02-02",
"listPrice": "120.00"
}

Note that I have not added soldDate and offMarketDate to documents 5 and 6, as leaving the fields out is better than storing null values.
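If the documents come from a source that does carry null values, one option is to strip those keys before indexing. A minimal sketch, assuming documents are built as Python dicts; `drop_nulls` is a hypothetical helper and not part of any Elasticsearch client:

```python
def drop_nulls(doc):
    """Return a copy of doc without the keys whose value is None."""
    return {key: value for key, value in doc.items() if value is not None}


raw = {
    "listDate": "2020-01-25",
    "listPrice": "120.00",
    "soldDate": None,       # not sold yet
    "offMarketDate": None,  # still on the market
}

print(drop_nulls(raw))
```

The result has the same shape as document 5 above.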

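Request query:

I came up with the following query for your use case.

Also, for the sake of the aggregation, I have computed the total soldPrice of the documents that have:

  • listDate in the month of Jan 2020 AND
  • (soldDate is null OR soldDate is before the month of Jan 2020) AND
  • (offMarketDate is null OR offMarketDate is before the month of Jan 2020)

Here is the query: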

POST listings/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"listDate": {
"gte": "2020-01-01",
"lt": "2020-02-01"
}
}
},
{
"bool": {
"should": [
{
"range": {
"soldDate": {
"lt": "2020-01-01"
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "soldDate"
}
}
]
}
}
],
"minimum_should_match": 1
}
},
{
"bool": {
"should": [
{
"range": {
"offMarketDate": {
"lt": "2020-01-01"
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "offMarketDate"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
]
}
},
"aggs": {
"my_histogram": {
"date_histogram": {
"field": "listDate",
"calendar_interval": "month"
},
"aggs": {
"total_sales_price": {
"sum": {
"field": "soldPrice"
}
}
}
}
}
}
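The query above covers a single month. One possible way to get the same figures for several months in one request is a `filters` aggregation with one named filter per month. The sketch below only builds the request body as a Python dict and does not talk to a cluster. The helper names (`missing_or_before`, `month_filter`, `monthly_request`) and the aggregation name `per_month` are made up for illustration, and the exclusive `lt` bounds are an assumption about the intended month boundaries.

```python
import json
from datetime import date


def next_month_start(year, month):
    """First day of the month after (year, month)."""
    return date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)


def missing_or_before(field, cutoff):
    """Clause matching documents where field is absent OR field < cutoff."""
    return {
        "bool": {
            "should": [
                {"range": {field: {"lt": cutoff}}},
                {"bool": {"must_not": [{"exists": {"field": field}}]}},
            ],
            "minimum_should_match": 1,
        }
    }


def month_filter(year, month):
    """Same three conditions as the query above, for an arbitrary month."""
    start = date(year, month, 1).isoformat()
    end = next_month_start(year, month).isoformat()
    return {
        "bool": {
            "filter": [
                {"range": {"listDate": {"gte": start, "lt": end}}},
                missing_or_before("soldDate", start),
                missing_or_before("offMarketDate", start),
            ]
        }
    }


def monthly_request(months):
    """Build a search body with one named filter bucket per (year, month)."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "per_month": {
                "filters": {
                    "filters": {
                        f"{y:04d}-{m:02d}": month_filter(y, m) for y, m in months
                    }
                },
                "aggs": {"total_sales_price": {"sum": {"field": "soldPrice"}}},
            }
        },
    }


body = monthly_request([(2020, 1), (2020, 2)])
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of POST listings/_search. Each key under the per_month buckets in the response would then correspond to one month.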

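The query above should be fairly easy to read and understand. I would suggest reading about the different queries and aggregations I have used:

  • Bool query
  • Range query
  • Exists query, to check whether a field exists
  • Date histogram aggregation
  • Sum metric aggregation

Response: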

{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 3.0,
"hits" : [
{
"_index" : "listings",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.0,
"_source" : {
"listDate" : "2020-01-01",
"listPrice" : "100.00",
"soldDate" : "2019-12-25",
"soldPrice" : "120.00",
"offMarketDate" : "2019-12-20"
}
},
{
"_index" : "listings",
"_type" : "_doc",
"_id" : "2",
"_score" : 3.0,
"_source" : {
"listDate" : "2020-01-01",
"listPrice" : "100.00",
"soldDate" : "2019-12-24",
"soldPrice" : "122.00",
"offMarketDate" : "2019-12-20"
}
},
{
"_index" : "listings",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"listDate" : "2020-01-25",
"listPrice" : "120.00"
}
}
]
},
"aggregations" : {
"my_histogram" : {
"buckets" : [
{
"key_as_string" : "2020-01-01T00:00:00.000Z",
"key" : 1577836800000,
"doc_count" : 3,
"total_sales_price" : {
"value" : 242.0
}
}
]
}
}
}

As expected, documents 1, 2 and 5 are returned, and total_sales_price is the correct sum of their soldPrice values.
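The result can be cross-checked without a cluster. The sketch below replays the same conditions over the six sample documents in plain Python; it is a sanity check, not how Elasticsearch evaluates the query. The soldPrice values are written as integers because the field is mapped as long. The comparison operators are an assumption, but none of the sample dates fall on an upper boundary, so inclusive and exclusive bounds give the same outcome here.

```python
from datetime import date

# The six sample documents from above (only the fields the query uses).
DOCS = {
    1: {"listDate": "2020-01-01", "soldDate": "2019-12-25", "offMarketDate": "2019-12-20", "soldPrice": 120},
    2: {"listDate": "2020-01-01", "soldDate": "2019-12-24", "offMarketDate": "2019-12-20", "soldPrice": 122},
    3: {"listDate": "2020-01-25", "soldDate": "2020-01-30", "offMarketDate": "2020-01-26", "soldPrice": 140},
    4: {"listDate": "2020-01-25", "soldDate": "2020-02-02", "offMarketDate": "2020-01-26", "soldPrice": 135},
    5: {"listDate": "2020-01-25"},
    6: {"listDate": "2020-02-02"},
}


def parse(value):
    """Parse an ISO date string, passing a missing value through as None."""
    return date.fromisoformat(value) if value is not None else None


def matches(doc, start, end):
    """listDate within [start, end) and both other dates missing or before start."""
    listed = parse(doc["listDate"])
    sold = parse(doc.get("soldDate"))
    off_market = parse(doc.get("offMarketDate"))
    return (
        start <= listed < end
        and (sold is None or sold < start)
        and (off_market is None or off_market < start)
    )


start, end = date(2020, 1, 1), date(2020, 2, 1)
hits = sorted(doc_id for doc_id, doc in DOCS.items() if matches(doc, start, end))
total = sum(DOCS[doc_id].get("soldPrice", 0) for doc_id in hits)
print(hits, total)  # prints: [1, 2, 5] 242
```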

Hope that helps!