Painless Elasticsearch script for filtering on a nested array of objects



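My use case is similar to the following. I have an array of nested objects, warehouses, and I am trying to filter based on the last element of that array.

I get some results, but none of them are correct. I would also like to understand how this actually works.

For example,

I want to search for products based on the stock in the last element of their warehouses array. This is what a product document looks like: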

{
"productId": 5,
"productName": "Shoes",
"warehouses": [
{
"location": "Location A",
"quantity": 100
},
{
"location": "Location B",
"quantity": 10
},
{
"location": "Location C",
"quantity": 50
}
]
}

Its mapping is:

PUT /products
{
"mappings": {
"properties": {
"productId": {
"type": "integer"
},
"productName": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"ignore_above": 256
}
}
},
"warehouses": {
"properties": {
"location": {
"type": "text"
},
"quantity": {
"type": "integer"  
}
}
}
}
}
}

Say I have indexed the following 7 documents:

POST products/_bulk
{"index":{"_id":1}}
{"productId":1,"productName":"Bags","warehouses":[{"location":"Location A","quantity":20},{"location":"Location B","quantity":30},{"location":"Location C","quantity":50}]}
{"index":{"_id":2}}
{"productId":2,"productName":"Shirts","warehouses":[{"location":"Location A","quantity":100},{"location":"Location B","quantity":150},{"location":"Location C","quantity":150}]}
{"index":{"_id":3}}
{"productId":3,"productName":"Shoes","warehouses":[{"location":"Location A","quantity":100},{"location":"Location B","quantity":10},{"location":"Location C","quantity":50}]}
{"index":{"_id":4}}
{"productId":4,"productName":"Shirt","warehouses":[{"location":"Location A","quantity":100},{"location":"Location B","quantity":10},{"location":"Location C","quantity":60}, {"location":"Location F","quantity":70}]}
{"index":{"_id":5}}
{"productId":5,"productName":"Socks","warehouses":[{"location":"Location A","quantity":800},{"location":"Location B","quantity":1500},{"location":"Location Z","quantity":1000}]}
{"index":{"_id":6}}
{"productId":6,"productName":"TV","warehouses":[{"location":"Location A","quantity":20},{"location":"Location B","quantity":150},{"location":"Location C","quantity":123}]}
{"index":{"_id":7}}
{"productId":7,"productName":"Table","warehouses":[{"location":"Location A","quantity":20},{"location":"Location B","quantity":200},{"location":"Location C","quantity":140}, {"location":"Location D","quantity":123}]}

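Now I want to filter on "quantity": 123. Based on the documents indexed above, I expect to get the products with id:6 and id:7, because the last element of each has quantity: 123.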

Here is my (full) Painless script:

GET /products/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"bool": {
"must": {
"script": {
"script": {
"lang": "painless",
"source": """
def x = doc['warehouses.quantity'];
def flag = false;
if(x[x.length - 2 ] == params.limit) {
flag = true;
}

return flag;
""",
"params": {
"limit": 123
}
}
}
}
}
}
}
}
}

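With the script above I get id:6, the TV product. When I replace x[x.length - 2 ] with x[x.length - 3 ], I get id:7 instead.

I can't figure out how to get a result that contains both documents [id:6 (TV) and id:7 (Table)].

I am using Elasticsearch version 7.8.1.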

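This happens because your warehouses array is not of type nested, so the order of the elements in that array is not preserved (the values are actually sorted in ascending order). You can easily see this by running the following query; you will see that 123 is not necessarily in the last position: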

GET /products/_search
{
"docvalue_fields": ["warehouses.quantity"]
}

Response:

{
"_index" : "products",
"_type" : "_doc",
"_id" : "6",
"_score" : 1.0,
"_source" : {
...
},
"fields" : {
"warehouses.quantity" : [
20,
123,
150
]
}
},
{
"_index" : "products",
"_type" : "_doc",
"_id" : "7",
"_score" : 1.0,
"_source" : {
...
},
"fields" : {
"warehouses.quantity" : [
20,
123,
140,
200
]
}
}
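
To make the effect concrete, here is a small Python sketch that models the behaviour shown in the response above. It is only an illustration, not Elasticsearch code: the helper name doc_values is made up for this example, and it assumes that the numeric values of a non-nested object array are flattened per document and returned in ascending order.

```python
# Simplified model of how a non-nested object array is exposed to scripts:
# the values of each sub-field are flattened per document and, for numeric
# doc values, returned in ascending order rather than in _source order.

def doc_values(source, path):
    """Collect the values at 'outer.inner' and sort them ascending."""
    outer, inner = path.split(".")
    return sorted(item[inner] for item in source[outer])

# Documents 6 (TV) and 7 (Table) from the question, quantities only.
tv = {"warehouses": [{"quantity": 20}, {"quantity": 150}, {"quantity": 123}]}
table = {"warehouses": [{"quantity": 20}, {"quantity": 200},
                        {"quantity": 140}, {"quantity": 123}]}

tv_values = doc_values(tv, "warehouses.quantity")        # [20, 123, 150]
table_values = doc_values(table, "warehouses.quantity")  # [20, 123, 140, 200]

# x[x.length - 2] hits 123 only for the TV document,
# x[x.length - 3] hits 123 only for the Table document.
print(tv_values[len(tv_values) - 2], table_values[len(table_values) - 2])  # 123 140
print(tv_values[len(tv_values) - 3], table_values[len(table_values) - 3])  # 20 123
```

This matches what the question observed: each index happens to land on 123 for one document only, because the script is reading sorted values, not the original array order.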

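You need to change the mapping (this means recreating the index and reindexing the documents, since an existing object field cannot be switched to nested in place):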

PUT /products
{
"mappings": {
"properties": {
"productId": {
"type": "integer"
},
"productName": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"ignore_above": 256
}
}
},
"warehouses": {
"type": "nested",           <--- add this
"properties": {
"location": {
"type": "text"
},
"quantity": {
"type": "integer"  
}
}
}
}
}
}

Your query can then look like this, and it returns documents 6 and 7:

GET /products/_search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "warehouses",
"query": {
"script": {
"script": {
"source": """
def x = doc['warehouses.quantity'];
return x[-1] == params.limit;
""",
"params": {
"limit": 123
}
}
}
}
}
}
]
}
}
}

Quick tip: x[-1] gives you the last element of the array, whatever its length.
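
One caveat is worth checking against your own data. As far as I understand nested fields, each nested object is indexed as its own hidden document, so a script inside a nested query sees one warehouse at a time. The Python sketch below models that assumption; nested_script_matches and the document named middle are hypothetical and exist only for this illustration.

```python
def nested_script_matches(source, limit):
    """Model a script query wrapped in a nested query on 'warehouses'."""
    for warehouse in source["warehouses"]:
        # Inside the nested query the script sees one warehouse at a time,
        # so the doc-values list for 'warehouses.quantity' holds one value.
        x = [warehouse["quantity"]]
        if x[-1] == limit:
            return True
    return False

# Document 6 (TV) from the question: 123 is the last element.
tv = {"warehouses": [{"quantity": 20}, {"quantity": 150}, {"quantity": 123}]}

# Hypothetical document (not in the question) with 123 in the middle.
middle = {"warehouses": [{"quantity": 20}, {"quantity": 123}, {"quantity": 50}]}

print(nested_script_matches(tv, 123))      # True
print(nested_script_matches(middle, 123))  # True
```

Under this model the query matches a product when any warehouse has quantity 123, which gives documents 6 and 7 for the sample data. If the position really matters, the check has to rely on something that preserves the original order, such as the _source.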

Thanks for your answer, @Val.

I tried to solve it with a function_score query:

GET products/_search
{
"min_score": 0.1,
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": {
"source": """
def last = params['_source']['warehouses'].length - 1;

def quantityOfLast = params._source['warehouses'].get(last);

if (quantityOfLast.quantity == params.limit) {
return 1;
} else {
return 0;
}

""",
"params": {
"limit": 70
}
}
}
}
]
}
}
}
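
As a rough check of the logic in that script, the following Python sketch applies the same rule to the seven sample documents: score 1 when the last warehouse in _source order has the requested quantity, 0 otherwise, then drop everything below min_score. The names products and ids_with_last_quantity are made up for this illustration, and only the quantities are kept.

```python
# Quantities of each product's warehouses, in _source order, keyed by _id.
products = {
    1: [20, 30, 50],
    2: [100, 150, 150],
    3: [100, 10, 50],
    4: [100, 10, 60, 70],
    5: [800, 1500, 1000],
    6: [20, 150, 123],
    7: [20, 200, 140, 123],
}

def ids_with_last_quantity(limit, min_score=0.1):
    """Return the ids whose last warehouse quantity equals limit."""
    hits = []
    for doc_id, quantities in products.items():
        last = len(quantities) - 1
        score = 1 if quantities[last] == limit else 0
        if score >= min_score:  # same role as "min_score" in the request
            hits.append(doc_id)
    return hits

print(ids_with_last_quantity(70))   # [4]
print(ids_with_last_quantity(123))  # [6, 7]
```

With limit 70, as in the query above, only document 4 qualifies; with limit 123 the result is documents 6 and 7. Because the script reads _source, it works on the original array order and does not need the nested mapping, at the cost of loading the source for every candidate document.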

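If you change the mapping to use the nested type as @Val suggested, you can avoid Painless scripts entirely and use a simple nested query. Note that this matches a product when any of its warehouses has the given quantity, not only the last one:

New mapping: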

PUT /products
{
"mappings": {
"properties": {
"productId": {
"type": "integer"
},
"productName": {
"type": "text",
"fields": {
"raw": {
"type": "keyword",
"ignore_above": 256
}
}
},
"warehouses": {
"type": "nested", 
"properties": {
"location": {
"type": "text"
},
"quantity": {
"type": "integer"  
}
}
}
}
}
}

Query:

GET products/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "warehouses",
"query": {
"term": {
"warehouses.quantity": {
"value": "123"
}
}
}
}
}
]
}
}
}

Result:

{
"took" : 14,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "products",
"_type" : "_doc",
"_id" : "6",
"_score" : 0.0,
"_source" : {
"productId" : 6,
"productName" : "TV",
"warehouses" : [
{
"location" : "Location A",
"quantity" : 20
},
{
"location" : "Location B",
"quantity" : 150
},
{
"location" : "Location C",
"quantity" : 123
}
]
}
},
{
"_index" : "products",
"_type" : "_doc",
"_id" : "7",
"_score" : 0.0,
"_source" : {
"productId" : 7,
"productName" : "Table",
"warehouses" : [
{
"location" : "Location A",
"quantity" : 20
},
{
"location" : "Location B",
"quantity" : 200
},
{
"location" : "Location C",
"quantity" : 140
},
{
"location" : "Location D",
"quantity" : 123
}
]
}
}
]
}
}
