Elasticsearch: find all documents whose flattened property has the same value for every key



Suppose I have two kinds of documents in Elasticsearch, where "map" is a flattened field:

  • doc1: {
      "name": "foo1",
      "map": {
        "key1": 100,
        "key2": 100
      }
    }
  • doc2: {
      "name": "foo2",
      "map": {
        "key1": 100,
        "key2": 90
      }
    }
    

Can I search Elasticsearch for all documents whose "map" property has the same value (e.g. 100) for every one of its keys (key1=100, key2=100), so that it returns doc1, without knowing in advance which keys exist under "map"?

Thanks!

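Yes. There are two ways to achieve this:

1. Add a flag field to the documents through an ingest pipeline, then run a regular filter on that new field (recommended)
2. Generate the flag field on the fly with a runtime field

Option 1 is recommended because iterating over every document on every query does not scale well; computing the flag once at ingest time is far more efficient. Given your two documents: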

    POST test_script/_doc
    {
      "name": "foo1",
      "map": {
        "key1": 100,
        "key2": 100
      }
    }

    POST test_script/_doc
    {
      "name": "foo2",
      "map": {
        "key1": 100,
        "key2": 90
      }
    }
    

1. Add a flag field to the documents through an ingest pipeline (recommended)

Create the ingest pipeline:

    PUT _ingest/pipeline/is_100_field
    {
      "processors": [
        {
          "script": {
            "source": "def keys_100 = 0;\ndef keys = ctx['map'].keySet();\n\nfor (key in keys) {\n    if(ctx['map'][key] == 100){\n        keys_100 = keys_100 + 1;\n    }\n}\n\nctx.is_100 = keys.size() == keys_100;",
            "ignore_failure": true
          }
        }
      ]
    }
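
To make the script's logic easier to follow, here is a minimal Python sketch of the same check. It is an illustration only and not part of the pipeline; the function name `all_values_equal` and its `target` parameter are made up for this example.

```python
def all_values_equal(mapping, target=100):
    """Mirror the Painless script: count the keys whose value equals
    `target`, then compare that count with the total number of keys."""
    matching = 0
    for key in mapping:
        if mapping[key] == target:
            matching += 1
    return len(mapping) == matching


doc1 = {"name": "foo1", "map": {"key1": 100, "key2": 100}}
doc2 = {"name": "foo2", "map": {"key1": 100, "key2": 90}}

print(all_values_equal(doc1["map"]))  # True
print(all_values_equal(doc2["map"]))  # False
print(all_values_equal({}))           # True: zero keys, zero matches
```

Note the last case: with this counting approach an empty "map" is flagged as true, because both counts are zero. If that is not what you want, the Painless script would need an extra non-empty check.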
    

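You can now use this ingest pipeline to reprocess your existing data, or apply it to each document as it is indexed:

Reindex (update existing documents in place):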

    POST your_index/_update_by_query?pipeline=is_100_field
    

Ingest:

    POST your_index/_doc?pipeline=is_100_field
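
To apply the pipeline to every document without adding `?pipeline=` to each request, one option is the `index.default_pipeline` index setting. The sketch below only builds the HTTP request and does not send it; the host `http://localhost:9200`, the index name `your_index`, and the helper name `default_pipeline_request` are assumptions for illustration.

```python
import json
import urllib.request


def default_pipeline_request(index, pipeline, host="http://localhost:9200"):
    """Build (but do not send) a PUT <index>/_settings request that makes
    `pipeline` the default ingest pipeline for the index."""
    body = {"index.default_pipeline": pipeline}
    return urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = default_pipeline_request("your_index", "is_100_field")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send it against a running cluster: urllib.request.urlopen(req)
```

In the Kibana console the equivalent would be a `PUT your_index/_settings` request with the same JSON body.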
    

Once the documents have passed through the pipeline, a search returns them with the new flag field:

    {
      "took": 0,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 2,
          "relation": "eq"
        },
        "max_score": 1,
        "hits": [
          {
            "_index": "test_script",
            "_id": "78_AvoQB5Gw0WET88nZE",
            "_score": 1,
            "_source": {
              "name": "foo1",
              "map": {
                "key1": 100,
                "key2": 100
              },
              "is_100": true
            }
          },
          {
            "_index": "test_script",
            "_id": "8s_AvoQB5Gw0WET8-HYO",
            "_score": 1,
            "_source": {
              "name": "foo2",
              "map": {
                "key1": 100,
                "key2": 90
              },
              "is_100": false
            }
          }
        ]
      }
    }
    

Now you can run a regular filter, which is the most efficient way to query:

    GET test_script/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "is_100": true
              }
            }
          ]
        }
      }
    }
    

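2. Generate the flag field on the fly with a runtime field

The script logic is the same, but the field is now computed at query time instead of being stored at ingest time. The runtime field can be defined either in the index mapping or in the search request:

Mapping: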

    PUT test_script_runtime/
    {
      "mappings": {
        "runtime": {
          "is_100": {
            "type": "boolean",
            "script": {
              "source": """
                def keys_100 = 0;
                def keys = params._source['map'].keySet();

                for (key in keys) {
                  if (params._source['map'][key] == 100) {
                    keys_100 = keys_100 + 1;
                  }
                }

                emit(keys.size() == keys_100);
              """
            }
          }
        },
        "properties": {
          "map": {"type": "object"},
          "name": {"type": "text"}
        }
      }
    }
    
Query:

    GET test_script/_search
    {
      "runtime_mappings": {
        "is_100": {
          "type": "boolean",
          "script": {
            "source": """
              def keys_100 = 0;
              def keys = params._source['map'].keySet();

              for (key in keys) {
                if (params._source['map'][key] == 100) {
                  keys_100 = keys_100 + 1;
                }
              }

              emit(keys.size() == keys_100);
            """
          }
        }
      },
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "is_100": true
              }
            }
          ]
        }
      }
    }
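
Because the runtime field is defined per request, the target value does not have to be hard-coded in the script. The following Python sketch builds a search body that passes the value through the script's `params` instead. It assumes that user-supplied `params` are available to the runtime field script alongside `params._source`; the field name `all_values_match` and the helper `build_search_body` are hypothetical names chosen for this example.

```python
import json

# Painless source for the runtime field. `params.target` is supplied in the
# request body below instead of hard-coding 100 in the script.
SCRIPT_SOURCE = """
def keys_matching = 0;
def keys = params._source['map'].keySet();
for (key in keys) {
  if (params._source['map'][key] == params.target) {
    keys_matching = keys_matching + 1;
  }
}
emit(keys.size() == keys_matching);
"""


def build_search_body(target, field="all_values_match"):
    """Return a _search body that defines a boolean runtime field and
    filters on it; nothing is sent to Elasticsearch here."""
    return {
        "runtime_mappings": {
            field: {
                "type": "boolean",
                "script": {
                    "source": SCRIPT_SOURCE,
                    "params": {"target": target},
                },
            }
        },
        "query": {"bool": {"filter": [{"term": {field: True}}]}},
    }


# Print the JSON that would be sent to GET test_script/_search
print(json.dumps(build_search_body(100), indent=2))
```

The same scaling caveat applies: the script still runs against every candidate document at query time, so this suits ad hoc queries better than frequent ones.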
    

If you later decide to index the runtime field, you can do so easily: https://www.elastic.co/guide/en/elasticsearch/reference/current/runtime-indexed.html
