Why does this query against an edge n-gram Elasticsearch mapping return no results?



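Below are the mapping and analyzer settings. Assume I'm indexing "book" records. The plural fields on a book record (e.g. publishers and tags) are arrays of strings (e.g. ["random house","macmillan"]), and the field "name" takes a single string, e.g. "Blue".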

{
   "state": "open",
   "settings": {
      "index": {
         "number_of_shards": "5",
         "provided_name": "autocomplete_index",
         "creation_date": "1509080632268",
         "analysis": {
            "filter": {
               "edge_ngram": {
                  "token_chars": [
                     "letter",
                     "digit"
                  ],
                  "min_gram": "1",
                  "type": "edgeNGram",
                  "max_gram": "15"
               },
               "english_stemmer": {
                  "name": "possessive_english",
                  "type": "stemmer"
               }
            },
            "analyzer": {
               "keyword_analyzer": {
                  "filter": [
                     "lowercase",
                     "english_stemmer"
                  ],
                  "type": "custom",
                  "tokenizer": "standard"
               },
               "autocomplete_analyzer": {
                  "filter": [
                     "lowercase",
                     "asciifolding",
                     "english_stemmer",
                     "edge_ngram"
                  ],
                  "type": "custom",
                  "tokenizer": "standard"
               }
            }
         },
         "number_of_replicas": "1",
         "uuid": "SSTzdTNFStaSiIBu-l3q5w",
         "version": {
            "created": "5060299"
         }
      }
   },
   "mappings": {
      "autocomplete_mapping": {
         "properties": {
            "publishers": {
               "type": "text",
               "fields": {
                  "keyword": {
                     "ignore_above": 256,
                     "type": "keyword"
                  }
               }
            },
            "name": {
               "type": "text",
               "fields": {
                  "keyword": {
                     "ignore_above": 256,
                     "type": "keyword"
                  }
               }
            },
            "tags": {
               "type": "text",
               "fields": {
                  "keyword": {
                     "ignore_above": 256,
                     "type": "keyword"
                  }
               }
            }
         }
      }
   },
   "aliases": [],
   "primary_terms": {
      "0": 1,
      "1": 1,
      "2": 1,
      "3": 1,
      "4": 1
   },
   "in_sync_allocations": {
      "0": [
         "GXwYiYuWQ16wgxCrpXShJQ"
      ],
      "1": [
         "Do_49lZ4QmyNEYUK_QJfEQ"
      ],
      "2": [
         "vWZ_PjsLSGSVh130C5EvYQ"
      ],
      "3": [
         "5CLINaFJQbqVcZLVOsSNWQ"
      ],
      "4": [
         "hy3JYfmuR7e8fc-anu-heA"
      ]
   }
}

If I run the following query:

curl -XGET 'localhost:9200/autocomplete_index/_search?size=5' -d '
{
"query" : {
    "multi_match" : {
      "query": "b",
      "analyzer": "keyword",
      "fields": ["_all"]
    }
  }
}'

I get 0 results. I have to enter the full word "blue" in the query field to get a match.

Also, when I run "_analyze", I get:

curl -XGET 'localhost:9200/products_autocomplete_dev/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "analyzer": "autocomplete_analyzer",
  "field": "name",
  "text": "b"
}
'
{
  "tokens" : [
    {
      "token" : "b",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

I expected to get back at least tokens such as "b", "bl", "blu" and "blue".

Here is a sample document from the index:

{
  "_index" : "autocomplete_index",
  "_type" : "autocomplete_mapping",
  "_id" : "145",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name": "Blue",
    "publishers" : [
      "macmillan",
      "Penguin"
    ],
    "themes" : [
      "Butterflies", "Mammals"
    ]
  }
}

What am I doing wrong?

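There are several things wrong here, and I suggest reading the documentation on analyzers carefully. I hope you don't mind my suggesting that.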

First, if you want to test an analyzer, don't specify a field name; specify only the text and the analyzer itself:

GET /my_index/_analyze?pretty
{
  "analyzer": "autocomplete_analyzer",
  "text": "blue"
}
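
As a rough illustration of what the edge_ngram filter in these settings (min_gram 1, max_gram 15) is expected to emit for a single token, here is a minimal Python sketch. It is not Elasticsearch code: `edge_ngrams` is a hypothetical helper, and the lowercase, asciifolding and stemmer steps are left out.

```python
from typing import List


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 15) -> List[str]:
    """Return the prefixes of `token` with lengths from min_gram to max_gram.

    Hypothetical helper that only mimics the idea of an edge n-gram filter;
    it is not how Elasticsearch implements it.
    """
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    print(edge_ngrams("blue"))  # ['b', 'bl', 'blu', 'blue']
    print(edge_ngrams("b"))     # ['b']
```

Note that a one-character text such as "b" can only ever produce the single token "b", so it cannot show whether the n-gram filter ran. A longer text such as "blue" can.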

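If you define a custom analyzer, how is Elasticsearch supposed to know that a particular field uses it? Defining an analyzer is not the same as creating a field that uses it. So: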

    "name": {
      "type": "text",
-->   "analyzer": "autocomplete_analyzer",
      "fields": {
        "keyword": {
          "ignore_above": 256,
          "type": "keyword"
        }
      }
    }

The same goes for the _all field: by default it uses the standard analyzer, and it will keep using it unless you change that:

  "mappings": {
    "autocomplete_mapping": {
      "_all": {
        "analyzer": "autocomplete_analyzer"
      }, 
      "properties": {
        "publishers": {
          "type": "text", 
          "fields": {
.....
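
To see why a query for "b" finds nothing until the field is actually indexed with autocomplete_analyzer, here is a toy Python model of term matching. It makes several assumptions: a whitespace split stands in for the standard tokenizer, folding and stemming are skipped, and `standard_terms`, `autocomplete_terms` and `matches` are hypothetical names, not Elasticsearch APIs.

```python
from typing import List


def standard_terms(text: str) -> List[str]:
    """Approximate the default analysis: lowercase whole words only."""
    return text.lower().split()


def autocomplete_terms(text: str, min_gram: int = 1, max_gram: int = 15) -> List[str]:
    """Approximate autocomplete_analyzer: lowercase words, then edge n-grams."""
    terms = []
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        for n in range(min_gram, longest + 1):
            terms.append(word[:n])
    return terms


def matches(query_term: str, indexed_terms: List[str]) -> bool:
    """A query term matches only if that exact term was stored at index time."""
    return query_term in indexed_terms


if __name__ == "__main__":
    name = "Blue"
    # Field indexed without the custom analyzer: only the whole word is stored.
    print(matches("b", standard_terms(name)))      # False
    print(matches("blue", standard_terms(name)))   # True
    # Field indexed with edge n-grams: every prefix is stored.
    print(matches("b", autocomplete_terms(name)))  # True
```

In this simplified model the query term is left as typed, which is roughly what the "keyword" analyzer in the original query does, so a match depends entirely on which terms were stored at index time.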
