Elasticsearch synonyms not working as expected



The text I want to search for is 2 marina blvd, and the top 3 results returned by Elasticsearch are:

2 MARINA GREEN, SINGAPORE 019800
MARINA BAYFRONT 2 RAFFLES LINK, SINGAPORE 039392
THE SAIL @ MARINA BAY 2 MARINA BOULEVARD, SINGAPORE 018987

In my synonym list, blvd and boulevard are defined as equivalent.

When I search for 2 marina blvd, I expect THE SAIL @ MARINA BAY 2 MARINA BOULEVARD, SINGAPORE 018987 to be the top-scoring result, because 2 marina blvd is equivalent to 2 marina boulevard. Instead, 2 MARINA GREEN, SINGAPORE 019800 currently ranks first.

What is going wrong, and how can I improve my search results?

The full settings are:

{
  "geolocation": {
    "settings": {
      "index": {
        "creation_date": "1471322099847",
        "analysis": {
          "filter": {
            "my_synonym_filter": {
              "type": "synonym",
              "synonyms": [
                "rd,road",
                "ave,avenue",
                "blvd,boulevard",
                "st,street",
                "lor,lorong",
                "ter,terminal",
                "blk,block",
                "apt,apartment",
                "condo,condominium"
              ]
            }
          },
          "analyzer": {
            "my_synonyms": {
              "filter": [
                "lowercase",
                "my_synonym_filter"
              ],
              "tokenizer": "standard"
            },
            "stopwords_analyzer": {
              "type": "standard",
              "stopwords": [
                "the"
              ]
            },
            "my_ngram_analyzer": {
              "tokenizer": "my_ngram_tokenizer"
            }
          },
          "tokenizer": {
            "my_ngram_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "2",
              "type": "nGram",
              "max_gram": "5"
            }
          }
        },
        "number_of_shards": "5",
        "number_of_replicas": "1",
        "uuid": "mPfZmWHFQZOHqfAi471nGQ",
        "version": {
          "created": "2030599"
        }
      }
    }
  }
}

Here is the query:

body: {
      from : 0, size : 10,
      query: {
        bool: {
          should: [
            {
              match: {
                text: q
              }
            },
            {
              match: {
                text: {
                  query: q,
                  fuzziness: 1,
                  prefix_length: 0,
                  max_expansions: 100
                }
              }
            },
            {
              match: {
                text: {
                  query: q,
                  max_expansions: 300,
                  type: "phrase_prefix"
                }
              }
            }
          ]
        }
      }
    }

The mapping is:

{
  "geolocation": {
    "mappings": {
      "location": {
        "properties": {
          "address": {
            "type": "string"
          },
          "blk": {
            "type": "string"
          },
          "building": {
            "type": "string"
          },
          "location": {
            "type": "geo_point"
          },
          "postalCode": {
            "type": "string"
          },
          "road": {
            "type": "string"
          },
          "searchText": {
            "type": "string"
          },
          "x": {
            "type": "string"
          },
          "y": {
            "type": "string"
          }
        }
      }
    }
  }
}

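You have defined analyzers, but you have not assigned any of them to your fields, so the fields still use the default analyzer and the synonym filter is never applied. The most basic setup is: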

"searchText": {
  "type": "string",
  "analyzer": "my_synonyms"
}

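A field can have one analyzer for index time and another for search time. Most use cases use the same analyzer at both index and search time. By default (when you set "analyzer": "whatever_analyzer"), the same analyzer is used for both indexing and searching.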
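As a rough illustration of using different analyzers at index and search time, the mapping fragment below is a sketch that assumes the searchText field from the question and the my_synonyms analyzer from the posted settings. Pairing it with the built-in standard analyzer at search time is only an example choice, not something taken from the original setup. An analyzer generally cannot be changed on a field that already exists, so this assumes the index is recreated and the documents are reindexed.

```json
{
  "location": {
    "properties": {
      "searchText": {
        "type": "string",
        "analyzer": "my_synonyms",
        "search_analyzer": "standard"
      }
    }
  }
}
```

With this arrangement the synonyms are expanded once when documents are indexed, so a query for either blvd or boulevard can match the same indexed tokens. Leaving out search_analyzer falls back to the default behaviour described above, where my_synonyms is used on both sides.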

To learn more about analysis and what you can do with it, see https://www.elastic.co/guide/en/elasticsearch/guide/2.x/analysis-intro.html.
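One way to check that the analyzer does what you expect is the analyze API. The request body below is a minimal sketch, assuming it is sent to the _analyze endpoint of the geolocation index (for example GET /geolocation/_analyze) on a 2.x cluster. The analyzer name comes from the posted settings and the text is the query from the question.

```json
{
  "analyzer": "my_synonyms",
  "text": "2 marina blvd"
}
```

If the synonym filter is wired up correctly, the response should list boulevard alongside blvd among the returned tokens. If only blvd appears, the synonym filter is not being applied.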
