ElasticSearch analyzer that matches "Java", "Script", and "JavaScript"



Indexed values: Java, JavaScript, ClojureScript

_input_    | _output_
Java       | JavaScript, Java
JavaScript | JavaScript
script     | JavaScript, ClojureScript

The analyzer that comes closest to the desired result is the following.

"analysis": {
  "filter": {
    "trigrams_filter": {
      "type": "edge_ngram",
      "min_gram": "3",
      "max_gram": "3"
    }
  },
  "analyzer": {
    "trigrams": {
      "filter": [
        "lowercase",
        "trigrams_filter"
      ],
      "type": "custom",
      "tokenizer": "standard"
    }
  }
}

But it is not accurate enough: "JavaScript" returns both "JavaScript" and "Java", and "script" returns nothing.

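Your mapping has one main problem: you are using an edge_ngram filter to search for part of a word. The edge_ngram filter only produces grams anchored at the start of each token, so it finds words that begin with the query value. To match a fragment anywhere inside a word, as in your case, you should use the nGram filter: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html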
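To make the difference concrete, here is a minimal Python sketch of what the two filters emit for a single lowercased token. The helper names `edge_ngrams` and `ngrams` are hypothetical and only approximate the behaviour of the Elasticsearch filters; they are not part of any Elasticsearch client.

```python
def edge_ngrams(token, min_gram, max_gram):
    """Prefixes of `token` whose length is between min_gram and max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def ngrams(token, min_gram, max_gram):
    """All substrings of `token` whose length is between min_gram and max_gram."""
    grams = []
    for start in range(len(token)):
        upper = min(max_gram, len(token) - start)
        for n in range(min_gram, upper + 1):
            grams.append(token[start:start + n])
    return grams


# Original analyzer (edge_ngram, min 3, max 3): both words collapse to "jav",
# and the query "script" becomes "scr", which no indexed word produces.
print(edge_ngrams("java", 3, 3))        # ['jav']
print(edge_ngrams("javascript", 3, 3))  # ['jav']
print(edge_ngrams("script", 3, 3))      # ['scr']

# Suggested filter (nGram, min 2, max 20): inner fragments are indexed too.
print(ngrams("java", 2, 20))            # ['ja', 'jav', 'java', 'av', 'ava', 'va']
print("script" in ngrams("javascript", 2, 20))  # True
```

Because every substring from 2 to 20 characters is indexed, the term "script" exists for both JavaScript and ClojureScript, at the cost of a larger index.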

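In addition, you should apply the trigrams analyzer only when indexing data. For searching it is better to use the standard analyzer: running the query string through the nGram filter makes no sense, because the query would then match on every small fragment and return more documents than you want.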
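The effect of the search analyzer can be illustrated with a rough Python simulation. It is a sketch under simplifying assumptions: documents and queries are single words, a document matches when it shares at least one term with the query (roughly what a `match` query does by default), and scoring is ignored. The names `INDEX` and `search` are hypothetical.

```python
def ngrams(token, min_gram=2, max_gram=20):
    """Set of all substrings of `token` with length min_gram..max_gram."""
    grams = set()
    for start in range(len(token)):
        upper = min(max_gram, len(token) - start)
        for n in range(min_gram, upper + 1):
            grams.add(token[start:start + n])
    return grams


DOCS = ["Java", "JavaScript", "ClojureScript"]

# Index time: lowercase, then expand each word into its n-grams.
INDEX = {doc: ngrams(doc.lower()) for doc in DOCS}


def search(query, ngram_query=False):
    """Return documents sharing at least one term with the analyzed query."""
    term = query.strip().lower()
    # Standard-style analysis keeps the whole word as a single term;
    # n-gram analysis would split the query into many small terms.
    query_terms = ngrams(term) if ngram_query else {term}
    return [doc for doc in DOCS if INDEX[doc] & query_terms]


print(search("Java"))        # ['Java', 'JavaScript']
print(search("JavaScript"))  # ['JavaScript']
print(search("script"))      # ['JavaScript', 'ClojureScript']

# With n-grams applied to the query as well, "JavaScript" also matches
# Java (via "ja") and ClojureScript (via "sc").
print(search("JavaScript", ngram_query=True))
# ['Java', 'JavaScript', 'ClojureScript']
```

Keeping the query as one whole term is what makes "JavaScript" return only the JavaScript document.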

Correct mapping:

POST /so
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "trigrams_filter": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "20"
        }
      },
      "analyzer": {
        "trigrams": {
          "filter": [
            "lowercase",
            "trigrams_filter"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  },
  "mappings": {
    "so": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "trigrams",
          "search_analyzer": "standard"
        }
      }
    }
  }
}

Indexed documents:

POST /so/so/1
{
  "text": "Java"
}
POST /so/so/2
{
  "text": "JavaScript"
}
POST /so/so/3
{
  "text": "ClojureScript"
}

When the query string is "Java", the response contains Java and JavaScript:

POST /so/so/_search
{
  "query": {
    "match": {
      "text": "Java"
    }
  }
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "so",
        "_type": "so",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "Java"
        }
      },
      {
        "_index": "so",
        "_type": "so",
        "_id": "2",
        "_score": 1,
        "_source": {
          "text": "JavaScript"
        }
      }
    ]
  }
}

When the query string is "JavaScript", the response contains only JavaScript:

POST /so/so/_search
{
  "query": {
    "match": {
      "text": " JavaScript "
    }
  }
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.4054651,
    "hits": [
      {
        "_index": "so",
        "_type": "so",
        "_id": "2",
        "_score": 1.4054651,
        "_source": {
          "text": "JavaScript"
        }
      }
    ]
  }
}

When the query string is "script", the response contains JavaScript and ClojureScript:

POST /so/so/_search
{
  "query": {
    "match": {
      "text": "script"
    }
  }
}

Response:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "so",
        "_type": "so",
        "_id": "2",
        "_score": 1,
        "_source": {
          "text": "JavaScript"
        }
      },
      {
        "_index": "so",
        "_type": "so",
        "_id": "3",
        "_score": 1,
        "_source": {
          "text": "ClojureScript"
        }
      }
    ]
  }
}
