Filter out results that match on only a single character in Elasticsearch



My Elasticsearch query returns hits even when only a single character matches. Results that match on nothing but a single digit look odd.

Is there a way, using the query DSL, to filter out results that match only on a single digit or character?

Current query:

GET /attachment_index/_search
{
  "_source": [
    "user_email_id",
    "file_content_id",
    "file_name",
    "non_indexed_meta_data"
  ],
  "query": {
    "bool": {
      "must": [
        {
          "has_child": {
            "type": "user_email_id",
            "query": {
              "match": {
                "user_email_id": "test@user.com"
              }
            },
            "inner_hits": {}
          }
        },
        {
          "match": {
            "attachment.content": {
              "query": "mark twain 3",
              "analyzer": "english",
              "operator": "or"
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "order": "score",
    "pre_tags": [
      "<strong>"
    ],
    "post_tags": [
      "</strong>"
    ],
    "fields": {
      "attachment.content": {}
    }
  },
  "size": 100
}

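It returns results that match only on the 3, which I don't want. Any ideas on how to filter by length without preprocessing the input before it is sent to Elasticsearch?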

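You can filter by token length with a custom analyzer. The Elasticsearch documentation includes an example of how to rebuild the english analyzer, so we can add a minimum-length filter to it, for example: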

PUT /attachment_index
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_"
        },
        "english_stemmer": {
          "type":       "stemmer",
          "language":   "english"
        },
        "english_possessive_stemmer": {
          "type":       "stemmer",
          "language":   "possessive_english"
        },
        "length": {
          "type": "length",
          "min": 2
        }
      },
      "analyzer": {
        "length_english": {
          "tokenizer":  "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_stemmer",
            "length"
          ]
        }
      }
    }
  }
}
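As a rough illustration of what the length token filter contributes at the end of this chain, here is a minimal Python sketch. It is not Elasticsearch code: the function name length_filter and its parameters are made up for this example, and it assumes the tokens have already been produced by the tokenizer and the earlier filters.

```python
def length_filter(tokens, min_len=0, max_len=None):
    """Keep only tokens whose character count lies within [min_len, max_len].

    Hypothetical stand-in for the behaviour of a "length" token filter;
    max_len=None means no upper bound.
    """
    return [
        t for t in tokens
        if len(t) >= min_len and (max_len is None or len(t) <= max_len)
    ]


# Tokens as they might look after tokenizing and lowercasing "mark twain 3".
tokens = ["mark", "twain", "3"]
kept = length_filter(tokens, min_len=2)
print(kept)  # ['mark', 'twain']
```

With "min": 2, any one-character token is dropped, so single letters are removed along with single digits.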

Testing it with the _analyze API:

GET attachment_index/_analyze
{
  "analyzer": "length_english",
  "text": "mark twain 3"
}

which returns:

{
  "tokens" : [
    {
      "token" : "mark",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "twain",
      "start_offset" : 5,
      "end_offset" : 10,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

So the token "3" is filtered out, as required. In the match query, the length_english analyzer can then be used in place of english.
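For reference, the search body from the question with the analyzer swapped might look like the sketch below. It only builds and prints the request body as a Python dict; the helper name build_search_body is made up for this example, and sending the body to Elasticsearch is left to whatever client you already use.

```python
import json


def build_search_body(user_email, text, analyzer="length_english"):
    """Build the search body from the question, with a configurable analyzer.

    Hypothetical helper: field and index names are copied from the question.
    """
    return {
        "_source": [
            "user_email_id",
            "file_content_id",
            "file_name",
            "non_indexed_meta_data",
        ],
        "query": {
            "bool": {
                "must": [
                    {
                        "has_child": {
                            "type": "user_email_id",
                            "query": {"match": {"user_email_id": user_email}},
                            "inner_hits": {},
                        }
                    },
                    {
                        "match": {
                            "attachment.content": {
                                "query": text,
                                # Search-time analyzer that drops 1-char tokens.
                                "analyzer": analyzer,
                                "operator": "or",
                            }
                        }
                    },
                ]
            }
        },
        "highlight": {
            "order": "score",
            "pre_tags": ["<strong>"],
            "post_tags": ["</strong>"],
            "fields": {"attachment.content": {}},
        },
        "size": 100,
    }


body = build_search_body("test@user.com", "mark twain 3")
print(json.dumps(body, indent=2))
```

One thing to keep in mind: if the search text consists only of tokens that the analyzer removes (for example just "3"), the match clause ends up with no terms, and with the default zero_terms_query setting of none it matches no documents.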
