Suppose I have three documents in my Elasticsearch index. For example:
1: {
"name": "test_2602"
}
2: {
"name": "test-2602"
}
3: {
"name": "test 2602"
}
Now I search them with a fuzzy match query like the following:
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"match": {
"name": {
"query": "test-2602",
"fuzziness": "2",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"boost": 1
}
}
}
],
"disable_coord": false,
"adjust_pure_negative": true,
"boost": 1
}
}
],
"disable_coord": false,
"adjust_pure_negative": true,
"boost": 1
}
}
}
作为响应,我只得到两个文档,这是(即使我按名称值搜索=>test";test 2602"或"测试- 2602")
{
"name": "test-2602"
},
{
"name": "test 2602"
}
我没有得到名称为"test_2602"(与包含下划线的value不匹配)。我希望它包括第三个文档以及名称值为"test_2602"。但是如果我搜索name为test_2602;然后我得到
{
"name": "test_2602"
}
当我搜索name为"test" test 2602", "test-2602"时,我需要获取这三个文档和"test_2602">
您在搜索中只得到两个文档,因为默认情况下elasticsearch使用标准分析器,它将把"test-2602"
和"test 2602"
标记为test
和2602
。但是"test_2602"
不会被标记化。
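To see why the hyphen and the space behave differently from the underscore, here is a rough Python approximation of what the standard analyzer does to these three values. This is only a sketch under a stated assumption: the hypothetical helper `approx_standard_tokens` treats runs of letters, digits and `_` as tokens and lowercases them. The real standard tokenizer follows the Unicode Text Segmentation rules (UAX #29) and differs from this on many other inputs.

```python
import re


def approx_standard_tokens(text: str) -> list[str]:
    """Hypothetical, rough stand-in for the standard analyzer on simple ASCII text.

    Assumption: runs of word characters become tokens and are lowercased.
    The real standard tokenizer uses UAX #29 word boundaries; this only
    mirrors its behaviour for values like the ones in this question.
    """
    # \w matches letters, digits and "_", so an underscore stays inside a token,
    # while "-" and " " act as separators.
    return re.findall(r"\w+", text.lower())


for value in ["test_2602", "test-2602", "test 2602"]:
    print(repr(value), "->", approx_standard_tokens(value))
```

This prints ['test_2602'] for the underscore value and ['test', '2602'] for the other two, which is the same split the Analyze API reports.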
You can check the generated tokens with the Analyze API:
GET /_analyze
{
"analyzer" : "standard",
"text" : "test_2602"
}
The generated tokens will be:
{
"tokens": [
{
"token": "test_2602",
"start_offset": 0,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 0
}
]
}
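You need to add .keyword to the name field in your query. The keyword subfield uses the keyword analyzer instead of the standard analyzer, so each value is kept as a single term (note the ".keyword" after the field name). Try the query below.
Index mapping: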
{
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Search query:
{
"query": {
"match": {
"name.keyword": {
"query": "test_2602",
"fuzziness":2
}
}
}
}
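Why "fuzziness": 2 now catches all three values can be checked by hand: on name.keyword each value is one term, so the fuzzy match compares whole strings. The sketch below uses a hypothetical `levenshtein` helper and a hypothetical `MAX_EDITS` constant. It is a simplification: Lucene uses Damerau-Levenshtein by default ("fuzzy_transpositions": true), which also counts swapping two adjacent characters as one edit, but no transpositions are involved here, so plain Levenshtein gives the same distances. Scoring, prefix_length and max_expansions are not modelled.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # mirrors "fuzziness": 2 in the query above (Lucene's maximum)

indexed = ["test_2602", "test-2602", "test 2602"]  # whole values in name.keyword

# Keyword field: the whole query string is compared with the whole stored value.
for query in indexed:
    matches = [v for v in indexed if levenshtein(query, v) <= MAX_EDITS]
    print(query, "->", matches)

# Text field: "test-2602" is analyzed into "test" and "2602", and each of those
# is 5 edits away from the single indexed term "test_2602", beyond the limit.
print(levenshtein("test", "test_2602"), levenshtein("2602", "test_2602"))
```

Each of the three query strings is at most one edit away from every stored value, while the analyzed terms test and 2602 are five edits away from test_2602, which is why the original query on the text field missed that document.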
Search result:
"hits": [
{
"_index": "66572330",
"_type": "_doc",
"_id": "1",
"_score": 0.9808291,
"_source": {
"name": "test_2602"
}
},
{
"_index": "66572330",
"_type": "_doc",
"_id": "3",
"_score": 0.8718481,
"_source": {
"name": "test 2602"
}
},
{
"_index": "66572330",
"_type": "_doc",
"_id": "2",
"_score": 0.8718481,
"_source": {
"name": "test-2602"
}
}
]