ElasticSearch: how are scores calculated in an ngram query?



I have hundreds of chemical records in my index climate_change.

I'm using an ngram search, and these are the settings I use for the index:

{
"settings": {
"index.max_ngram_diff": 30,
"index": {
"analysis": {
"analyzer": {
"analyzer": {
"tokenizer": "test_ngram",
"filter": [
"lowercase"
]
},
"search_analyzer": {
"tokenizer": "test_ngram",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"test_ngram": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 30,
"token_chars": [
"letter",
"digit"
]
}
}
}
}
}
}
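To see what these settings would do to the sample descriptions, here is a rough Python model of the edge_ngram tokenizer (min_gram 1, max_gram 30, token_chars letter and digit) followed by the lowercase filter. It is an illustration, not Elasticsearch code: the function name edge_ngram is hypothetical, and the regex only approximates the tokenizer's letter/digit character classes. It suggests that "/" never ends up inside a token, because it is neither a letter nor a digit and therefore acts as a separator. The real output can be checked with Elasticsearch's _analyze API.

```python
import re


def edge_ngram(text, min_gram=1, max_gram=30):
    """Rough model of an edge_ngram tokenizer with token_chars [letter, digit]
    followed by a lowercase filter (hypothetical helper, not an ES API)."""
    grams = []
    # Runs of letters/digits become words; anything else (such as "/") separates them.
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        # Emit every prefix whose length is between min_gram and max_gram.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:n])
    return grams


print(edge_ngram("oxygen"))
# ['o', 'ox', 'oxy', 'oxyg', 'oxyge', 'oxygen']
print(edge_ngram("carbon/oxygen"))
# ['c', 'ca', 'car', 'carb', 'carbo', 'carbon', 'o', 'ox', 'oxy', 'oxyg', 'oxyge', 'oxygen']
```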

My main problem is that when I run a query like this:

GET climate_change/_search?size=1000
{
"query": {
"match": {
"description": {
"query":"oxygen"
}
}
}
}

I see many results with the same score, 7.381186, which seems strange:

{
"_index" : "climate_change",
"_type" : "_doc",
"_id" : "XXX",
"_score" : 7.381186,
"_source" : {
"recordtype" : "chemicals",
"description" : "carbon/oxygen"
}
},
{
"_index" : "climate_change",
"_type" : "_doc",
"_id" : "YYY",
"_score" : 7.381186,
"_source" : {
"recordtype" : "chemicals",
"description" : "oxygen"
}
}

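How is that possible? In the example above, since I'm using ngrams and searching for "oxygen" in the description field, I would expect the second result to score higher than the first. I also tried setting the tokenizer type to "standard" and to "whitespace" in the settings, but that didn't help. Could the "/" character in the description be the cause?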

Thanks a lot!

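You also need to set the analyzer in the mapping of the description field. Defining an analyzer under settings does not apply it to any field by itself; a text field with no analyzer in its mapping uses the standard analyzer.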

Adding a working example with index data, mapping, search query, and search result:

{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "test_ngram",
"filter": [
"lowercase"
]
},
"search_analyzer": {
"tokenizer": "test_ngram",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"test_ngram": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 30,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "my_analyzer"
}
}
}
}

Index data:

{
"recordtype": "chemicals",
"description": "carbon/oxygen"
}
{
"recordtype": "chemicals",
"description": "oxygen"
}

Search query:

{
"query": {
"match": {
"description": {
"query":"oxygen"
}
}
}
}

Search result:

"hits": [
{
"_index": "67180160",
"_type": "_doc",
"_id": "2",
"_score": 0.89246297,
"_source": {
"recordtype": "chemicals",
"description": "oxygen"
}
},
{
"_index": "67180160",
"_type": "_doc",
"_id": "1",
"_score": 0.6651374,
"_source": {
"recordtype": "chemicals",
"description": "carbon/oxygen"
}
}
]
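Why the shorter description ranks first: with my_analyzer applied, both documents contain all six query grams ("o" through "oxygen") exactly once, so the difference comes from BM25's field-length normalization (BM25 is Elasticsearch's default similarity, with k1 = 1.2 and b = 0.75). The sketch below is a simplified, textbook BM25 over a rough model of the edge n-gram tokens; the helper names are hypothetical, and Lucene's actual implementation differs in details such as how field lengths are stored, so it is meant to reproduce the ranking, not the exact scores above.

```python
import math
import re
from collections import Counter


def edge_ngram(text, min_gram=1, max_gram=30):
    """Rough model of edge_ngram (letter/digit token_chars) + lowercase."""
    grams = []
    for word in re.findall(r"[^\W_]+", text):
        word = word.lower()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:n])
    return grams


def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Simplified BM25: sums a per-gram score for every query gram in each doc."""
    tokenized = [edge_ngram(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each gram.
    df = Counter(g for tokens in tokenized for g in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        dl = len(tokens)  # field length in tokens; longer fields are penalized
        score = 0.0
        for term in edge_ngram(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * dl / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = ["carbon/oxygen", "oxygen"]
for doc, score in zip(docs, bm25_scores(docs, "oxygen")):
    print(f"{doc!r}: {score:.4f}")
```

"carbon/oxygen" yields 12 grams against 6 for "oxygen", so each matching gram is normalized against a longer field and contributes less, which is consistent with the order of the hits above.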
