Elasticsearch returns less relevant results because of unimportant words in the query



I'm building a small web search engine with Elasticsearch. I'm using the following query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "how to format code better golang",
              "boost": 3,
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "match": {
            "keywords": {
              "query": "how to format code better golang",
              "boost": 2,
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "match": {
            "description": {
              "query": "how to format code better golang",
              "boost": 1,
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}
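The same query text is repeated in all three clauses, so generating the body in code keeps the clauses in sync. Below is a minimal Python sketch that only builds the JSON body shown above. build_search_body and FIELD_BOOSTS are hypothetical names. They are not part of any Elasticsearch client.

```python
import json

# Hypothetical field -> boost table mirroring the query above
# (title x3, keywords x2, description x1).
FIELD_BOOSTS = {"title": 3, "keywords": 2, "description": 1}


def build_search_body(text: str, field_boosts: dict[str, int]) -> dict:
    """Build a bool/should query with one fuzzy match clause per field."""
    should = [
        {
            "match": {
                field: {
                    "query": text,
                    "boost": boost,
                    "fuzziness": "AUTO",
                }
            }
        }
        for field, boost in field_boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


body = build_search_body("how to format code better golang", FIELD_BOOSTS)
print(json.dumps(body, indent=2))
```

Every clause is a should clause. A document therefore matches if any one clause matches, and the scores of all matching clauses are added together.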

When I run it, these are the top two results (they were edited after the query, but the scores/positions were not tampered with :)

{
  "id": "7a8a9b4b96c05460f32d18bba0804fdf",
  "score": 4651,
  "meta": {
    "url": "https://www.youtube.com/watch?v=IhC7sdYe-Jg",
    "title": "How a Compiler Works in ~1 minute - YouTube",
    "description": "A quick video explaining what a compiler does and how it works. The simple compiler I wrote is available in GitHub: http://www.github.com/charles-l/koona.Red...",
    "keywords": "tutorial, Compiler (Software Genre), compiler, computer, code, language, programming language, clang, gcc, lexer, parser, generator, ruby, how to write a compiler"
  }
},
{
  "id": "59c42e9f27efc9eea64b25d31d8146d1",
  "score": 4224,
  "meta": {
    "url": "https://dev.to/ksingh7/golang-automatic-code-formatting-code-like-a-pro-205a",
    "title": "Golang automatic code formatting : Code like a Pro - DEV Community",
    "description": "Why Format your code?   Everyone loves clean readable and beautifully organized code using... Tagged with go, formatting, vscode.",
    "keywords": "go, formatting, vscode, software, coding, development, engineering, inclusive, community"
  }
}

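Of course, I would expect the second result to be more relevant than the first one, but it isn't. I have tried several different queries, and in almost every one the result I want at the top ends up second or lower. Sometimes it does come out on top, but if I add "in" to the query (e.g. "how to format code better in golang"), it drops to second again.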

Is there any way to make the results more relevant?

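You haven't shared your index mapping and settings, so you are presumably using the default analyzer (standard). That analyzer does not remove English stop words such as in, to, is and this, which are unimportant terms in your case. To fix this, use the english analyzer. It removes these stop words at both index and query time, which gives the second document a better score. It also stems terms, so formatting and format are both indexed as format and the query matches more forms of the same word.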
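To see why a word like "in" matters, here is a rough Python approximation of the two analyzers on this query. It assumes plain ASCII input and models only the stop-word step of the english analyzer. The real analyzer also removes possessives and applies Porter stemming. The stop list below is Lucene's default English set, and it does not include "how". You can check the exact tokens Elasticsearch produces with the _analyze API, for example GET /_analyze with the body {"analyzer": "english", "text": "how to format code better in golang"}.

```python
import re

# Lucene's default English stop-word set (what the "_english_" stop list refers to).
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def standard_tokens(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer on plain ASCII text:
    lowercase, then split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def english_tokens_without_stemming(text: str) -> list[str]:
    """Rough stand-in for the english analyzer's stop-word step only
    (possessive removal and Porter stemming are deliberately omitted)."""
    return [t for t in standard_tokens(text) if t not in ENGLISH_STOP_WORDS]


query = "how to format code better in golang"
print(standard_tokens(query))                  # ['how', 'to', 'format', 'code', 'better', 'in', 'golang']
print(english_tokens_without_stemming(query))  # ['how', 'format', 'code', 'better', 'golang']
```

Under standard, "to" and "in" remain query terms. The first result's title, "How a Compiler Works in ~1 minute", contains "in" (and "how") in the field boosted three times. This is consistent with the question's observation that adding "in" to the query pushed the golang article back to second place.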

For example, create the index with the english analyzer on all three fields:

PUT /my-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "keywords": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}
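You can index the sample documents in step 1 with the _bulk API. It takes newline-delimited JSON: an action line followed by the document source, and the whole body must end with a newline. Below is a minimal Python sketch that only builds that body. The index name my-index and the helper bulk_body are assumptions, and only the title and keywords fields are included for brevity. The analyzer of an existing text field cannot be changed in place, so an existing index has to be recreated with the new mapping and its documents indexed again.

```python
import json

# Assumed index name; the answer's own test index was called "71413449".
INDEX = "my-index"

# The two sample documents (title and keywords only, for brevity).
DOCS = {
    "1": {
        "title": "How a Compiler Works in ~1 minute - YouTube",
        "keywords": "tutorial, Compiler (Software Genre), compiler, computer, code, language, programming language, clang, gcc, lexer, parser, generator, ruby, how to write a compiler",
    },
    "2": {
        "title": "Golang automatic code formatting : Code like a Pro - DEV Community",
        "keywords": "go, formatting, vscode, software, coding, development, engineering, inclusive, community",
    },
}


def bulk_body(index: str, docs: dict[str, dict]) -> str:
    """Build an NDJSON body for POST /_bulk: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


print(bulk_body(INDEX, DOCS), end="")
```

Send the resulting string as the body of POST /_bulk with the header Content-Type: application/x-ndjson.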
  1. Index the two sample documents.

  2. Running the same search gave me the results below.

"hits": [
  {
    "_index": "71413449",
    "_type": "_doc",
    "_id": "2",
    "_score": 10.55508,
    "_source": {
      "title": "Golang automatic code formatting : Code like a Pro - DEV Community",
      "description": "Why Format your code?   Everyone loves clean readable and beautifully organized code using... Tagged with go, formatting, vscode.",
      "keywords": "go, formatting, vscode, software, coding, development, engineering, inclusive, community"
    }
  },
  {
    "_index": "71413449",
    "_type": "_doc",
    "_id": "1",
    "_score": 5.1878767,
    "_source": {
      "title": "How a Compiler Works in ~1 minute - YouTube",
      "description": "A quick video explaining what a compiler does and how it works. The simple compiler I wrote is available in GitHub: http://www.github.com/charles-l/koona.Red...",
      "keywords": "tutorial, Compiler (Software Genre), compiler, computer, code, language, programming language, clang, gcc, lexer, parser, generator, ruby, how to write a compiler"
    }
  }
]

You can now see that your second document ranks first, with about twice the score of the first one (10.56 vs 5.19).
