I'm building a small web search engine with Elasticsearch, and I'm using the following query:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "how to format code better golang",
              "boost": 3,
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "match": {
            "keywords": {
              "query": "how to format code better golang",
              "boost": 2,
              "fuzziness": "AUTO"
            }
          }
        },
        {
          "match": {
            "description": {
              "query": "how to format code better golang",
              "boost": 1,
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}
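The query above repeats the same match clause once per field, changing only the field name and the boost. A minimal Python sketch that builds the same request body from a field-to-boost table could look like this (build_query and FIELD_BOOSTS are hypothetical names for illustration, not part of any Elasticsearch client):

```python
import json

# Hypothetical field -> boost table mirroring the query above.
FIELD_BOOSTS = {"title": 3, "keywords": 2, "description": 1}


def build_query(text: str, field_boosts: dict[str, int]) -> dict:
    """Build a bool/should query with one fuzzy match clause per field."""
    should = [
        {
            "match": {
                field: {
                    "query": text,
                    "boost": boost,
                    "fuzziness": "AUTO",
                }
            }
        }
        for field, boost in field_boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    body = build_query("how to format code better golang", FIELD_BOOSTS)
    print(json.dumps(body, indent=2))
```

Keeping the boosts in one table makes it easier to experiment with relevance tuning without editing three copies of the clause.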
When I run it, these are the top two results (I edited them after the query, but the scores and positions have not been tampered with):
{
  "id": "7a8a9b4b96c05460f32d18bba0804fdf",
  "score": 4651,
  "meta": {
    "url": "https://www.youtube.com/watch?v=IhC7sdYe-Jg",
    "title": "How a Compiler Works in ~1 minute - YouTube",
    "description": "A quick video explaining what a compiler does and how it works. The simple compiler I wrote is available in GitHub: http://www.github.com/charles-l/koona.Red...",
    "keywords": "tutorial, Compiler (Software Genre), compiler, computer, code, language, programming language, clang, gcc, lexer, parser, generator, ruby, how to write a compiler"
  }
},
{
  "id": "59c42e9f27efc9eea64b25d31d8146d1",
  "score": 4224,
  "meta": {
    "url": "https://dev.to/ksingh7/golang-automatic-code-formatting-code-like-a-pro-205a",
    "title": "Golang automatic code formatting : Code like a Pro - DEV Community",
    "description": "Why Format your code? Everyone loves clean readable and beautifully organized code using... Tagged with go, formatting, vscode.",
    "keywords": "go, formatting, vscode, software, coding, development, engineering, inclusive, community"
  }
}
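Naturally, I expected the second result to be more relevant than the first, but it isn't. I have tried several different queries, and for almost every one of them the result I want at the top ends up second or lower. Sometimes it does come out on top, but if I add "in" to the query (e.g. "how to format code better in golang"), it drops to second again.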
Is there any way to make the results more relevant?
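Since you haven't shared your index mapping and settings, I assume you are mostly using the default standard analyzer, which does not remove English stop words such as this, is, to and in, i.e. terms that carry little meaning in your case. To fix this, use the english analyzer: it removes these stop words at both index and search time and also stems terms (for example formatting → format, coding → code), which gives the second document a better score.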
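To see why adding "in" hurts the ranking under the standard analyzer, here is a rough Python approximation of the english analyzer's stop-word step. It assumes a simplified tokenizer (lowercase, split on non-alphanumeric characters) in place of the real standard tokenizer and it leaves out stemming; the stop list is Lucene's default English stop-word set, which the english analyzer uses. Elasticsearch's _analyze API remains the authoritative way to check real analyzer output.

```python
import re

# Lucene's default English stop-word set (used by the "english" analyzer).
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
    "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
    "such", "that", "the", "their", "then", "there", "these", "they",
    "this", "to", "was", "will", "with",
})


def tokenize(text: str) -> list[str]:
    """Crude stand-in for the standard tokenizer plus lowercase filter."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the English stop-word set."""
    return [t for t in tokens if t not in ENGLISH_STOP_WORDS]


if __name__ == "__main__":
    for q in ("how to format code better golang",
              "how to format code better in golang"):
        tokens = tokenize(q)
        print(tokens, "->", remove_stop_words(tokens))
```

Both queries reduce to the same terms, so with the english analyzer the extra "in" no longer matches the "in" in the first document's title ("How a Compiler Works in ~1 minute"), which carries a boost of 3. Note that "how" is not in this stop list, so it still matches; the stemming step is what lets "formatting" and "coding" in the second document line up with the query terms "format" and "code".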
For example:
PUT /my-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "keywords": {
        "type": "text",
        "analyzer": "english" // note english analyzer on all the fields
      }
    }
  }
}
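If the index is created from code rather than from Kibana, the same mapping can be generated instead of written by hand. A small Python sketch, using a hypothetical helper named english_text_mapping; with the official elasticsearch-py 8.x client the resulting dict would be passed as the mappings argument of indices.create, but that call is not exercised here:

```python
import json


def english_text_mapping(fields: list[str], analyzer: str = "english") -> dict:
    """Return mapping properties that index every field as text with one analyzer.

    Setting "analyzer" on a text field applies it at both index and search
    time unless a separate "search_analyzer" is configured.
    """
    return {
        "properties": {
            field: {"type": "text", "analyzer": analyzer} for field in fields
        }
    }


if __name__ == "__main__":
    mapping = english_text_mapping(["title", "description", "keywords"])
    print(json.dumps({"mappings": mapping}, indent=2))
```

The analyzer of an existing text field cannot be changed in place, so applying this to an index that already holds data means creating a new index with the mapping and reindexing the documents into it.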
After indexing the two sample documents, the same search returned the following results for me:
"hits": [
  {
    "_index": "71413449",
    "_type": "_doc",
    "_id": "2",
    "_score": 10.55508,
    "_source": {
      "title": "Golang automatic code formatting : Code like a Pro - DEV Community",
      "description": "Why Format your code? Everyone loves clean readable and beautifully organized code using... Tagged with go, formatting, vscode.",
      "keywords": "go, formatting, vscode, software, coding, development, engineering, inclusive, community"
    }
  },
  {
    "_index": "71413449",
    "_type": "_doc",
    "_id": "1",
    "_score": 5.1878767,
    "_source": {
      "title": "How a Compiler Works in ~1 minute - YouTube",
      "description": "A quick video explaining what a compiler does and how it works. The simple compiler I wrote is available in GitHub: http://www.github.com/charles-l/koona.Red...",
      "keywords": "tutorial, Compiler (Software Genre), compiler, computer, code, language, programming language, clang, gcc, lexer, parser, generator, ruby, how to write a compiler"
    }
  }
]
You can now see that your second document scores about twice as high as the first one (10.56 vs. 5.19).