Phrase suggester returns unexpected results when the first letter is misspelled

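I am using the Elasticsearch phrase suggester to correct users' misspellings. Everything works as I expect unless the user enters a query whose first letter is misspelled. In that case, the phrase suggester returns either nothing or unexpected results.

My suggest query: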

{
  "suggest": {
    "text": "user_query",
    "simple_phrase": {
      "phrase": {
        "field": "title.phrase",
        "collate": {
          "query": {
            "inline": {
              "bool": {
                "should": [
                  { "match": { "title": "{{suggestion}}" } },
                  { "match": { "participants": "{{suggestion}}" } }
                ]
              }
            }
          }
        }
      }
    }
  }
}

Example where the first letter is misspelled:

"simple_phrase" : [
{
"text" : "گاشانچی",
"offset" : 0,
"length" : 11,
"options" : [ {
"text" : "گارانتی",
"score" : 0.00253151
}]
}
]

Example where the fifth letter is misspelled:

"simple_phrase" : [
{
"text" : "کاشاوچی",
"offset" : 0,
"length" : 11,
"options" : [ {
"text" : "کاشانچی",
"score" : 0.1121
},
{
"text" : "کاشانجی",
"score" : 0.0021
},
{
"text" : "کاشنچی",
"score" : 0.0020
}]
}
]

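I expect both misspelled queries to produce the same suggestions (my expected suggestion is the second one). What is wrong?

PS: I am using this feature for the Persian language.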

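I have a solution for your problem; you only need to add some fields to your schema.

PS: I don't have much expertise in Elasticsearch, but I solved the same problem using Solr, and you can implement it the same way in Elasticsearch.

Create a new ngram field and copy all the title names into it.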
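As a rough sketch of what such a field could look like on the Elasticsearch side, the Python snippet below builds an index body with a bigram analyzer and a `copy_to` from `title` into an `ngram` field. The names `bigram_tokenizer`, `bigram_analyzer` and the helper `ngram_index_body` are made up for illustration, and the `_doc` mapping type level assumes a 6.x cluster (on 7.x and later that level would be dropped).

```python
import json


def ngram_index_body(min_gram=2, max_gram=2):
    """Build a hypothetical index body with an n-gram copy of the title."""
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    # Splits each word into character n-grams (bigrams by default).
                    "bigram_tokenizer": {
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                        "token_chars": ["letter", "digit"],
                    }
                },
                "analyzer": {
                    "bigram_analyzer": {
                        "type": "custom",
                        "tokenizer": "bigram_tokenizer",
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "_doc": {  # assumption: 6.x-style mapping with a type name
                "properties": {
                    # Every title value is also copied into the ngram field.
                    "title": {"type": "text", "copy_to": "ngram"},
                    "ngram": {"type": "text", "analyzer": "bigram_analyzer"},
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(ngram_index_body(), indent=2))
```

The printed JSON could be used as the body of an index creation request; adjust the gram sizes to your data.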

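When a query for a misspelled word returns an empty result, split the word into n-grams and fire the query again; you should then get the expected results.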

Example: suppose the user is searching for the word Akshay but types Skshay. Build the query as shown below and you should get the expected results. I am giving a Solr example here; you can achieve the same thing in Elasticsearch.
**(ngram:"skshay" OR ngram:"sk" OR ngram:"ks" OR ngram:"sh" OR ngram:"ha" OR ngram:"ay")**

We have split the word into a sequence of n-grams and fired the query against the ngram field.
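For illustration, here is a small Python sketch of how the splitting could be automated. The helper names `bigrams` and `ngram_query` are hypothetical, and the field name `ngram` is taken from the example above; the output is a Solr/Lucene-style query string.

```python
def bigrams(word):
    """Return the overlapping two-character pieces of a word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def ngram_query(word, field="ngram"):
    """Build an OR query over the whole word plus each of its bigrams."""
    word = word.lower()
    terms = [word] + bigrams(word)
    clauses = ['{}:"{}"'.format(field, term) for term in terms]
    return "(" + " OR ".join(clauses) + ")"


if __name__ == "__main__":
    print(ngram_query("Skshay"))
```

For "Skshay" this produces the same query as the hand-written one: the full word followed by sk, ks, sh, ha and ay.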
Hope this helps.

From the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-suggesters-phrase.html

prefix_length

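The number of minimal prefix characters that must match in order to be a candidate suggestion. Defaults to 1. Increasing this number improves spellcheck performance. Usually misspellings don't occur in the beginning of terms. (Old name "prefix_len" is deprecated)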

So, by default, the phrase suggester assumes the first character is correct, because prefix_length defaults to 1.
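To see where this option lives, here is a minimal Python sketch that builds a suggest request with `prefix_length` set explicitly on a direct generator. The helper name `phrase_suggest_body` is hypothetical, the field `title.phrase` and the suggestion name `simple_phrase` come from the question, and using the same field for the generator is an assumption.

```python
import json


def phrase_suggest_body(text, field="title.phrase", prefix_length=0):
    """Build a phrase-suggest request with an explicit prefix_length."""
    return {
        "suggest": {
            "text": text,
            "simple_phrase": {
                "phrase": {
                    "field": field,
                    "direct_generator": [
                        {
                            "field": field,  # assumption: same field as the phrase
                            "suggest_mode": "always",
                            # 0 lets candidates differ in the first character.
                            "prefix_length": prefix_length,
                        }
                    ],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(phrase_suggest_body("user_query"), indent=2))
```

With `prefix_length` at 0 the generator no longer requires the first character to match, at the cost of examining more candidate terms.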

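Note: setting this value to 0 is not a good approach, because it hurts performance. You need to use a reverse analyzer instead. I explained it in this post, so please check my answer: "Elasticsearch spell check suggestions even if first letter missed".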
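A minimal sketch of the reverse-analyzer idea, following the pattern shown in the Elasticsearch phrase suggester documentation: a second direct generator runs against a field indexed with a reversing analyzer, so the required prefix becomes the end of the original word and a wrong first letter can still be corrected. The subfield name `title.reverse` and the helper names are assumptions; your mapping would need such a subfield analyzed with the `reverse` analyzer defined here.

```python
import json


def reverse_analysis_settings():
    """Index settings defining an analyzer that reverses every token."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "reverse": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "reverse"],
                    }
                }
            }
        }
    }


def phrase_suggest_with_reverse(text, field="title.phrase",
                                reverse_field="title.reverse"):
    """Suggest request with a normal and a reversed candidate generator."""
    return {
        "suggest": {
            "text": text,
            "simple_phrase": {
                "phrase": {
                    "field": field,
                    "direct_generator": [
                        {"field": field, "suggest_mode": "always"},
                        {
                            "field": reverse_field,  # hypothetical subfield
                            "suggest_mode": "always",
                            # Reverse tokens before lookup and reverse the
                            # candidates back afterwards.
                            "pre_filter": "reverse",
                            "post_filter": "reverse",
                        },
                    ],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(reverse_analysis_settings(), indent=2))
    print(json.dumps(phrase_suggest_with_reverse("user_query"), indent=2))
```

This keeps `prefix_length` at its default of 1 for both generators, which is why it tends to be cheaper than setting it to 0.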

Regarding duplicates, you can use:

skip_duplicates: Whether duplicate suggestions should be filtered out (defaults to false).
