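I'm trying to make my search queries ignore accents and singular/plural differences. I copied the Spanish analyzer from here and kept only the stemmer: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
Here is my Python code (I bulk-load the data from a CSV afterwards):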
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "spanish_stemmer": {
                    "type": "stemmer",
                    "language": "light_spanish"
                }
            },
            "analyzer": {
                "rebuilt_spanish": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "spanish_stemmer"
                    ]
                }
            }
        }
    }
}
es.indices.create(index="activities", body=settings)
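However, when I run a GET query from Insomnia with terms such as geometrico, geométrico, geométricos, or geometricos, I get 0 results, even though there is a document titled Cuerpos geométricos. It should match, because I don't want accents or singular/plural forms to make a difference. Any ideas?
The GET query I'm running: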
{
    "query": {
        "function_score": {
            "query": {
                "multi_match": {
                    "query": "geométricos",
                    "fields": [
                        "Descripcion",
                        "Nombre",
                        "Tags"
                    ],
                    "analyzer": "rebuilt_spanish"
                }
            }
        }
    }
}
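As a rough illustration of what accent folding is meant to achieve for the four spellings in the question, here is a small pure-Python sketch. It is only an approximation built on `unicodedata` and is not how Elasticsearch implements its filter; the helper name `fold_accents` is hypothetical.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Approximate accent folding: decompose characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The four query spellings from the question.
terms = ["geometrico", "geométrico", "geométricos", "geometricos"]

# After lowercasing and folding, only the singular/plural difference remains,
# which is the part the stemmer is expected to handle.
folded = [fold_accents(term.lower()) for term in terms]
```

With the accents gone, the accented and unaccented spellings become identical strings, so only the trailing "s" is left for the stemmer to deal with.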
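You need to add the ASCII folding token filter (asciifolding) to your analyzer's token filters; see the official documentation for details. Your analyzer should then look like this: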
"settings": {
    "analysis": {
        "filter": {
            "spanish_stemmer": {
                "type": "stemmer",
                "language": "light_spanish"
            }
        },
        "analyzer": {
            "rebuilt_spanish": {
                "tokenizer": "standard",
                "filter": [
                    "asciifolding",
                    "lowercase",
                    "spanish_stemmer"
                ]
            }
        }
    }
}
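Put together as a Python dict in the style of the question, the index body could be sketched as follows. Two things here are assumptions that do not appear in the original post: the helper name `build_index_body` is hypothetical, and the `mappings` section assumes Elasticsearch 7 or later (typeless mappings) and that `Descripcion`, `Nombre`, and `Tags` are plain text fields that should also use the analyzer at index time. The question does not show any mappings, so treat that part as a guess about the setup.

```python
ANALYZER_NAME = "rebuilt_spanish"


def build_index_body(text_fields):
    """Build an index body with the accent-folding Spanish analyzer (hypothetical helper)."""
    analysis = {
        "filter": {
            "spanish_stemmer": {"type": "stemmer", "language": "light_spanish"}
        },
        "analyzer": {
            ANALYZER_NAME: {
                "tokenizer": "standard",
                # Fold accents first, then lowercase, then stem.
                "filter": ["asciifolding", "lowercase", "spanish_stemmer"],
            }
        },
    }
    # Assumption: ES 7+ typeless mappings, every listed field is a text field.
    properties = {
        name: {"type": "text", "analyzer": ANALYZER_NAME} for name in text_fields
    }
    return {
        "settings": {"analysis": analysis},
        "mappings": {"properties": properties},
    }


body = build_index_body(["Descripcion", "Nombre", "Tags"])
```

The resulting `body` can be passed to `es.indices.create(index="activities", body=body)` as in the question. Documents indexed before the analyzer changed keep their old tokens, so the index would likely need to be recreated and the CSV data loaded again.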