I have a SynonymFilterFactory that uses a synonyms file. From the Solr documentation:
#Explicit mappings match any token sequence on the LHS of "=>"
#and replace with all alternatives on the RHS. These types of mappings
#ignore the expand parameter in the schema.
#Examples:
i-pod, i pod => ipod
sea biscuit, sea biscit => seabiscuit
However, when I query for sea biscuit, I end up with results related to sea, biscuit and seabiscuit.
It is as if I had the following configuration (with expand="true"):
sea biscuit, sea biscit, seabiscuit
I don't understand this behavior, because in the Solr Analysis tool, querying sea biscuit correctly replaces it with seabiscuit.
In other words: explicit synonym mappings with => do not work.
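To make the difference between the two rule forms concrete, here is a toy Python model of how lines in synonyms.txt can be read: an explicit `=>` mapping replaces each left-hand phrase with the right-hand phrases regardless of `expand`, while a plain comma-separated list is expanded to every entry (`expand=true`) or reduced to the first entry (`expand=false`). The function `parse_synonyms` is hypothetical and heavily simplified (no escaping, no analysis of the phrases themselves); it is not Solr's actual parser.

```python
from __future__ import annotations


def parse_synonyms(lines: list[str], expand: bool = True) -> dict[str, list[str]]:
    """Toy parser for synonyms.txt-style rules (not Solr's real parser).

    Returns a dict mapping each matchable phrase to the phrases it is
    replaced with.
    """
    rules: dict[str, list[str]] = {}

    def add(source: str, targets: list[str]) -> None:
        # Merge rules sharing a left-hand side, keeping order and dropping repeats.
        merged = rules.setdefault(source, [])
        for t in targets:
            if t not in merged:
                merged.append(t)

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if "=>" in line:
            # Explicit mapping: every LHS phrase is replaced by all RHS phrases;
            # the expand flag plays no role here.
            lhs, rhs = line.split("=>", 1)
            targets = [t.strip() for t in rhs.split(",") if t.strip()]
            for source in (s.strip() for s in lhs.split(",") if s.strip()):
                add(source, targets)
        else:
            # Equivalent synonyms: expand=True maps every term to the whole list,
            # expand=False reduces every term to the first one.
            terms = [t.strip() for t in line.split(",") if t.strip()]
            for term in terms:
                add(term, terms if expand else terms[:1])
    return rules
```

In this model, `parse_synonyms(["sea biscuit, sea biscit => seabiscuit"])` maps both `sea biscuit` and `sea biscit` to `["seabiscuit"]` only, so a correctly applied explicit mapping would not by itself produce results for `sea` or `biscuit`.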
Edit: field configuration
Tokenized: true
Class name: org.apache.solr.schema.TextField
Index Analyzer: org.apache.solr.analysis.TokenizerChain
- Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:
org.apache.solr.analysis.StopFilterFactory args:{enablePositionIncrements: true words: stopwords.txt ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal: 1 catenateWords: 1 catenateNumbers: 1 splitOnCaseChange: 1 catenateAll: 0 generateNumberParts: 1 generateWordParts: 1 }
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: protwords.txt }
org.apache.solr.analysis.LengthFilterFactory args:{min: 2 max: 500 }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
org.apache.solr.analysis.ASCIIFoldingFilterFactory args:{}
Query Analyzer: org.apache.solr.analysis.TokenizerChain
- Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:
org.apache.solr.analysis.LowerCaseFilterFactory args:{}
org.apache.solr.analysis.SynonymFilterFactory args:{expand: true ignoreCase: true synonyms: synonyms.txt }
org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true }
org.apache.solr.analysis.WordDelimiterFilterFactory args:{preserveOriginal: 1 catenateWords: 0 catenateNumbers: 0 splitOnCaseChange: 1 catenateAll: 0 generateNumberParts: 1 generateWordParts: 1 }
org.apache.solr.analysis.SnowballPorterFilterFactory args:{protected: protwords.txt }
org.apache.solr.analysis.LengthFilterFactory args:{min: 2 max: 500 }
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
org.apache.solr.analysis.ASCIIFoldingFilterFactory args:{}
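SynonymFilterFactory is deprecated and should now be replaced with SynonymGraphFilterFactory, which produces a correct token graph when multiple tokens occupy the same position and so fixes the problems with multi-word synonyms. (If you use it at index time, follow it with FlattenGraphFilterFactory, which squashes those stacked tokens because the indexer cannot consume a graph directly.)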
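Are you running a phrase query (with double quotes)? If not, you are giving the SynonymFilter two separate tokens (sea and biscuit), and in that case no matching synonym is found.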
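A toy Python model of the point above, assuming a query parser that splits unquoted input on whitespace and sends each term through the analyzer separately, while a quoted phrase reaches the analyzer as one token stream. `RULES`, `apply_synonyms`, `analyze_query` and the greedy longest-match strategy are hypothetical simplifications for illustration, not Lucene's SynonymFilter implementation.

```python
from __future__ import annotations

# Hypothetical rules equivalent to "sea biscuit, sea biscit => seabiscuit"
RULES: dict[str, list[str]] = {
    "sea biscuit": ["seabiscuit"],
    "sea biscit": ["seabiscuit"],
}


def apply_synonyms(tokens: list[str], rules: dict[str, list[str]], max_len: int = 3) -> list[str]:
    """Replace token sequences matching a rule's LHS, longest match first (toy model)."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in rules:
                out.extend(rules[phrase])
                i += n
                break
        else:
            # No rule matched: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def analyze_query(query: str) -> list[list[str]]:
    """Return the analyzed tokens for each chunk the query parser hands to the analyzer."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        # Phrase query: the whole phrase is analyzed as one token stream,
        # so the multi-word LHS "sea biscuit" can match.
        return [apply_synonyms(query[1:-1].lower().split(), RULES)]
    # Unquoted: each whitespace-separated term is analyzed on its own,
    # so the filter only ever sees "sea" or "biscuit" in isolation.
    return [apply_synonyms([term.lower()], RULES) for term in query.split()]
```

Here `analyze_query("sea biscuit")` yields `[["sea"], ["biscuit"]]`, while `analyze_query('"sea biscuit"')` yields `[["seabiscuit"]]`. Under this assumption it would also explain why the Analysis screen, which passes the whole input straight to the analyzer without a query parser, shows the replacement.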
By the way, it is almost always a better idea to handle synonyms at index time. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory