My Solr schema is as follows (only the important parts):
<fieldType name="bagofwords_expertfinding" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<!-- strip HTML markup from the input before tokenizing -->
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords_en.txt"
enablePositionIncrements="true"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,.]+$" replacement="" replace="all"/>
<!-- remove tokens containing a letter repeated more than two times -->
<filter class="solr.PatternReplaceFilterFactory" pattern="^.*(([a-zA-Z])\2)\2+.*$" replacement=""/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.LengthFilterFactory" min="3" max="100"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords_en.txt"
enablePositionIncrements="true"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,.]+$" replacement="" replace="all"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.LengthFilterFactory" min="3" max="100"/>
</analyzer>
</fieldType>
<fieldType name="namedentities_expertfinding" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<!-- replace "s," and ",s" with "," before splitting on commas -->
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="s," replacement=","/>
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern=",s" replacement=","/>
<tokenizer class="solr.PatternTokenizerFactory" pattern="," />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords_en.txt"
enablePositionIncrements="true"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="^[0-9-/_,.]+$" replacement="" replace="all"/>
<filter class="solr.LengthFilterFactory" min="3" max="100"/>
</analyzer>
</fieldType>
In the named-entities field I index multi-word values such as "diego alberto milito" and "diego armando maradona". I search both fields and boost them differently with a dismax query.
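To make the setup concrete, the named-entity type might be wired to fields and fed data like this. This is only a sketch: the field names `namedentities` and `bagofwords`, the `id` field, and the comma-separated input format are assumptions, chosen because the index analyzer of `namedentities_expertfinding` splits on `,`.

```xml
<!-- schema.xml: hypothetical fields using the two types above -->
<field name="namedentities" type="namedentities_expertfinding" indexed="true" stored="true"/>
<field name="bagofwords" type="bagofwords_expertfinding" indexed="true" stored="true"/>

<!-- XML update message (hypothetical document): names separated by commas,
     so PatternTokenizerFactory pattern="," keeps each full name as one token -->
<add>
  <doc>
    <field name="id">expert-1</field>
    <field name="namedentities">diego alberto milito,diego armando maradona</field>
  </doc>
</add>
```

With this input the index holds the tokens `diego alberto milito` and `diego armando maradona`, each as a single term, which is worth keeping in mind when comparing against the whitespace-tokenized query analyzer of the same type.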
But with this query: localhost:8080/solr/select/?q="diego armando maradona"&defType=dismax&qf=namedentities^100 bagofwords^1&fl=*,score&debugQuery=true&mm=0
Solr finds nothing. Maybe I don't understand the correct use of the " character.
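I also don't understand this passage from the Solr wiki:
"In Solr 1.4 and prior, you should basically set mm=0 if you want the equivalent of q.op=OR, and mm=100% if you want the equivalent of q.op=AND. In 3.x and trunk the default value of mm is dictated by the q.op param (q.op=AND => mm=100%; q.op=OR => mm=0%). Keep in mind the default operator is affected by the defaultOperator entry in your schema.xml. In older versions of Solr the default value is 100% (all clauses must match)."
In my schema defaultOperator is OR, so why do I get the default mm value of 100% when I don't set mm=0?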
Thanks in advance!
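For reference, the two settings the quoted wiki passage talks about could look like this in a Solr 3.x setup. This is a sketch under assumptions: the handler name `/expertsearch` is hypothetical, and setting `mm` explicitly in the handler defaults simply avoids relying on whichever version-specific default applies.

```xml
<!-- schema.xml: default operator for the standard query parser -->
<solrQueryParser defaultOperator="OR"/>

<!-- solrconfig.xml: hypothetical handler that pins mm explicitly -->
<requestHandler name="/expertsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- 0%: no particular number of optional clauses is required (OR-like);
         100% would require every clause to match (AND-like) -->
    <str name="mm">0%</str>
  </lst>
</requestHandler>
```

A request parameter such as `mm=100%` still overrides these defaults for a single query.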
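Putting quotes around the query string, as you do above, forces a phrase query, which means only exact phrase matches are considered. Remove them, use parentheses instead, and experiment with the pf, pf2 and pf3 parameters to boost documents that match longer phrases.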
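A minimal sketch of that suggestion, assuming a Solr version that ships the extended DisMax parser (3.1 or later): `pf2` and `pf3` are edismax parameters rather than plain dismax ones, so the example switches `defType`. The handler name, field names and boost values are illustrative assumptions, not tuned values.

```xml
<!-- solrconfig.xml: hypothetical handler for unquoted queries with phrase boosts -->
<requestHandler name="/expertsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- individual terms may match in either field; named entities weigh more -->
    <str name="qf">namedentities^100 bagofwords^1</str>
    <!-- extra boost when all query terms appear together as a phrase -->
    <str name="pf">namedentities^100 bagofwords^10</str>
    <!-- extra boost for adjacent word pairs (pf2) and triples (pf3) -->
    <str name="pf2">bagofwords^5</str>
    <str name="pf3">bagofwords^5</str>
    <str name="mm">0%</str>
    <str name="fl">*,score</str>
  </lst>
</requestHandler>
```

The query itself is then sent without quotes (q=diego armando maradona): any single term can produce a match, while documents containing the words as a contiguous phrase should rank higher. Running it with debugQuery=true shows how each pf/pf2/pf3 clause contributes to the score.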