Can someone explain how stop words work in Solr? In stopwords.txt I have defined of. In schema.xml I have:
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
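Now, when I search for anything containing the word of, no results are returned.
Example: oil of olay returns no results, whereas oil olay returns the correct results.
More of the field definition: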
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
generateNumberParts="1"
catenateWords="1"
catenateNumbers="1"
catenateAll="1"
preserveOriginal="1"
splitOnCaseChange="0"
splitOnNumerics="0"
types="wdtypes.txt"
/>
<filter class="solr.KeywordRepeatFilterFactory"/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<filter class="solr.TrimFilterFactory" updateOffsets="false"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
generateNumberParts="1"
catenateWords="1"
catenateNumbers="1"
catenateAll="1"
preserveOriginal="1"
splitOnCaseChange="0"
splitOnNumerics="0"
types="wdtypes.txt"
/>
<filter class="solr.EnglishMinimalStemFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
When debugging:
+(upclist:cream+of+wheat&qt=productresults&rows=10&fq=status%3aactive&fq=facilitystatus%3aactive&fq=facilityid%3a100&fq=inventoryctrlcode%3a%5b0+to+100%5d&fq=weblifecycle%3a%283+or+4%29&fq=groupnumber%3a2^1.2
| keywords:cream+of+wheat&qt=productresults&rows=10&fq=status%3aactive&fq=facilitystatus%3aactive&fq=facilityid%3a100&fq=inventoryctrlcode%3a%5b0+to+100%5d&fq=weblifecycle%3a%283+or+4%29&fq=groupnumber%3a2^20.0
| product_elevate:cream+of+wheat&qt=productresults&rows=10&fq=status%3aactive&fq=facilitystatus%3aactive&fq=facilityid%3a100&fq=inventoryctrlcode%3a%5b0+to+100%5d&fq=weblifecycle%3a%283+or+4%29&fq=groupnumber%3a2^5.0
| area:"(cream+of+wheat&qt=productresults&rows=10&fq=status%3aactive&fq=facilitystatus%3aactive&fq=facilityid%3a100&fq=inventoryctrlcode%3a%5b0+to+100%5d&fq=weblifecycle%3a%283+or+4%29&fq=groupnumber%3a2 cream) of wheat qt productresultsrow (rows creamofwheatqtproductresultsrow) 10 fqstatus%3aactivefqfacilitystatus%3aactivefqfacilityid%3a100fqinventoryctrlcode%3a%5b0 (to fqstatus%3aactivefqfacilitystatus%3aactivefqfacilityid%3a100fqinventoryctrlcode%3a%5b0to) 100%5d fqweblifecycle%3a%283 (or fqweblifecycle%3a%283or) 4%29 fq (groupnumber%3a2 fqgroupnumber%3a2) creamofwheatqtproductresultsrows10fqstatus%3aactivefqfacilitystatus%3aactivefqfacilityid%3a100fqinventoryctrlcode%3a%5b0to100%5dfqweblifecycle%3a%2834%29fqgroupnumber%3a2)"~3^2.5
| productid:cream+of+wheat&qt=productresults&rows=10&fq=status%3aactive&fq=facilitystatus%3aactive&fq=facilityid%3a100&fq=inventoryctrlcode%3a%5b0+to+100%5d&fq=weblifecycle%3a%283+or+4%29&fq=groupnumber%3a2^1.7
| productname:cream+of+wheat&qt=productresults&rows=10&fq=status%3aactive&fq=facilitystatus%3aactive&fq=facilityid%3a100&fq=inventoryctrlcode%3a%5b0+to+100%5d&fq=weblifecycle%3a%283+or+4%29&fq=groupnumber%3a2^10.0)~0.01 ()
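This may not be relevant, since you said you are searching only a single field (I am posting it because you said you are using edismax and qf). I had a similar problem when I wanted to boost exact matches, so I set qf like this: <str name="qf">title^45 title_str^55</str>. The title field used stop words, while title_str obviously did not. The reason a search containing stop words often finds nothing is described here. Their solution was to tweak the mm value. In my case, the solution that worked was to put title_str in the pf tag (and remove it from the qf tag), so that exact matches appear at the top.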
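The qf/pf split from this answer could be sketched as request handler defaults in solrconfig.xml. This is only an illustration under assumptions: the handler name /select-sketch is hypothetical, the boosts are simply carried over from the qf line quoted in the answer, and title and title_str are that answerer's field names, not fields from the question's schema.

```xml
<!-- Hypothetical handler name; field names and boosts come from the answer's qf line. -->
<requestHandler name="/select-sketch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- qf: the fields each query term is matched against.
         Only the stop-word-filtered field stays here. -->
    <str name="qf">title^45</str>
    <!-- pf: phrase boost. It only raises the score of documents that already match,
         so the field that keeps stop words cannot block a match from here. -->
    <str name="pf">title_str^55</str>
  </lst>
</requestHandler>
```

Because pf affects ranking rather than matching, a stop word such as of no longer has to match in title_str for the document to be returned.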
Finally solved the problem:
Changed "mm" from 2<-25% to 2<-36%
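To see why this change can help, it is worth working through the mm arithmetic for a three-term query such as oil of olay. As far as I understand Solr's mm rules, a value like 2<-25% means that queries with more than two optional clauses may have 25% of them missing, with the result rounded down. The fragment below is a sketch of how the new value might be written in solrconfig.xml defaults (its placement in a defaults list is an assumption). Note that < has to be escaped as &lt; inside XML.

```xml
<lst name="defaults">
  <!--
    mm arithmetic for 3 optional clauses (for example: oil, of, olay):
      2<-25% : 25% of 3 = 0.75, rounded down to 0 allowed to miss, so all 3 must match
      2<-36% : 36% of 3 = 1.08, rounded down to 1 allowed to miss, so 2 of 3 must match
    With 2 or fewer clauses, both settings require every clause to match.
  -->
  <str name="mm">2&lt;-36%</str>
</lst>
```

Under the old value, a single clause for of that matches nothing was presumably enough to reject every document. Under the new value, it can be the one clause that is allowed to miss.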