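Why can't I search for the first or last word in a Solr field?

I'm a Solr beginner and have been using it in my project for only a month. Everything worked well at first, but now I've run into a problem. Suppose I have a sentence like "When you love someone, the world is shining". If I search for "When you" or "is shining", I get no results, but when I try "you love" or "world is", or just "love" or something similar, results come back. Is this something I can configure through the schema.xml file, or am I doing something wrong? Thanks!

Here is my schema.xml file: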

<?xml version="1.0" encoding="UTF-8"?>
<schema name="minimal" version="1.1">
<field name="_version_" type="long" indexed="true" stored="false" />
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
<uniqueKey>id</uniqueKey>
<solrQueryParser defaultOperator="AND"/>
<field name="dplname" type="text_general" multiValued="false" indexed="true" required="true" stored="true"/>
<field name="mail" type="text_general" indexed="true" stored="true"  multiValued="true"/>
<field name="phone" type="text_general" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="_text_" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="dplname" dest="text"/>
<copyField source="mail" dest="text"/>
<copyField source="phone" dest="text"/>
<fieldType name="int" class="solr.TrieIntField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" docValues="true" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />
<fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
</schema>

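Update: I search with a query like this: dplname:is shining, or something similar.

OK, so you need to understand how text is analyzed and tokenized in Solr. In your case, look at this part of your schema.xml: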

<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

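This means that when a document is indexed, StandardTokenizerFactory is applied first. It splits the sentence into tokens on whitespace and some other delimiters, such as punctuation.

For details, see https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-StandardTokenizer

For example, your sentence:

When you love someone, the world is shining

will be split into the following tokens:

When, you, love, someone, the, world, is, shining

So 8 tokens in total. Note that the comma is also dropped, because it is a delimiter too.

Next the StopFilterFactory filter is applied, which removes the stop words listed in the stopwords.txt file. (Stop words are common words that you do not want to index because they carry little meaning in a search.)

Read more here: https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-StopFilter

Let's assume the stop words are:

you, the, is

So after this second step, you are left with these tokens (because the stop words were removed):

When, love, someone, world, shining

The third step is LowerCaseFilterFactory, which converts all tokens to lowercase.

So, when all is said and done, your sentence

When you love someone, the world is shining

is indexed as the following tokens:

when, love, someone, world, shining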
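
The index-time steps above can be imitated in a few lines of Python. This is only a rough sketch, not Solr code: the regular expression is a simplified stand-in for StandardTokenizerFactory, the stop word set is the assumed list from this answer (the real stopwords.txt is not shown), and the function names are hypothetical.

```python
import re

# Assumed stop words, as in the answer; the real stopwords.txt is not shown.
STOPWORDS = {"you", "the", "is"}

def tokenize(text):
    # Simplified stand-in for StandardTokenizerFactory:
    # keep runs of letters, digits and apostrophes, drop everything else.
    return re.findall(r"[A-Za-z0-9']+", text)

def stop_filter(tokens, ignore_case=True):
    # Imitates StopFilterFactory with ignoreCase="true".
    def is_stop(token):
        return (token.lower() if ignore_case else token) in STOPWORDS
    return [t for t in tokens if not is_stop(t)]

def lower_case(tokens):
    # Imitates LowerCaseFilterFactory.
    return [t.lower() for t in tokens]

def index_analyze(text):
    # Same order as the index analyzer in the schema:
    # tokenizer, then stop filter, then lowercase filter.
    return lower_case(stop_filter(tokenize(text)))

sentence = "When you love someone, the world is shining"
print(tokenize(sentence))       # 8 raw tokens, comma dropped
print(index_analyze(sentence))  # ['when', 'love', 'someone', 'world', 'shining']
```

To see what your real analyzer produces, the Analysis screen in the Solr admin UI shows the output of each stage for a given field and input text.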

Now let's talk about searching, a.k.a. querying.

In your schema.xml you have the following:

<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

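This means the analyzer above is run on every query.

So when you search for dplname:shining, StandardTokenizerFactory analyzes it first; since there is no delimiter, nothing happens to shining. It is not a stop word either, so StopFilterFactory does not remove it, and LowerCaseFilterFactory just converts it to lowercase (if it isn't already).

So the final token Solr searches for is shining, which is found in the index, and you get a result back.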
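
As a rough illustration of this query-time path, the sketch below runs a query term through a simplified analyzer and checks it against the tokens that the example sentence left in the index. It is not Solr code: the stop word set is the assumed one from this answer, the synonym step is skipped because synonyms.txt is not shown, and `query_analyze` and `matches` are hypothetical helper names.

```python
import re

# Assumed stop words, as in the answer; the real stopwords.txt is not shown.
STOPWORDS = {"you", "the", "is"}

# Tokens the example sentence leaves in the index after analysis.
INDEXED_TOKENS = {"when", "love", "someone", "world", "shining"}

def query_analyze(text):
    # Simplified stand-in for the query analyzer in the schema.
    tokens = re.findall(r"[A-Za-z0-9']+", text)                 # tokenizer
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]  # stop filter
    # SynonymFilterFactory is skipped here: synonyms.txt is not shown.
    return [t.lower() for t in tokens]                          # lowercase

def matches(query_text):
    # A document matches when every analyzed query token is in the index.
    tokens = query_analyze(query_text)
    return bool(tokens) and all(t in INDEXED_TOKENS for t in tokens)

print(query_analyze("Shining"))  # ['shining']
print(matches("Shining"))        # True
```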

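Let's look at another query:

dplname:is shining

Note: a field name applies only to the term directly after it. So in the query above, is is searched in the dplname field, but since shining has no field in front of it, it is searched in the default field (in this case the text field).

So essentially the query becomes (since the default operator is AND, it is added to the query):

dplname:is AND text:shining

So Solr looks for a document that has is in the dplname field and shining in the text field, and it cannot find one.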
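
The field-binding rule described above can be shown with a toy parser. This is a minimal sketch under stated assumptions, not the Lucene query parser: it only splits on whitespace, `bind_fields` is a hypothetical name, and the default field `text` is taken from this answer rather than from the schema shown.

```python
def bind_fields(query, default_field="text"):
    """Return (field, term) pairs for a whitespace-separated query.

    A field prefix applies only to the term it is attached to;
    every other term falls back to the default field.
    """
    clauses = []
    for part in query.split():
        if ":" in part:
            field, term = part.split(":", 1)
        else:
            field, term = default_field, part
        clauses.append((field, term))
    return clauses

print(bind_fields("dplname:is shining"))
# [('dplname', 'is'), ('text', 'shining')]
```

To apply one field to both terms, the Lucene query syntax offers grouping, as in dplname:(is shining), or a phrase query, as in dplname:"is shining".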

Read more about query parsing here: http://lucene.apache.org/core/2_9_4/queryparsersyntax.html
