Lucene returns no results when I search with special characters

I am using Lucene 6.6.0 and indexing my data with StandardAnalyzer.

I am indexing the following values:

a&e networks
a&e

After indexing, searching for a&e returns no results. Here is my sample code:

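import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

Directory dir = new RAMDirectory();
IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
IndexWriter writer = new IndexWriter(dir, iwc);
Document doc = new Document();
doc.add(new TextField("text", "a&e networks", Field.Store.YES));
writer.addDocument(doc);
doc = new Document();
doc.add(new TextField("text", "a&e", Field.Store.YES));
writer.addDocument(doc);
writer.close();
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher searcher = new IndexSearcher(reader);
Query query = new TermQuery(new Term("text", "a&e"));
TopDocs results = searcher.search(query, 5);
final ScoreDoc[] scoreDocs = results.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {
    System.out.println(scoreDoc.doc + " " + scoreDoc.score + " " + searcher.doc(scoreDoc.doc).get("text"));
}
System.out.println("Hits: " + results.totalHits);
System.out.println("Max score:" + results.getMaxScore());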

My output is:

Hits: 0
Max score:NaN

Even a search for a returns no results in this case.

But if I pass a stop-word set to StandardAnalyzer like this:

List<String> stopWords = Arrays.asList("&");
CharArraySet stopSet = new CharArraySet(stopWords, false);
IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer(stopSet));

then a search for a does return results. Even in this case, however, a search for a&e returns nothing.

How can I make this work? My goal is that a search for a&e returns results. Do I need a CustomAnalyzer? If so, what should I put in it?

The & character is most likely treated as a word boundary:

https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/analysis/standard/StandardTokenizer.html

This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

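So a&e is split into the tokens a and e. In addition, a is in StandardAnalyzer's default English stop-word set (e is not), so a is removed at index time. The index therefore contains e (plus networks for the first document) but never the term a&e, and since a TermQuery is not analyzed, it looks for exactly that missing term.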
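To make the index-time behaviour concrete, here is a rough JDK-only simulation. It is a hypothetical approximation, not Lucene's StandardTokenizer: it treats every character that is not a letter or digit as a boundary, lowercases, and drops words from a small made-up stop-word set that includes a. The class name BoundarySplitDemo and its methods are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BoundarySplitDemo {

    // Hypothetical small stop-word set; only "a" matters for this example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "and", "the"));

    // Rough approximation of word-boundary tokenization: any character that is
    // not a letter or digit ends the current token and is itself discarded.
    // This is NOT the full UAX #29 rule set that StandardTokenizer implements.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Remove stop words after tokenization, the way a stop filter would.
    static List<String> indexedTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : tokenize(text)) {
            if (!STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a&e networks"));              // [a, e, networks]
        System.out.println(indexedTerms("a&e networks"));          // [e, networks]
        System.out.println(indexedTerms("a&e").contains("a&e"));   // false
    }
}
```

Under these assumptions the literal term a&e never reaches the index, which is why a TermQuery for it can only return zero hits.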

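To confirm this, try indexing some random keywords joined by an & (e.g. adsadaerewfds&eqwedasd). After indexing, search for the keyword before the & and the keyword after it. If each one is found on its own, the & is splitting the value; in that case either store the value without analysis (you can use StringField) or create a custom analyzer.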
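A minimal sketch of the custom-analyzer route, assuming Lucene 6.6 with the lucene-analyzers-common module on the classpath (it provides CustomAnalyzer and the "whitespace" and "lowercase" factories). The analyzer splits only on whitespace and lowercases, so a&e survives as a single term. The class name WhitespaceSearch and its helpers are hypothetical; totalHits is an int in 6.x, and later Lucene versions changed that field, so this exact code targets 6.x.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class WhitespaceSearch {

    // Whitespace tokenizer + lowercase filter: "&" stays inside the token.
    static Analyzer buildAnalyzer() throws IOException {
        return CustomAnalyzer.builder()
                .withTokenizer("whitespace")
                .addTokenFilter("lowercase")
                .build();
    }

    // Helper that prints what the analyzer actually emits for a string.
    static List<String> tokens(Analyzer analyzer, String text) throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream("text", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();
        }
        return out;
    }

    // Index the two sample values with the custom analyzer, then run a TermQuery.
    static long countHits(String termText) throws IOException {
        Analyzer analyzer = buildAnalyzer();
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            for (String value : new String[] {"a&e networks", "a&e"}) {
                Document doc = new Document();
                doc.add(new TextField("text", value, Field.Store.YES));
                writer.addDocument(doc);
            }
        } // close() commits the documents
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs results = searcher.search(new TermQuery(new Term("text", termText)), 5);
            return results.totalHits;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens(buildAnalyzer(), "A&E Networks")); // [a&e, networks]
        System.out.println("Hits for a&e: " + countHits("a&e"));     // 2
    }
}
```

The trade-off: with this analyzer a search for a&e matches both documents, but a search for a alone no longer does, because a is no longer a separate token. If you only need exact matches on the whole value, a StringField (not analyzed) is simpler, though then a&e matches only the document whose entire value is a&e.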
