I have indexed this entity:
@Entity
@Indexed
public class MyBean {
    @Id
    private Long id;

    @Field
    private String foo;

    @Field
    private String bar;

    @Field
    private String baz;
}
Given this data:
+----+-------------+-------------+-------------+
| id | foo         | bar         | baz         |
+----+-------------+-------------+-------------+
| 11 | an example | ignore this | ignore this |
| 12 | ignore this | an e.x.a.m. | ignore this |
| 13 | not this | not this | not this |
+----+-------------+-------------+-------------+
I need to find rows 11 and 12 by searching for "exam".
I have tried:
FullTextEntityManager fullTextEntityManager =
        Search.getFullTextEntityManager(this.entityManager);
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory()
        .buildQueryBuilder().forEntity(MyBean.class).get();
Query textQuery = queryBuilder.keyword()
        .onFields("foo", "bar", "baz").matching("exam").createQuery();
fullTextEntityManager.createFullTextQuery(textQuery, MyBean.class).getResultList();
But this only finds entity 11; I also need 12. Is this possible?
Adding a WordDelimiterFilter with the CATENATE_ALL flag to your analysis chain would be one possible solution. An analyzer implementation based on StandardAnalyzer would then look something like this:
public class StandardWithWordDelim extends StopwordAnalyzerBase {

    public static final CharArraySet STOP_WORDS_SET = StopAnalyzer.ENGLISH_STOP_WORDS_SET;

    public StandardWithWordDelim() {
        // Populate the stopwords field inherited from StopwordAnalyzerBase,
        // which is used by the StopFilter below.
        super(STOP_WORDS_SET);
    }

    @Override
    protected TokenStreamComponents createComponents(final String fieldName) {
        StandardTokenizer src = new StandardTokenizer();
        src.setMaxTokenLength(255);
        TokenStream filter = new StandardFilter(src);
        filter = new LowerCaseFilter(filter);
        filter = new StopFilter(filter, stopwords);
        // I'm inclined to add it here, after the StopFilter, so an
        // abbreviation like "t.h.e." doesn't get whacked by the StopFilter.
        filter = new WordDelimiterFilter(filter, WordDelimiterFilter.CATENATE_ALL, null);
        return new TokenStreamComponents(src, filter);
    }
}
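To illustrate what CATENATE_ALL contributes, here is a plain-Java sketch (not the Lucene implementation): for a delimited token, the filter also emits a token formed by joining the alphanumeric sub-parts, so "e.x.a.m." additionally yields "exam", which your keyword query for "exam" then matches.

```java
// Plain-Java sketch of the CATENATE_ALL idea: join the alphanumeric
// sub-parts of a single token, dropping the delimiters between them.
public class CatenateAllSketch {

    // Keep only letters and digits, concatenating the sub-words.
    static String catenateAll(String token) {
        StringBuilder sb = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(catenateAll("e.x.a.m.")); // prints "exam"
        System.out.println(catenateAll("Wi-Fi"));    // prints "WiFi"
    }
}
```

Note this runs per token, after tokenization, which is why "an e.x.a.m." in column bar becomes the tokens "an" and "exam" rather than one joined string.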
It doesn't look like you are using the standard analyzer (NGrams, perhaps?), but you should be able to work this filter into your analysis somewhere.
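If you are on Hibernate Search's annotation API, one way to wire the custom analyzer in is the @Analyzer annotation (this assumes Hibernate Search 5-style mapping; adjust if you configure analyzers elsewhere):

```java
// Config fragment: apply the custom analyzer to the indexed entity.
// @Analyzer at class level covers all @Field properties; it can also
// be placed on individual fields instead.
@Entity
@Indexed
@Analyzer(impl = StandardWithWordDelim.class)
public class MyBean {
    @Id
    private Long id;

    @Field
    private String foo;

    @Field
    private String bar;

    @Field
    private String baz;
}
```

Remember to reindex existing data after changing the analyzer, since already-indexed tokens won't contain the catenated forms.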