How to add additional stop words to a Neo4j full-text analyzer



I'm working on full-text search and need to add words to the stop word list.

For comparison, the ElasticSearch equivalent: How to add stop words to the default list in ElasticSearch

Is this possible without writing a custom analyzer as a plugin? My index looks like this:

CREATE FULLTEXT INDEX productNameIndex FOR (n:Product) ON EACH [n.name] 
OPTIONS {indexConfig: {`fulltext.analyzer`: 'danish' }}

Is it possible to do something like fulltext.stopwords: ['word1', 'word2'] or fulltext.stopwords: ./stopwords.txt?

I haven't tried writing a custom plugin for Neo4j before, and it seems rather daunting.

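Frankly, it isn't possible to create a custom analyzer without writing a plugin. Fortunately, you can copy existing code for your custom analyzer. For example, take a look at the covidgraph custom analyzers: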

https://github.com/covidgraph/neo4j-additional-analyzers

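// Imports are an assumption: they match the Neo4j 3.5 / Lucene 5 era APIs that @Service.Implementation implies.
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.core.StopFilterFactory;
import org.apache.lucene.analysis.core.WhitespaceTokenizerFactory;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.synonym.SynonymFilterFactory;
import org.neo4j.graphdb.index.fulltext.AnalyzerProvider;
import org.neo4j.helpers.Service;

@Service.Implementation(AnalyzerProvider.class)
public class SynonymAnalyzerProvider extends AnalyzerProvider {
    public static final String DESCRIPTION = "analyzer using synonyms";
    public static final String ANALYZER_NAME = "synonym";

    public SynonymAnalyzerProvider() {
        // Registers the analyzer under the name "synonym" with no alternative names.
        super(ANALYZER_NAME, new String[0]);
    }

    @Override
    public Analyzer createAnalyzer() {
        try {
            return CustomAnalyzer.builder()
                    .withTokenizer(WhitespaceTokenizerFactory.class)
                    .addTokenFilter(SynonymFilterFactory.class, "synonyms", "gene_symbols.txt", "ignoreCase", "true")
                    // Stop words: a comma-separated list of Snowball-format word files.
                    .addTokenFilter(StopFilterFactory.class, "format", "snowball", "words", "org/apache/lucene/analysis/snowball/english_stop.txt,org/apache/lucene/analysis/snowball/german_stop.txt", "ignoreCase", "true")
                    .addTokenFilter(LowerCaseFilterFactory.class)
                    .build();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public String description() {
        return DESCRIPTION;
    }
}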

Specifically, the stop words appear to be added in the following line:

.addTokenFilter(StopFilterFactory.class, "format", "snowball", "words", "org/apache/lucene/analysis/snowball/english_stop.txt,org/apache/lucene/analysis/snowball/german_stop.txt", "ignoreCase", "true")
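
To make it clearer what that line is configured to do, here is a rough, standard-library-only model of it. This is not Lucene's code, just a sketch of the behaviour as I understand it: the "words" argument is a comma-separated list of files that end up in one stop set, the "snowball" format treats | as a comment marker and allows several whitespace-separated words per line, and "ignoreCase" makes matching case-insensitive. The class name StopWordSketch, its methods, and the sample word lists (including word1 and word2 from the question) are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Simplified model of a stop filter fed by several Snowball-format word lists. */
class StopWordSketch {

    /** Parses Snowball-format text: '|' starts a comment, words are whitespace-separated. */
    static Set<String> parseSnowball(String text) {
        Set<String> words = new HashSet<>();
        for (String line : text.split("\\R")) {
            int comment = line.indexOf('|');
            if (comment >= 0) {
                line = line.substring(0, comment);
            }
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word.toLowerCase(Locale.ROOT));
                }
            }
        }
        return words;
    }

    /** Merges several lists into one stop set, as the comma-separated "words" argument would. */
    static Set<String> merge(String... lists) {
        Set<String> merged = new HashSet<>();
        for (String list : lists) {
            merged.addAll(parseSnowball(list));
        }
        return merged;
    }

    /** Drops tokens found in the stop set, ignoring case; kept tokens are left unchanged. */
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for a bundled language list (hypothetical excerpt, not the real file).
        String bundled = "og | and\ni | in\nen et | a, an";
        // Hypothetical extra list holding the additional stop words.
        String extra = "word1\nword2 | extra stop words";
        Set<String> stopWords = merge(bundled, extra);
        List<String> tokens = Arrays.asList("En", "ny", "Word1", "cykel", "og", "hjelm");
        System.out.println(removeStopWords(tokens, stopWords)); // prints [ny, cykel, hjelm]
    }
}
```

If that model is right, adding your own words should come down to putting one more file in that comma-separated "words" list and packaging it with the plugin so the analyzer can load it, but I haven't verified this against a running Neo4j instance.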
