How to sort search results by whole-word matches rather than substring matches (Hibernate Search with Lucene)



I am using Hibernate Search to provide full-text search over products/items in our store application. Here is my Item class:

@Entity
@Table(name = "items", indexes = {
        @Index(name = "idx_item_uuid", columnList = "uuid", unique = true),
        @Index(name = "idx_item_gtin", columnList = "gtin", unique = true),
})
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
@EqualsAndHashCode(onlyExplicitlyIncluded = true, callSuper = true)
@ToString(exclude = {"storeItems"})
@Indexed
@AnalyzerDef(name = "ngram",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
                @TokenFilterDef(factory = StandardFilterFactory.class),
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                @TokenFilterDef(factory = StopFilterFactory.class),
                @TokenFilterDef(factory = NGramFilterFactory.class,
                        params = {
                                @Parameter(name = "minGramSize", value = "1"),
                                @Parameter(name = "maxGramSize", value = "3")})
        }
)
public class Item extends BaseModel {
    @Column(nullable = false)
    @Field(analyzer = @Analyzer(definition = "ngram"))
    private String name;
    @OneToMany(orphanRemoval = true, cascade = CascadeType.ALL, mappedBy = "item", fetch = FetchType.EAGER)
    @Fetch(FetchMode.SELECT)
    private List<Image> images;
    @OneToMany(mappedBy = "item", cascade = CascadeType.REFRESH)
    @Fetch(FetchMode.SELECT)
    @JsonIgnore
    @IndexedEmbedded(includePaths = {"store.uuid"})
    private Set<StoreItem> storeItems;
    @Enumerated(EnumType.STRING)
    private QuantityType quantityType;
    @Column(nullable = false, length = 14)
    private String gtin;
    private String articleSize;
    @ManyToOne(fetch = FetchType.EAGER)
    @JoinColumn(name = "brand_id", foreignKey = @ForeignKey(name = "fk_brands_items"))
    private Brand brand;
    private String supplierName;
    @ManyToOne(fetch = FetchType.EAGER)
    @JoinColumn(name = "category_id", foreignKey = @ForeignKey(name = "fk_categories_items"))
    @IndexedEmbedded(includePaths = {"uuid"})
    private Category category;
    private String taxType;
    private Double taxRate;
    @Lob
    private String marketingMessage;
    private boolean seasonal;
    private String seasonCode;
    @Lob
    private String nutritionalInformation;
    @Lob
    private String ingredients;
    private Double depth;
    private String depthUnit;
    private Double height;
    private String heightUnit;
    private Double width;
    private String widthUnit;
    private Double netContent;
    private String netContentUnit;
    private Double grossWeight;
    private String grossWeightUnit;
    private Double maxStorageTemp;
    private Double minStorageTemp;
    private Double maxTransportTemp;
    private Double minTransportTemp;
    private boolean organic;
    private String origin;
}

And here is how my custom repository searches for items in a specific store:

@Override
public List<Item> findItemBySearchStrAndStoreUuid(final String searchStr, final String storeUuid) {
    final EntityManager entityManager = entityManagerFactory.createEntityManager();
    final FullTextEntityManager manager = Search.getFullTextEntityManager(entityManager);
    entityManager.getTransaction().begin();
    final QueryBuilder qb = manager.getSearchFactory()
            .buildQueryBuilder().forEntity(Item.class).get();
    final Query query = qb.bool()
            .must(qb.keyword().onField("name").matching(searchStr).createQuery())
            .must(qb.keyword().onField("storeItems.store.uuid").matching(storeUuid).createQuery())
            .createQuery();
    return executeQuery(entityManager, manager, query);
}

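We have about 13k items in the database, and most of them have Swedish names. When a customer searches for milk using the Swedish word "mjölk", milk-related items should show up, and they do, but they are not ordered the way we want. For example: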

Expected results:

  1. Mjölk
  2. Mjölk choklad
  3. Kokosmjölk

Actual results:

  1. Kokosmjölk
  2. Mjölk choklad
  3. Mjölk

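The example may make it look as if I only need to reverse the order, but that is not the case: the real results are more random. What I need is for "Mjölk" itself to come first, followed by items that contain "mjölk" as a whole word, and then all items that contain it only as a substring.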
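To make the desired ordering concrete, here is a minimal plain-Java sketch of the three tiers described above (exact match, whole-word match, substring match). It only illustrates the ranking; it is not how Lucene scores documents, and the class name, method names, and sample product names are assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class RankingSketch {

    // Lower tier = better match: 0 exact, 1 whole word, 2 substring, 3 no match.
    static int tier(String name, String term) {
        String n = name.toLowerCase(Locale.ROOT);
        String t = term.toLowerCase(Locale.ROOT);
        if (n.equals(t)) {
            return 0;
        }
        // Split on whitespace to check for a whole-word match.
        if (Arrays.asList(n.split("\\s+")).contains(t)) {
            return 1;
        }
        return n.contains(t) ? 2 : 3;
    }

    // Returns a new list ordered by tier, then alphabetically within a tier.
    static List<String> rank(List<String> names, String term) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Comparator
                .comparingInt((String name) -> tier(name, term))
                .thenComparing(Comparator.<String>naturalOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Kokosmjölk", "Mjölk choklad", "Mjölk");
        System.out.println(rank(names, "mjölk"));
    }
}
```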

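So please guide me on how to improve my analyzer/query to achieve this ordering. I need to return results even for a single character, and the search should also tolerate some typos, which is why I use the NGram filter in the setup above.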
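For reference, the sketch below shows in plain Java what grams a 1 to 3 character n-gram setting produces for a single token. It is an illustration only, not Lucene's NGramFilterFactory, and the emission order may differ from Lucene's. The class and method names are made up for this example. Because single characters are among the grams, and a keyword query is normally analyzed with the same analyzer as the field, names that share only a letter or two with the search term can also match, which is one plausible reason the ordering looks random.

```java
import java.util.ArrayList;
import java.util.List;

final class NGramSketch {

    // All substrings of token whose length is between minGram and maxGram.
    static List<String> ngrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // "mj\u00f6lk" is "mjölk"; this prints its 12 grams of length 1 to 3.
        System.out.println(ngrams("mj\u00f6lk", 1, 3));
    }
}
```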

I also tried SwedishLightStemFilterFactory, which did help, but then items stopped showing up unless the user typed "mjölk" exactly right.

Thanks in advance.

You need to declare a separate field on the same property, dedicated to sorting, and assign it a normalizer instead of an analyzer.

See https://docs.jboss.org/hibernate/search/5.11/reference/en-US/html_single/#section-normalizers
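As a rough sketch of what such a mapping might look like with the Hibernate Search 5.11 annotations, showing only the parts of Item that change. The normalizer name "lowercase_sort" and the field name "name_sort" are placeholders invented for this example, and the fragment has not been tested against the application in the question.

```java
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Fields;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Normalizer;
import org.hibernate.search.annotations.NormalizerDef;
import org.hibernate.search.annotations.SortableField;
import org.hibernate.search.annotations.TokenFilterDef;

// Existing annotations (@Entity, @AnalyzerDef(name = "ngram", ...), Lombok, etc.) are omitted here.
@Indexed
@NormalizerDef(name = "lowercase_sort", // hypothetical name
        filters = @TokenFilterDef(factory = LowerCaseFilterFactory.class))
public class Item extends BaseModel {

    @Column(nullable = false)
    @Fields({
            // Existing n-gram field, used for matching.
            @Field(analyzer = @Analyzer(definition = "ngram")),
            // Extra field: the whole value as one lowercased token, used only for sorting.
            @Field(name = "name_sort", normalizer = @Normalizer(definition = "lowercase_sort"))
    })
    @SortableField(forField = "name_sort")
    private String name;
}
```

After a mapping change like this, the index would have to be rebuilt before the new field can be used in a sort.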

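I would consider two things:

  • ASCIIFoldingFilterFactory: replaces accented characters with their plain equivalents
  • A separate analyzer for sorting, in which the value is not tokenized and is only lowercased

Sorting in Hibernate Search usually involves a different strategy.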
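To illustrate what an untokenized, lowercased, accent-folded sort value looks like, here is a small plain-Java approximation using java.text.Normalizer. It is not Lucene's ASCIIFoldingFilter, which covers more characters than combining accents; the class and method names are assumptions for this example only.

```java
import java.text.Normalizer;
import java.util.Locale;

final class SortKeySketch {

    // Keeps the value as one token: decompose accented characters (NFD),
    // drop the combining marks, then lowercase.
    static String sortKey(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Kokosmj\u00f6lk" is "Kokosmjölk"; prints "kokosmjolk".
        System.out.println(sortKey("Kokosmj\u00f6lk"));
    }
}
```

With folding applied at both index and query time, a search typed as "mjolk" could match "mjölk", which may help with the typo tolerance mentioned in the question.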
