How to create a Solr schema for hierarchical facets by splitting data into multiple fields at index time

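I want to implement Solr hierarchical faceting for my application, with a 2-level hierarchy between Category and SubCategory. I want to use the solution described at this link: http://wiki.apache.org/solr/HierarchicalFaceting#Pivot_Facets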

The flattened data looks like this:

Doc#1: NonFic > Law
Doc#2: NonFic > Sci
Doc#3: NonFic > Sci > Phys

At index time, this data should be split into a separate field for each level of the hierarchy, as shown below.

Indexed terms:

Doc#1: category_level0: NonFic; category_level1: Law
Doc#2: category_level0: NonFic; category_level1: Sci
Doc#3: category_level0: NonFic; category_level1: Sci; category_level2: Phys
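To make the mapping concrete, here is a minimal JavaScript sketch of that split. The function name splitCategoryPath is hypothetical (it is not a Solr API); it only shows how a flattened path turns into the per-level values listed above, trimming the spaces around each level name so that "NonFic " and "NonFic" do not end up as different facet values.

```javascript
// Hypothetical helper (not a Solr API): split a flattened category path
// such as "NonFic > Sci > Phys" into one field per hierarchy level.
function splitCategoryPath(path, prefix = "category_level") {
  const fields = {};
  path
    .split(">")                      // '>' is the delimiter used in the sample data
    .map((level) => level.trim())    // drop the spaces around each level name
    .filter((level) => level !== "") // ignore empty segments, e.g. a trailing '>'
    .forEach((level, depth) => {
      fields[prefix + depth] = level; // depth 0 is the root of the hierarchy
    });
  return fields;
}

const docs = ["NonFic > Law", "NonFic > Sci", "NonFic > Sci > Phys"];
docs.forEach((path, i) => {
  console.log(`Doc#${i + 1}:`, splitCategoryPath(path));
});
```

Whatever mechanism ends up doing the split at index time (DIH, an update processor, or client code), it has to produce exactly these per-level values.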

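Can anyone suggest a way to achieve this? How should I define the Solr schema for it? I couldn't find any reference on splitting data at index time as described above.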

Thanks,

Priyanka

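Do these separate fields need to be displayed as part of the returned documents? If so, the split values must be in the "stored" version of the fields. If you only need them for searching or faceting, you can ignore the "stored" form and focus on the "indexed" form.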

Either way, if you need to split one field into several, you can do it with copyField or with an UpdateRequestProcessor.

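With copyField, the "stored" form is the same in every field, but each field can have a different analyzer that selects a different part of the hierarchy for the "indexed" form.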

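With an UpdateRequestProcessor, you can write a custom processor that takes one field and emits several fields, each containing only part of the path. Alternatively, make one or two copies of the field and apply a different regex processor to each copy.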
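If you would rather not write the processor in Java, Solr (4.0 and later) ships StatelessScriptUpdateProcessorFactory, which calls JavaScript functions such as processAdd(cmd) for each incoming document. The script below is a sketch, not tested against a live Solr instance. It assumes a hypothetical source field named category_path holding values like "NonFic > Sci > Phys". It still has to be registered in an updateRequestProcessorChain in solrconfig.xml.

```javascript
// Sketch of a script for Solr's StatelessScriptUpdateProcessorFactory.
// ASSUMPTION: the incoming document has a hypothetical field "category_path"
// containing a '>'-delimited path such as "NonFic > Sci > Phys".
var SOURCE_FIELD = "category_path";
var TARGET_PREFIX = "category_level";

function processAdd(cmd) {
  var doc = cmd.solrDoc; // org.apache.solr.common.SolrInputDocument
  var value = doc.getFieldValue(SOURCE_FIELD);
  if (value == null) {
    return; // nothing to split for this document
  }
  // String() turns a Java string into a JavaScript string before splitting
  var levels = String(value).split(">");
  var depth = 0;
  for (var i = 0; i < levels.length; i++) {
    var level = levels[i].replace(/^\s+|\s+$/g, ""); // trim surrounding whitespace
    if (level !== "") {
      doc.setField(TARGET_PREFIX + depth, level); // category_level0, category_level1, ...
      depth++;
    }
  }
}

// Solr's sample update script defines these hooks too; they are no-ops here.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}
```

Because the split happens before the document reaches the index, the generated category_levelN fields get both a stored and an indexed form, which covers the "displayed in results" case above.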

To split the data with the DataImportHandler, use the ScriptTransformer, which lets you transform rows with JavaScript inside the data config file.

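Add the following to your DIH data config (e.g. db-data-config.xml), at the same level as <dataSource> and <document>. It defines a function that splits the string in a column on the > delimiter and adds one column per split value, named category_level0, category_level1, and so on.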

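<script><![CDATA[
    function CategoryPieces(row) {
        var value = row.get('ColumnToSplit');
        if (value == null) {
            return row;  // nothing to split for this row
        }
        // Split on '>' and trim the whitespace around each level name
        var pieces = String(value).split('>');
        for (var i = 0; i < pieces.length; i++) {
            row.put('category_level' + i, pieces[i].replace(/^\s+|\s+$/g, ''));
        }
        return row;
    }
]]></script>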

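Then, in the main <entity> tag, add transformer="script:CategoryPieces" and add the new columns to its field list. Doc#3 in the question has a third level, so category_level2 is mapped as well:

<field column="category_level0" name="Category_Level0" />
<field column="category_level1" name="Category_Level1" />
<field column="category_level2" name="Category_Level2" />

Finally, add the new fields to schema.xml. The string type keeps each level name as a single untokenized term, which is what faceting needs:

<field name="Category_Level0" type="string" indexed="true" stored="true" multiValued="false" />
<field name="Category_Level1" type="string" indexed="true" stored="true" multiValued="false" />
<field name="Category_Level2" type="string" indexed="true" stored="true" multiValued="false" />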
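Once the level fields are indexed, the pivot faceting from the wiki page linked in the question is requested with the facet.pivot parameter (Solr 4.0 and later). The sketch below only builds the request URL. The host, port and core name (localhost:8983, collection1) are assumptions, so substitute your own.

```javascript
// Build a Solr pivot-facet request over the per-level category fields.
// ASSUMPTION: Solr runs at localhost:8983 with a core named "collection1".
function buildPivotFacetUrl(baseUrl, levelFields) {
  const params = new URLSearchParams({
    q: "*:*",
    rows: "0",                            // only facet counts are needed, not documents
    wt: "json",
    facet: "true",
    "facet.pivot": levelFields.join(","), // parent level first, e.g. Category_Level0,Category_Level1
  });
  return `${baseUrl}/select?${params.toString()}`;
}

const url = buildPivotFacetUrl("http://localhost:8983/solr/collection1", [
  "Category_Level0",
  "Category_Level1",
]);
console.log(url);
```

The response nests the Category_Level1 counts under each Category_Level0 value. Appending further level fields to the list adds deeper levels to the pivot.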
