Combining Kayessian intersection and Muenchian grouping

I have a very flat document which contains implied groups of elements, based on their position after a Heading element:

<Document>
    <Body>
        ...
        <Heading>Section 1</Heading>
        <Item Id="1.1">Alpha</Item>
        <Item Id="1.1">Bravo</Item>
        ...
        <Heading>Section 2</Heading>
        <Item Id="2.1">Alpha</Item>
        <Item Id="2.1">Bravo</Item>
        ...
    </Body>
</Document>

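From this document I want to extract the groups, but also filter the items in each group so that I get only the first item with a given identifier. For example, if there are two items with the ID "1.1", only the first one is wanted in the output. I intend to do additional processing later to include the duplicates as children of the first item.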

To achieve this grouping I have used Muenchian grouping, where the key for the group is the identifier value:

<xsl:key
    name="ItemsById"
    match="/Document/Body/Item"
    use="@Id"/>

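This works very well, except that a number of Item elements are defined as examples. They happen to use the same identifiers and end up in the node-set matched by the key.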

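Since the range I care about sits in the middle of the document, I have used the Kayessian intersection method to restrict the node-set to the part of the document I am interested in: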

<xsl:variable
    name="section"
    select="(/Document/Body/Heading[text() = 'Example']
        /following-sibling::*[2]/following-sibling::*)[
    count(. | /Document/Body/Heading[text() = 'Appendix B']
        /preceding-sibling::*) 
    = count(/Document/Body/Heading[text() = 'Appendix B']
        /preceding-sibling::*)
    ]" />

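This node-set is the intersection of two node-sets: all elements following the Heading "Section 1" (including the heading itself) and all elements preceding the Heading "Appendix B".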
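
In its general form the idiom is `$ns1[count(. | $ns2) = count($ns2)]`: a node from the first set is kept only if adding it to the second set does not change that set's size. The following is a minimal sketch of that form, not the stylesheet used here. The variable names `after` and `before` and the `Intersection` wrapper element are illustrative assumptions, and unlike the `section` variable above it selects only the elements strictly after the "Section 1" heading.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <!-- Hypothetical name: every element after the 'Section 1' heading. -->
    <xsl:variable
        name="after"
        select="/Document/Body/Heading[. = 'Section 1']
            /following-sibling::*"/>
    <!-- Hypothetical name: every element before the 'Appendix B' heading. -->
    <xsl:variable
        name="before"
        select="/Document/Body/Heading[. = 'Appendix B']
            /preceding-sibling::*"/>
    <xsl:template match="/">
        <Intersection>
            <!-- Keep a node of $after only if it is also in $before:
                 the union with $before must not grow. -->
            <xsl:copy-of
                select="$after[count(. | $before) = count($before)]"/>
        </Intersection>
    </xsl:template>
</xsl:stylesheet>
```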

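This matches the elements I care about. However, because the key is unfiltered, the "first" value for a given identifier is sometimes outside that node-set. I tried using the variable in the key, but then found that the match pattern of a key has a number of restrictions which prevent the use of variables.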

Here is the complete source file:

<Document>
    <Body>
        <Heading>Preamble</Heading>
        <Para>
            Lorem ipsum dolor sit amet, consectetur
            adipiscing elit, sed do eiusmod tempor incididunt
            ut labore et dolore magna aliqua.
        </Para>
        <Heading>Example</Heading>
        <Item Id="1.1">Example Alpha</Item>
        <Item Id="1.1">Example Bravo</Item>
        <Heading>Section 1</Heading>
        <Item Id="1.1">Alpha</Item>
        <Item Id="1.1">Bravo</Item>
        <Item Id="1.2">Charlie</Item>
        <Item Id="1.3">Delta</Item>
        <Item Id="1.3">Echo</Item>
        <Item Id="1.4">Foxtrot</Item>
        <Heading>Section 2</Heading>
        <Item Id="2.1">Alpha</Item>
        <Item Id="2.1">Bravo</Item>
        <Item Id="2.2">Charlie</Item>
        <Item Id="2.3">Delta</Item>
        <Item Id="2.3">Echo</Item>
        <Item Id="2.4">Foxtrot</Item>
        <Heading>Appendix A</Heading>
        <Item Id="A.1">Alpha</Item>
        <Item Id="A.1">Bravo</Item>
        <Item Id="A.2">Charlie</Item>
        <Item Id="A.3">Delta</Item>
        <Item Id="A.3">Echo</Item>
        <Item Id="A.4">Foxtrot</Item>
        <Heading>Appendix B</Heading>
        <Para>
            Lorem ipsum dolor sit amet, consectetur
            adipiscing elit, sed do eiusmod tempor incididunt
            ut labore et dolore magna aliqua.
        </Para>
    </Body>
</Document>

I am applying the following stylesheet:

<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- The node-set which covers the wanted section of elements. -->
    <xsl:variable
        name="section"
        select="(/Document/Body/Heading[text() = 'Example']
            /following-sibling::*[2]/following-sibling::*)[
        count(. | /Document/Body/Heading[text() = 'Appendix B']
            /preceding-sibling::*) 
        = count(/Document/Body/Heading[text() = 'Appendix B']
            /preceding-sibling::*)
        ]" />
    <!-- The items keyed by their ID. -->
    <xsl:key
        name="ItemsById"
        match="/Document/Body/Item"
        use="@Id"/>
    <!-- Matches the root to begin the output structure. -->
    <xsl:template match="/">
        <Document>
            <!-- Apply templates to the headings. -->
            <xsl:apply-templates select="$section[local-name() = 'Heading']" />
        </Document>
    </xsl:template>
    <xsl:template match="/Document/Body/Heading">
        <Section>
            <xsl:attribute name="Title">
                <xsl:value-of select="."/>
            </xsl:attribute>
            <xsl:variable
                name="heading"
                select="generate-id()" />
            <!-- Apply templates to the items in this set. -->
            <xsl:apply-templates
                select="$section[
                local-name() = 'Item'
                and
                generate-id() = generate-id(key('ItemsById', @Id)[1])
                and
                $heading = generate-id(preceding-sibling::Heading[1])
                ]" />
        </Section>
    </xsl:template>
</xsl:stylesheet>

This is the current output:

<Document>
  <Section Title="Section 1">
    <Item Id="1.2">Charlie</Item>
    <Item Id="1.3">Delta</Item>
    <Item Id="1.4">Foxtrot</Item>
  </Section>
  <Section Title="Section 2">
    <Item Id="2.1">Alpha</Item>
    <Item Id="2.2">Charlie</Item>
    <Item Id="2.3">Delta</Item>
    <Item Id="2.4">Foxtrot</Item>
  </Section>
  <Section Title="Appendix A">
    <Item Id="A.1">Alpha</Item>
    <Item Id="A.2">Charlie</Item>
    <Item Id="A.3">Delta</Item>
    <Item Id="A.4">Foxtrot</Item>
  </Section>
</Document>

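The problem is that item 1.1 is missing from Section 1.

Is there a way I can achieve the same grouping, restricted to the section I am interested in?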

Couldn't this be (much) simpler? For example, the following stylesheet:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="item-by-heading" match="Item" use="generate-id(preceding-sibling::Heading[1])" />
<xsl:key name="item-by-id" match="Item" use="concat(generate-id(preceding-sibling::Heading[1]), '|', @Id)" />
<xsl:template match="/Document">
    <xsl:copy>
        <xsl:apply-templates select="Body/Heading"/>
    </xsl:copy>
</xsl:template>
<xsl:template match="Heading">
    <Section Title="{.}">
        <xsl:copy-of select="key('item-by-heading', generate-id())[count(. | key('item-by-id', concat(generate-id(preceding-sibling::Heading[1]), '|', @Id))[1]) = 1]"/>
    </Section>
</xsl:template> 
</xsl:stylesheet>

when applied to your input, will return:

<?xml version="1.0" encoding="UTF-8"?>
<Document>
   <Section Title="Preamble"/>
   <Section Title="Example">
      <Item Id="1.1">Example Alpha</Item>
   </Section>
   <Section Title="Section 1">
      <Item Id="1.1">Alpha</Item>
      <Item Id="1.2">Charlie</Item>
      <Item Id="1.3">Delta</Item>
      <Item Id="1.4">Foxtrot</Item>
   </Section>
   <Section Title="Section 2">
      <Item Id="2.1">Alpha</Item>
      <Item Id="2.2">Charlie</Item>
      <Item Id="2.3">Delta</Item>
      <Item Id="2.4">Foxtrot</Item>
   </Section>
   <Section Title="Appendix A">
      <Item Id="A.1">Alpha</Item>
      <Item Id="A.2">Charlie</Item>
      <Item Id="A.3">Delta</Item>
      <Item Id="A.4">Foxtrot</Item>
   </Section>
   <Section Title="Appendix B"/>
</Document>

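I don't understand how you determine which sections you want to include in (or exclude from) the output, but that should be easy too.

Edit:

> The sections I want are Sections 1-2 and Appendix A; no other sections are relevant.

Well, then do: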

<xsl:template match="/Document">
    <xsl:copy>
        <xsl:apply-templates select="Body/Heading[.='Section 1' or .='Section 2' or .='Appendix A']"/>
    </xsl:copy>
</xsl:template>

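~~Note that this could be even simpler if the item ids were not repeated across sections~~ Ah, but I see that they are. That is why item 1.1 was missing.

Edit 2:

> This node-set is the intersection of two node-sets: all elements following the Heading "Section 1" (including the heading itself) and all elements preceding the Heading "Appendix B".

OK, then: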

<xsl:template match="/Document">
    <xsl:copy>
        <xsl:apply-templates select="Body/Heading[.='Section 1' or preceding-sibling::Heading[.='Section 1'] and following-sibling::Heading[.='Appendix B']]"/>
    </xsl:copy>
</xsl:template>

Or, shorter:

<xsl:template match="/Document">
    <xsl:copy>
        <xsl:apply-templates select="Body/Heading[not(following-sibling::Heading[.='Section 1']) and following-sibling::Heading[.='Appendix B']]"/>
    </xsl:copy>
</xsl:template>
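
If you would rather keep the original `ItemsById` key and `$section` variable from the question, another possible route is to intersect the key's result with `$section` before taking the first node. A variable reference is not allowed in the `match` or `use` attribute of `xsl:key` in XSLT 1.0, but it is allowed in a predicate applied to the result of `key()`. The following is a sketch under that assumption; it is the question's stylesheet with only the "first item" test changed, and I have reasoned it through only for the sample input shown.

```xml
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- The node-set which covers the wanted section (unchanged). -->
    <xsl:variable
        name="section"
        select="(/Document/Body/Heading[text() = 'Example']
            /following-sibling::*[2]/following-sibling::*)[
        count(. | /Document/Body/Heading[text() = 'Appendix B']
            /preceding-sibling::*)
        = count(/Document/Body/Heading[text() = 'Appendix B']
            /preceding-sibling::*)
        ]" />
    <!-- The items keyed by their ID (unchanged, still unfiltered). -->
    <xsl:key
        name="ItemsById"
        match="/Document/Body/Item"
        use="@Id"/>
    <xsl:template match="/">
        <Document>
            <xsl:apply-templates select="$section[local-name() = 'Heading']" />
        </Document>
    </xsl:template>
    <xsl:template match="/Document/Body/Heading">
        <Section Title="{.}">
            <xsl:variable
                name="heading"
                select="generate-id()" />
            <!-- Changed: the key result is first reduced to nodes inside
                 $section, and only then is the first node taken. -->
            <xsl:apply-templates
                select="$section[
                local-name() = 'Item'
                and
                generate-id() = generate-id(
                    key('ItemsById', @Id)
                        [count(. | $section) = count($section)][1])
                and
                $heading = generate-id(preceding-sibling::Heading[1])
                ]" />
        </Section>
    </xsl:template>
</xsl:stylesheet>
```

With this change the "first" item for Id 1.1 should be the one inside the range rather than the one under the Example heading. Note that this removes duplicates across the whole range, not per heading; if the same Id can legitimately appear under two wanted headings, the composite key shown earlier is the safer choice.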
