Grouping values with the same attributes using XSLT 1.0



Given this input XML:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<agrisResources xmlns:ags="http://purl.org/agmes/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <agrisResource bibliographicLevel="AM" ags:ARN="^aSF17^b00003">
    <dc:subject xml:lang="en">Penaeidae</dc:subject>
    <dc:subject xml:lang="en">Vibrio harveyi</dc:subject>
    <dc:subject xml:lang="en">Vibrio parahaemolyticus</dc:subject>
    <dc:subject>
      <ags:subjectClassification scheme="ags:ASC">ASFA-1</ags:subjectClassification>
      <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases</ags:subjectThesaurus>
      <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Fish diseases</ags:subjectThesaurus>
      <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Genes</ags:subjectThesaurus>
    </dc:subject>
  </agrisResource>
</agrisResources>

I want to group items that have the same attributes, so that the output looks like this:

<dc:subject xml:lang="en">Penaeidae||Vibrio harveyi||Vibrio parahaemolyticus</dc:subject>
<dc:subject>
  <ags:subjectClassification scheme="ags:ASC">ASFA-1</ags:subjectClassification>
  <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases||Fish diseases||Genes</ags:subjectThesaurus>
</dc:subject>

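Basically, my grouping rule is to merge the values of a node when it occurs more than once, e.g. dc:subject and ags:subjectThesaurus. I wrote "group values with the same attributes" in the title because I wasn't sure whether they could be grouped by tag name alone, without using their attributes to tell them apart.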

In other words, I need to tell these two apart:

<dc:subject>Penaeidae</dc:subject>

<dc:subject>
  <ags:subjectThesaurus>Bacterial diseases</ags:subjectThesaurus>
</dc:subject>
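As a sketch of how XSLT patterns can tell these two cases apart by structure rather than by attributes: `dc:subject[not(*)]` matches a subject with no element children, and `dc:subject[*]` matches one that wraps other elements. This assumes the `dc` prefix is bound to `http://purl.org/dc/elements/1.1/`, the URI used in the input above; the text output is only there to make the distinction visible.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:output method="text"/>

  <!-- dc:subject with no element children: a plain value such as "Penaeidae" -->
  <xsl:template match="dc:subject[not(*)]">
    <xsl:text>leaf: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- dc:subject that wraps other elements, e.g. ags:subjectThesaurus -->
  <xsl:template match="dc:subject[*]">
    <xsl:text>container with </xsl:text>
    <xsl:value-of select="count(*)"/>
    <xsl:text> child element(s)&#10;</xsl:text>
  </xsl:template>

  <!-- suppress all other text (whitespace between elements) -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Applied to the first input above, it should print three `leaf:` lines followed by `container with 4 child element(s)`, so grouping can key on the element name plus "has no element children" instead of on specific attribute values.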

Update

Input XML

<?xml version="1.0" encoding="ISO-8859-1" ?>
<agrisResources xmlns:ags="http://purl.org/agmes/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <agrisResource bibliographicLevel="AM" ags:ARN="^aSF17^b00003">
    <dc:creator>
      <ags:creatorPersonal>Doe, John</ags:creatorPersonal>
      <ags:creatorPersonal>Smith, Jason T.</ags:creatorPersonal>
      <ags:creatorPersonal>Doe, Jane E.</ags:creatorPersonal>
    </dc:creator>
    <dc:subject xml:lang="en">Penaeidae</dc:subject>
    <dc:subject xml:lang="en">Vibrio harveyi</dc:subject>
    <dc:subject xml:lang="en">Vibrio parahaemolyticus</dc:subject>
    <dc:subject>
      <ags:subjectClassification scheme="ags:ASC">ASFA-1</ags:subjectClassification>
      <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases</ags:subjectThesaurus>
      <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Fish diseases</ags:subjectThesaurus>
      <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Genes</ags:subjectThesaurus>
    </dc:subject>
  </agrisResource>
</agrisResources>

Desired output

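Grouping rule: combine the values of repeated elements, such as <ags:creatorPersonal>, <dc:subject xml:lang="en"> and <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">, using a double pipe || as the separator. Keep all other elements that don't fit this rule unchanged.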

<?xml version="1.0" encoding="ISO-8859-1" ?>
<agrisResources xmlns:ags="http://purl.org/agmes/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <agrisResource bibliographicLevel="AM" ags:ARN="^aSF17^b00003">
    <dc:creator>
      <ags:creatorPersonal>Doe, John||Smith, Jason T.||Doe, Jane E.</ags:creatorPersonal>
    </dc:creator>
    <dc:subject xml:lang="en">Penaeidae||Vibrio harveyi||Vibrio parahaemolyticus</dc:subject>
    <dc:subject>
      <ags:subjectClassification scheme="ags:ASC">ASFA-1</ags:subjectClassification>
      <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases||Fish diseases||Genes</ags:subjectThesaurus>
    </dc:subject>
  </agrisResource>
</agrisResources>

Here is my code, based on this answer:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:dc="http://purl.org/dc/terms/"
    xmlns:ags="http://purl.org/agmes/1.1/"
    xmlns:agls="http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="ags:subjectThesaurus|dc:subject">
    <xsl:copy>
      <xsl:apply-templates select="@* | text()"/>
      <xsl:call-template name="NextSibling"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="ags:subjectThesaurus[@scheme = preceding-sibling::*[1][self::ags:subjectThesaurus]/@scheme]|dc:subject[@xml:lang = preceding-sibling::*[1][self::dc:subject]/@xml:lang]"/>
  <xsl:template match="ags:subjectThesaurus|dc:subject" mode="includeSib">
    <xsl:value-of select="concat('||', .)"/>
    <xsl:call-template name="NextSibling"/>
  </xsl:template>
  <xsl:template name="NextSibling">
    <xsl:apply-templates select="following-sibling::*[1][self::ags:subjectThesaurus and @scheme = current()/@scheme]|following-sibling::*[1][self::dc:subject and @xml:lang = current()/@xml:lang]" mode="includeSib"/>
  </xsl:template>
</xsl:stylesheet>

My only problem is that it transforms only the ags:subjectThesaurus nodes and not the dc:subject nodes. My output looks like this:

<dc:subject xml:lang="en">Penaeidae</dc:subject>
<dc:subject xml:lang="en">Vibrio harveyi</dc:subject>
<dc:subject xml:lang="en">Vibrio parahaemolyticus</dc:subject>
<dc:subject>
  <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases||Fish diseases||Genes</ags:subjectThesaurus>
</dc:subject>

How can I modify my code so that it also groups dc:subject nodes that have the same xml:lang attribute?
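One likely cause: XSLT matches a prefixed name by its namespace URI, not by its prefix. The input binds `dc` to `http://purl.org/dc/elements/1.1/`, while the stylesheet above binds `dc` to `http://purl.org/dc/terms/`, so none of its `dc:subject` patterns can match. Below is a minimal sketch of the same sibling-recursion approach with the `dc` URI corrected. It also assumes that only leaf `dc:subject` elements (no element children) should be merged, so the wrapper `dc:subject` falls through to the identity template and keeps its children, and it gives the suppression template an explicit `priority` so it does not tie with `dc:subject[not(*)]`, which also has default priority 0.5.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ags="http://purl.org/agmes/1.1/">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform: copies everything not handled below,
       including the wrapper dc:subject and its children -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- first element of a run: copy it, then append the values of the rest of the run -->
  <xsl:template match="ags:subjectThesaurus | dc:subject[not(*)]">
    <xsl:copy>
      <xsl:apply-templates select="@* | text()"/>
      <xsl:call-template name="NextSibling"/>
    </xsl:copy>
  </xsl:template>

  <!-- later members of a run were already emitted by the recursion, so drop them;
       priority="1" beats the default 0.5 of dc:subject[not(*)] above -->
  <xsl:template priority="1"
      match="ags:subjectThesaurus[@scheme = preceding-sibling::*[1][self::ags:subjectThesaurus]/@scheme]
           | dc:subject[not(*)][@xml:lang = preceding-sibling::*[1][self::dc:subject]/@xml:lang]"/>

  <!-- append one sibling's value, then continue with the sibling after it -->
  <xsl:template match="ags:subjectThesaurus | dc:subject" mode="includeSib">
    <xsl:value-of select="concat('||', .)"/>
    <xsl:call-template name="NextSibling"/>
  </xsl:template>

  <!-- the next sibling joins the run only if it has the same name and the same scheme / xml:lang -->
  <xsl:template name="NextSibling">
    <xsl:apply-templates mode="includeSib"
        select="following-sibling::*[1][self::ags:subjectThesaurus and @scheme = current()/@scheme]
              | following-sibling::*[1][self::dc:subject and not(*) and @xml:lang = current()/@xml:lang]"/>
  </xsl:template>
</xsl:stylesheet>
```

This only merges adjacent siblings, as the original did. `ags:creatorPersonal` from the update is not covered; it would need its own alternative in each of the three patterns.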

Edit

Following michael.hor257k's suggestion and using the Muenchian method from this answer, here is what I tried:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:dc="http://purl.org/dc/terms/"
    xmlns:ags="http://purl.org/agmes/1.1/"
    xmlns:agls="http://www.naa.gov.au/recordkeeping/gov_online/agls/1.2"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="kNodeSubject" match="dc:subject[@xml:lang]" use="@xml:lang"/>
  <xsl:key name="subjectThesaurus" match="dc:subject/ags:subjectThesaurus" use="@scheme"/>
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="dc:subject[generate-id() = generate-id(key('kNodeSubject', @xml:lang)[1])]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="key('kNodeSubject', @xml:lang)" mode="concat"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="dc:subject/ags:subjectThesaurus[generate-id() = generate-id(key('subjectThesaurus', @scheme)[1])]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="key('subjectThesaurus', @scheme)" mode="concat"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="dc:subject|subjectThesaurus" mode="concat">
    <xsl:value-of select="."/>
    <xsl:if test="position() != last()">
      <xsl:text>||</xsl:text>
    </xsl:if>
  </xsl:template>
  <xsl:template match="dc:subject"/>
  <xsl:template match="ags:subjectThesaurus"/>
</xsl:stylesheet>

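When I apply the code above, the ags:subjectThesaurus nodes disappear, and the values of <dc:subject xml:lang="en"> are not grouped either. I'm not sure whether my match patterns are right. I used match="dc:subject[@xml:lang]" in <xsl:key name="kNodeSubject"> because the ags:subjectThesaurus nodes are children of <dc:subject>.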
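A quick way to check which namespace each element is actually in is a throwaway stylesheet that prints every element's name together with its namespace URI; this is a debugging sketch, not part of the grouping logic. On the input above, the `dc:` elements should report `http://purl.org/dc/elements/1.1/`, which differs from the `http://purl.org/dc/terms/` URI bound to `dc` in the stylesheet above. Since XSLT matches prefixed names by URI, that mismatch would stop every `dc:subject` pattern (including the `dc:subject/ags:subjectThesaurus` key and template) from matching, leaving only the empty `ags:subjectThesaurus` template to fire, which matches the symptoms described. Separately, the `mode="concat"` template matches `subjectThesaurus` with no prefix, i.e. a no-namespace element, so it would not match `ags:subjectThesaurus` even after the `dc` URI is fixed.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- print "name -> {namespace-uri}" for each element, then recurse into its child elements -->
  <xsl:template match="*">
    <xsl:value-of select="concat(name(), ' -> {', namespace-uri(), '}&#10;')"/>
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```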

Thanks in advance.

Consider the following example:

XML

<root xmlns:dc="http://purl.org/dc/terms/" xmlns:ags="http://purl.org/agmes/1.1/">
  <dc:subject xml:lang="en">Penaeidae</dc:subject>
  <dc:subject xml:lang="en">Vibrio harveyi</dc:subject>
  <dc:subject xml:lang="fr">Franca premier</dc:subject>
  <dc:subject xml:lang="fr">Franca deux</dc:subject>
  <dc:subject xml:lang="en">Vibrio parahaemolyticus</dc:subject>
  <dc:subject>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases</ags:subjectThesaurus>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Fish diseases</ags:subjectThesaurus>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Genes</ags:subjectThesaurus>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:B">Bees</ags:subjectThesaurus>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:B">Birds</ags:subjectThesaurus>
  </dc:subject>
</root>

XSLT 1.0

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/terms/"
    xmlns:ags="http://purl.org/agmes/1.1/">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:key name="subj-by-lang" match="dc:subject[@xml:lang]" use="@xml:lang"/>
  <xsl:key name="thes-by-scheme" match="ags:subjectThesaurus" use="@scheme"/>
  <!-- identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="root">
    <xsl:copy>
      <!-- group subjects by lang -->
      <xsl:for-each select="dc:subject[@xml:lang][count(. | key('subj-by-lang', @xml:lang)[1]) = 1]">
        <dc:subject xml:lang="{@xml:lang}">
          <xsl:for-each select="key('subj-by-lang', @xml:lang)">
            <xsl:value-of select="."/>
            <xsl:if test="position() != last()">
              <xsl:text>||</xsl:text>
            </xsl:if>
          </xsl:for-each>
        </dc:subject>
      </xsl:for-each>
      <!-- process other nodes -->
      <xsl:apply-templates select="node()[not(self::dc:subject[@xml:lang])]"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="dc:subject">
    <xsl:copy>
      <!-- group thesauri by scheme -->
      <xsl:for-each select="ags:subjectThesaurus[count(. | key('thes-by-scheme', @scheme)[1]) = 1]">
        <ags:subjectThesaurus xml:lang="{@xml:lang}" scheme="{@scheme}">
          <xsl:for-each select="key('thes-by-scheme', @scheme)">
            <xsl:value-of select="."/>
            <xsl:if test="position() != last()">
              <xsl:text>||</xsl:text>
            </xsl:if>
          </xsl:for-each>
        </ags:subjectThesaurus>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Result

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:dc="http://purl.org/dc/terms/" xmlns:ags="http://purl.org/agmes/1.1/">
  <dc:subject xml:lang="en">Penaeidae||Vibrio harveyi||Vibrio parahaemolyticus</dc:subject>
  <dc:subject xml:lang="fr">Franca premier||Franca deux</dc:subject>
  <dc:subject>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:ASFAT">Bacterial diseases||Fish diseases||Genes</ags:subjectThesaurus>
    <ags:subjectThesaurus xml:lang="en" scheme="ags:B">Bees||Birds</ags:subjectThesaurus>
  </dc:subject>
</root>

Added:

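Based on your clarification, I suspect you want to do something simpler: just join some leaf nodes (i.e. nodes that have no child elements) together, and keep the other nodes as they are.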

Here is an example that joins the dc:subject leaf nodes inside agrisResource:

<xsl:template match="agrisResource">
  <xsl:copy>
    <!-- join subjects with no children -->
    <dc:subject>
      <!-- copy the attributes of the first subject with no children -->
      <xsl:copy-of select="dc:subject[not(*)][1]/@*"/>
      <!-- concatenate the values of all subjects with no children -->
      <xsl:for-each select="dc:subject[not(*)]">
        <xsl:value-of select="."/>
        <xsl:if test="position() != last()">
          <xsl:text>||</xsl:text>
        </xsl:if>
      </xsl:for-each>
    </dc:subject>
    <!-- process other nodes -->
    <xsl:apply-templates select="node()[not(self::dc:subject[not(*)])]"/>
  </xsl:copy>
</xsl:template>

This could be generalized by using a key based on the element name.
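A minimal sketch of that generalization, assuming "repeated" means leaf elements (no element children) that share the same parent, namespace URI, local name, `xml:lang` and `scheme`. The key name `leaf-group` and the choice of attributes in the key are illustrative assumptions, not part of the answer above. Putting `generate-id(..)` in the key keeps values from different parents (e.g. separate `agrisResource` records) from being merged, and using `namespace-uri()` and `local-name()` means the stylesheet needs no `dc` prefix binding at all.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- hypothetical composite key: parent + namespace + local name + xml:lang + scheme,
       indexed only for leaf elements (no element children) -->
  <xsl:key name="leaf-group" match="*[not(*)]"
      use="concat(generate-id(..), '|', namespace-uri(), '|', local-name(), '|', @xml:lang, '|', @scheme)"/>

  <!-- identity transform for everything else -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- leaf elements: only the first member of each group produces output -->
  <xsl:template match="*[not(*)]">
    <xsl:variable name="group"
        select="key('leaf-group', concat(generate-id(..), '|', namespace-uri(), '|', local-name(), '|', @xml:lang, '|', @scheme))"/>
    <xsl:if test="generate-id() = generate-id($group[1])">
      <xsl:copy>
        <!-- attributes are taken from the first member of the group -->
        <xsl:copy-of select="@*"/>
        <!-- join the values of all members with || -->
        <xsl:for-each select="$group">
          <xsl:value-of select="."/>
          <xsl:if test="position() != last()">
            <xsl:text>||</xsl:text>
          </xsl:if>
        </xsl:for-each>
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

On the updated input this should merge the three `ags:creatorPersonal` values, the three `dc:subject xml:lang="en"` values and the three `ags:subjectThesaurus` values, while the single `ags:subjectClassification` and the wrapper `dc:subject` pass through unchanged. Unlike the sibling-recursion approach, group members do not need to be adjacent; the merged element appears where the first member was.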
