Normalize whitespace in every element of an XML document using XQuery



I have XML like this:

<a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
    <c:id>
        http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
    <c:type>series-item</c:type>
    <f:assessment-low>8.946586935</f:assessment-low>
    <f:assessment-high>9.946586935</f:assessment-high>
    <f:mid>9.44658693500000000000</f:mid>
    <f:period-label>
        <c:l10n xml:lang="en"/>
    </f:period-label>
</a:price-range>

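I want to normalize the whitespace in the XML. In the example above, the c:id element contains leading whitespace (a line break and indentation) before its value. After normalizing the whitespace, the XML should look like this: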

<a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
    <c:id>http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
    <c:type>series-item</c:type>
    <f:assessment-low>8.946586935</f:assessment-low>
    <f:assessment-high>9.946586935</f:assessment-high>
    <f:mid>9.44658693500000000000</f:mid>
    <f:period-label>
        <c:l10n xml:lang="en"/>
    </f:period-label>
</a:price-range>

I looked at fn:normalize-space, but it only works on strings.
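
To illustrate the limitation: fn:normalize-space atomizes its argument and returns an xs:string, so passing it an element loses the markup. A minimal sketch, using a made-up <item> element that is not part of my data:

```xquery
(: fn:normalize-space trims leading and trailing whitespace and
   collapses internal runs of whitespace to a single space. :)
let $item := <item>
    some   value </item>
return (
  normalize-space("  a   b  "),  (: the string "a b" :)
  normalize-space($item)         (: the string "some value", not an element :)
)
```

So it has to be applied to each text node separately while the surrounding elements are rebuilt.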

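I don't think this is possible by applying serialization options; you have to walk the tree and apply a transformation pattern. Here is a slightly adjusted version of the example from that page, which normalizes space and supports namespaces: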

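declare function local:copy($node as node()) as node() {
  typeswitch($node)
    (: text nodes: trim and collapse whitespace; whitespace-only text
       becomes empty and disappears from the rebuilt parent :)
    case $text as text()
      return text { normalize-space($text) }
    (: elements: rebuild with the same qualified name, keep the
       attributes, and recurse into child elements and text nodes :)
    case $element as element()
      return
        element { QName(namespace-uri($element), name($element)) } {
          $element/@*,
          for $child in $element/(* | text()) return local:copy($child)
        }
    (: anything else is returned unchanged :)
    default return $node
};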

local:copy(
  <a:price-range xmlns:c="http://iddn.icis.com/ns/core" xmlns:f="http://iddn.icis.com/ns/fields" xmlns:a="http://iddn.icis.com/ns/assets" xmlns:r="http://iddn.icis.com/ns/refdata">
    <c:id>
        http://iddn.icis.com/series-item/petchem/4021090-pricehistory-19990730000000</c:id>
    <c:type>series-item</c:type>
    <f:assessment-low>8.946586935</f:assessment-low>
    <f:assessment-high>9.946586935</f:assessment-high>
    <f:mid>9.44658693500000000000</f:mid>
    <f:period-label>
        <c:l10n xml:lang="en"/>
    </f:period-label>
  </a:price-range>
)
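
One thing to be aware of: the `$element/(* | text())` step selects only child elements and text nodes, so comments and processing instructions inside the input are dropped. If those should survive, a variant along the following lines could be used. This is only a sketch; the name local:copy-all and the small <root> input are made up for illustration.

```xquery
declare function local:copy-all($node as node()) as node()? {
  typeswitch ($node)
    case $text as text()
      return
        (: skip whitespace-only text nodes, normalize the rest :)
        if (normalize-space($text) eq '') then ()
        else text { normalize-space($text) }
    case $element as element()
      return
        element { node-name($element) } {
          $element/@*,
          (: node() also selects comments and processing instructions :)
          for $child in $element/node() return local:copy-all($child)
        }
    (: comments and processing instructions are returned unchanged :)
    default return $node
};

local:copy-all(<root><!-- kept --><v>  a   b  </v></root>)
```

With this input the comment should be kept and the content of <v> reduced to "a b".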

MarkLogic also allows applying XSLT stylesheets, which might be a more elegant way of doing this, using <xsl:strip-space elements="*"/> as proposed by @Raj.

I think <xsl:strip-space elements="*"/> works perfectly; you need to first transform the XML to XML via XSLT.
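
For reference, here is a rough sketch of how the XSLT route might look from XQuery. It assumes MarkLogic, where xdmp:xslt-eval can run an inline stylesheet; the <doc> input is a made-up stand-in for the real document. Note that xsl:strip-space only removes whitespace-only text nodes, so the sketch adds a text() template that calls normalize-space() to also trim values such as the one in c:id.

```xquery
xquery version "1.0-ml";

(: Sketch only: assumes MarkLogic's xdmp:xslt-eval is available. :)
let $stylesheet :=
  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:strip-space elements="*"/>
    <!-- identity template: copy every node and attribute as-is -->
    <xsl:template match="@*|node()">
      <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
    </xsl:template>
    <!-- text nodes: trim and collapse whitespace; the explicit priority
         avoids a conflict with the identity template -->
    <xsl:template match="text()" priority="1">
      <xsl:value-of select="normalize-space(.)"/>
    </xsl:template>
  </xsl:stylesheet>
(: hypothetical input with leading and trailing whitespace in <id> :)
let $input :=
  <doc><id>
      some-id </id><type>series-item</type></doc>
return xdmp:xslt-eval($stylesheet, $input)
```

The identity template keeps comments and processing instructions, and the namespaces of the copied elements are carried over by xsl:copy.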

This function works well for me:

(:
  The rules/assumptions are:
  #1 Retain one leading space if the node isn't first, has non-space content, and has leading space.
  #2 Retain one trailing space if the node isn't last, isn't first, and has trailing space.
  #3 Retain one trailing space if the node isn't last, is first, has trailing space, and has non-space content.
  #4 Retain a single space if the node is an only child and only has space content.
:)
declare function local:normalize-space-in-xml($input as element()) as element()
{
  element { node-name($input) } {
    $input/@*,
    (: $count is the number of child nodes; $pos is each child's
       1-based position among them :)
    let $count := count($input/node())
    for $child at $pos in $input/node()
    return
      if ($child instance of element())
      then local:normalize-space-in-xml($child)
      else if ($child instance of text())
      then
        (: #1 retain one leading space if the node isn't first, has non-space content, and has leading space :)
        if ($pos ne 1 and matches($child, '^\s') and normalize-space($child) ne '')
        then concat(' ', normalize-space($child))
        (: #4 retain one space if the node is an only child and its content is all space;
           this overrules standard normalization :)
        else if ($count eq 1 and string-length($child) ne 0 and normalize-space($child) eq '')
        then ' '
        (: #2 if the node isn't last, isn't first, and has trailing space,
           retain one trailing space and collapse and trim the rest :)
        else if ($pos ne 1 and $pos ne $count and matches($child, '\s$'))
        then concat(normalize-space($child), ' ')
        (: #3 if the node isn't last, is first, has trailing space, and has non-space content,
           keep one trailing space :)
        else if ($pos eq 1 and $pos ne $count and matches($child, '\s$') and normalize-space($child) ne '')
        then concat(normalize-space($child), ' ')
        (: otherwise, e.g. an only child whose content is not all space,
           trim and collapse, that is, apply standard normalization :)
        else normalize-space($child)
      (: comments and processing instructions are copied unchanged :)
      else $child
  }
};
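
One detail worth knowing when testing child positions in a function like this: inside a `for` loop, `$child/position()` always evaluates to 1, because the path is evaluated with the single node `$child` as its context. The positional variable of the `at` clause gives the position within the iterated sequence instead. A small illustration, using a made-up <p> element:

```xquery
let $p := <p>one <b>two</b> three</p>
return
  for $child at $pos in $p/node()
  (: $pos counts 1, 2, 3; $child/position() is 1 every time :)
  return concat($pos, ':', $child/position())
(: yields "1:1", "2:1", "3:1" :)
```

The same applies to `last()`: the number of siblings is better taken from `count($p/node())`.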
