How to parse a complex, recursive 1 GB XML file and store it as CSV using XSLT



I have sample XML data as shown below:

<?xml version="1.0" encoding="ISO-8859-1"?>
<FIXML s="2012-04-23" v="FIX.5.0SP2">
  <Batch ID="...">
    <MktDef MktID="XEUR" MktSegID="14" EfctvBizDt="2017-05-11" NxtEfctvBizDt="2017-05-15" MktSeg="CONF" MarketSegmentDesc="FUT 8-13 Y. SWISS GOV.BONDS 6%" Sym="CH0002741988" ParentMktSegmID="FBND" Ccy="CHF" MktSegStat="1" USFirmFlag="Y" PartID="2">
      <BaseTrdgRules QtSideInd="1" FastMktPctg="0">
      .
      .
      </BaseTrdgRules>
    </MktDef>
    <SecDef TxnTm="2016-12-09T07:29:08.483638853">
      <MktSegGrp MktSegID="14">
        <SecTrdgRules>
          <BaseTrdgRules ImpldMktInd="3" MlegModel="0"/>
        </SecTrdgRules>
      </MktSegGrp>
    </SecDef>
    <SecDef>
      <MktSegGrp MktSegID="14">
        <SecTrdgRules>
          <BaseTrdgRules ImpldMktInd="3" MlegModel="0"/>
        </SecTrdgRules>
      </MktSegGrp>
    </SecDef>
    <SecDef>
      <MktSegGrp MktSegID="14">
        <SecTrdgRules>
          <BaseTrdgRules ImpldMktInd="3" MlegModel="0"/>
        </SecTrdgRules>
      </MktSegGrp>
    </SecDef>
    <MktDef MktID="XEUR" MktSegID="19629" EfctvBizDt="2017-05-11" NxtEfctvBizDt="2017-05-15" MktSeg="FBON" MarketSegmentDesc="EURO BONO FUTURE 8,5-10,5 YEAR" Sym="DE000A163W29" ParentMktSegmID="FBND" Ccy="EUR" MktSegStat="1" USFirmFlag="Y" PartID="2">
      <BaseTrdgRules QtSideInd="1" FastMktPctg="0">
      </BaseTrdgRules>
    </MktDef>
    .
    .
    .
    .
    .
  </Batch>
</FIXML>

This is my sample XSLT...

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" >
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:text>MktID,MktSegID,TxnTm,PriSetPx,QtSideInd,FastMktPctg,ImpldMktInd,MlegModel</xsl:text>
    <xsl:text>&#xA;</xsl:text>
    <xsl:for-each select="FIXML/Batch">
      <xsl:variable name="mktDef" select="concat(/Batch/MktDef/@MktID,',',/Batch/MktDef/@MktSegID,',',/Batch/SecDef/@TxnTm,',',/Batch/SecDef/@PriSetPx)" />
      <xsl:choose>
        <xsl:when test="Batch">
          <xsl:for-each select="Batch">
            <xsl:value-of select="concat($mktDef, ',',/Batch/MktDef/BaseTrdgRules/@QtSideInd,',',/Batch/MktDef/BaseTrdgRules/@FastMktPctg,',',/Batch/SecDef/MktSegGrp/SecTrdgRules/BaseTrdgRules/@ImpldMktInd,',',/Batch/SecDef/MktSegGrp/SecTrdgRules/BaseTrdgRules/@MlegModel,'&#xA;')"/>    
          </xsl:for-each>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="concat($mktDef, ',,,,,&#xA;')"/>    
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

I want to get the attribute data of "BaseTrdgRules" in both MktDef and SecDef, as shown below:

MktID   MktSegID    TxnTm   PriSetPx    QtSideInd   FastMktPctg ImpldMktInd MlegModel
XEUR    14  158.39  2016-12-09T07:29:08.483638853               
XEUR    14  158.39  2016-12-09T07:29:08.483638853   3   0       
XEUR    14  158.39  2016-12-09T07:29:08.483638853   3   0   

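I have already written code using DOM, and it parses the XML correctly. The problem is memory, so I have to redevelop it with a parser that can handle large XML files.

Can you help me? Thanks in advance!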

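I can't fully follow your logic, but I think you might benefit from using a key here to look up SecDef elements by the MktSegID attribute of their MktSegGrp child: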

<xsl:key name="MktSeg" match="SecDef" use="MktSegGrp/@MktSegID" />

So, for a given MktDef, you would get its SecDef elements like so:

<xsl:variable name="secDef" select="key('MktSeg', @MktSegID)" />

Try this XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" >
  <xsl:output method="text"/>
  <xsl:key name="MktSeg" match="SecDef" use="MktSegGrp/@MktSegID" />
  <xsl:template match="/">
    <!-- Header order matches the order of the columns emitted below -->
    <xsl:text>MktID,MktSegID,QtSideInd,FastMktPctg,TxnTm,PriSetPx,ImpldMktInd,MlegModel</xsl:text>
    <xsl:text>&#xA;</xsl:text>
    <xsl:for-each select="FIXML/Batch/MktDef">
      <xsl:variable name="secDef" select="key('MktSeg', @MktSegID)" />
      <xsl:for-each select="BaseTrdgRules">
        <xsl:variable name="header" select="concat(../@MktID,',', ../@MktSegID, ',', @QtSideInd, ',', @FastMktPctg)" />
        <xsl:choose>
          <xsl:when test="$secDef">
            <xsl:for-each select="$secDef">
              <xsl:variable name="baseTrg" select="MktSegGrp/SecTrdgRules/BaseTrdgRules" />
              <xsl:value-of select="concat($header, ',', @TxnTm, ',', @PriSetPx, ',', $baseTrg/@ImpldMktInd, ',', $baseTrg/@MlegModel, '&#xA;')"/>
            </xsl:for-each>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="concat($header, ',,,,&#xA;')"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
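
On the memory side: XSLT 1.0 processors typically build the whole input tree in memory, so a 1 GB file may still be a problem even with the key. If that turns out to be the case, a streaming parser is one possible alternative. Below is a minimal sketch in Python using the standard library's `xml.etree.ElementTree.iterparse`; it is not a drop-in replacement for your DOM code. It assumes, as in your sample, that the elements carry no namespace, that each MktDef appears before the SecDef elements of its segment, and that only the first BaseTrdgRules and first MktSegGrp of each element matter. The names `attrs`, `iter_rows`, and `fixml_to_csv` are made up for this illustration. The column order follows the rows the stylesheet above emits.

```python
import csv
import xml.etree.ElementTree as ET

COLUMNS = ["MktID", "MktSegID", "QtSideInd", "FastMktPctg",
           "TxnTm", "PriSetPx", "ImpldMktInd", "MlegModel"]


def attrs(elem, *names):
    """Return the named attributes as strings ('' if missing or elem is None)."""
    return [elem.get(n, "") if elem is not None else "" for n in names]


def iter_rows(source):
    """Yield one CSV row per matching SecDef while streaming through the file.

    Assumption: a MktDef precedes the SecDef elements that refer to it.
    MktDef elements that never get a SecDef are yielded last, padded with blanks.
    """
    heads = {}       # MktSegID -> the four MktDef columns (one entry per segment)
    matched = set()  # MktSegIDs that produced at least one SecDef row
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "MktDef":
            rules = elem.find("BaseTrdgRules")
            heads[elem.get("MktSegID", "")] = (
                attrs(elem, "MktID", "MktSegID")
                + attrs(rules, "QtSideInd", "FastMktPctg"))
            elem.clear()  # drop children/attributes; the empty element stays attached
        elif elem.tag == "SecDef":
            seg = attrs(elem.find("MktSegGrp"), "MktSegID")[0]
            if seg in heads:
                rules = elem.find("MktSegGrp/SecTrdgRules/BaseTrdgRules")
                yield (heads[seg]
                       + attrs(elem, "TxnTm", "PriSetPx")
                       + attrs(rules, "ImpldMktInd", "MlegModel"))
                matched.add(seg)
            elem.clear()
    for seg, head in heads.items():
        if seg not in matched:
            yield head + ["", "", "", ""]


def fixml_to_csv(source, out):
    """Write the header and all rows from `source` (path or binary file) to `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(iter_rows(source))
```

It could be called as `fixml_to_csv(src, dst)` with `src` opened in binary mode and `dst` opened with `newline=""`. Only the per-segment MktDef columns are kept in memory, so the footprint should grow with the number of market segments rather than with the number of SecDef elements, although the emptied elements left behind by `clear()` still take a little space each.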