I need to use XmlTextReader to split one large XML file into multiple output XML files

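I need to take one XML file and create multiple output XML files from the thousands of repeating nodes in the input file. The source file "AnimalBatch.xml" looks like this: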

    <?xml version="1.0" encoding="utf-8" ?>
    <Animals>
      <Animal id="1001">
        <Quantity>One</Quantity>
        <Adjective>Red</Adjective>
        <Name>Rooster</Name>
      </Animal>
      <Animal id="1002">
        <Quantity>Two</Quantity>
        <Adjective>Stubborn</Adjective>
        <Name>Donkeys</Name>
      </Animal>
      <Animal id="1003">
        <Quantity>Three</Quantity>
        <Color>Blind</Color>
        <Name>Mice</Name>
      </Animal>
    </Animals>

In reality, though, the file contains no CR/LF characters. The actual text stream looks like this:

    <?xml version="1.0" encoding="utf-8" ?><Animals><Animal id="1001"><Quantity>One</Quantity><Adjective>Red</Adjective><Name>Rooster</Name></Animal><Animal id="1002"><Quantity>Two</Quantity><Adjective>Stubborn</Adjective><Name>Donkeys</Name></Animal><Animal id="1003"><Quantity>Three</Quantity><Color>Blind</Color><Name>Mice</Name></Animal></Animals>

The program needs to split out the repeating "Animal" elements and produce 3 files named Animal_1001.xml, Animal_1002.xml and Animal_1003.xml.

I asked an earlier question about doing this with XmlDocument, and it has already been answered.
See: [Split XML file into multiple XMLs using XmlDocument][1]

This question is about how to use XmlReader to get the elements and create XmlDocument elements from them.

Animal_1001.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <Animal>
      <Quantity>One</Quantity>
      <Adjective>Red</Adjective>
      <Name>Rooster</Name>
    </Animal>

Animal_1002.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <Animal>
      <Quantity>Two</Quantity>
      <Adjective>Stubborn</Adjective>
      <Name>Donkeys</Name>
    </Animal>

Animal_1003.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <Animal>
      <Quantity>Three</Quantity>
      <Adjective>Blind</Adjective>
      <Name>Mice</Name>
    </Animal>

Here is code that works, but only when the input file contains line breaks:

    static void SplitXMLReader() 
    {
        string strFileName;
        string strSeq;
        XmlReader doc = XmlReader.Create("C:\\AnimalBatch.xml");
        while (doc.Read())
        {
            if (doc.Name=="Animal")
            {
                strSeq = doc.GetAttribute("id");
                XmlDocument outdoc = new XmlDocument();
                XmlDeclaration xmlDeclaration = outdoc.CreateXmlDeclaration("1.0", "utf-8", null);
                XmlElement rootNode = outdoc.CreateElement(doc.Name);
                rootNode.InnerXml = doc.ReadInnerXml();
                outdoc.InsertBefore(xmlDeclaration, outdoc.DocumentElement);
                outdoc.AppendChild(rootNode);
                strFileName = "Animal_" + strSeq + ".xml";
                outdoc.Save("C:\\" + strFileName);
            }
        }
    }

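When the program runs on a copy of "AnimalBatch.xml" that has a carriage return after every element, it works and creates the Animal_xxxxx.xml files as required. When AnimalBatch.xml looks like an unformatted text stream, it picks up the first Animal, gets its ID 1001 and writes the output file. It then reads the following Animal element but cannot get the "id" attribute, so it ends up writing an output file named "Animal_.xml", because the strSeq variable it reads from that attribute is apparently null or empty. In the end, that second file contains only this:

    <?xml version="1.0" encoding="utf-8"?>
    <Animal />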

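This leads me to believe that XmlReader, at least as far as the doc.Read() method, the (doc.Name=="Animal") check, or the later strSeq = doc.GetAttribute("id"); call are concerned, behaves differently when there is a CR/LF after the <Animal id="1002"> tag.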

I guess my real question is: at the moment doc.GetAttribute("id"); executes, where is the cursor in the document? Why can't it get the ids after "1001", when that one works?

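John said XML doesn't care about formatting, and I always thought so too, but this is confusing. Also, in my application the only way I get the XML is unformatted, because I pull it out of SQL via SSIS, and it arrives as a text stream rather than an XML object.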

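First of all, I don't see you assigning anything to outdoc anywhere... I assume you want to fill it with the current node's data and then save it? Also, I would create one XmlDocument object and clear/refill it inside the loop; creating a new object a few thousand times in a loop is not a good idea...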

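Also note that XmlReader moves forward one node at a time. So at the moment your code will:

Call Read(), not enter the if (it reads the first <?xml declaration)
Call Read() once more, enter the if, move to the id attribute and write an empty file
Call Read() 10 times, skipping everything up to the next Animal element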
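To make "where is the cursor" concrete, here is a small sketch that reports which node the reader is on right after ReadInnerXml() has consumed the first <Animal>. It assumes the default XmlReaderSettings (whitespace is not ignored); the class name, method name and the two sample strings are made up for illustration.

```csharp
using System.IO;
using System.Xml;

static class ReaderPositionDemo
{
    // Hypothetical sample inputs: the same data with and without line breaks.
    public const string Compact =
        "<Animals><Animal id=\"1001\"><Name>Rooster</Name></Animal>" +
        "<Animal id=\"1002\"><Name>Donkeys</Name></Animal></Animals>";
    public const string Formatted =
        "<Animals>\n<Animal id=\"1001\"><Name>Rooster</Name></Animal>\n" +
        "<Animal id=\"1002\"><Name>Donkeys</Name></Animal>\n</Animals>";

    // Returns "NodeType:Name" for the node the reader sits on immediately
    // after ReadInnerXml() has consumed the first <Animal> element.
    public static string PositionAfterFirstAnimal(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("Animal"); // now on <Animal id="1001">
            reader.ReadInnerXml();            // consumes everything up to and including </Animal>
            return reader.NodeType + ":" + reader.Name;
        }
    }
}
```

On the compact input this returns "Element:Animal": ReadInnerXml() has already left the reader on <Animal id="1002">, so the loop's next Read() steps into that element's first child and the start tag is never tested. The next node named "Animal" the loop sees is the </Animal> end tag, which has no attributes, hence the null strSeq and the empty <Animal />. On the formatted input it returns "Whitespace:", so the next Read() lands on <Animal id="1002"> as expected.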

One way to get the data from inside the <Animal> tag is similar to this example on MSDN.

Another is to look at the more convenient methods, such as ReadInnerXml together with ReadToFollowing. Also have a look at the GetAttribute method.

My procedure would be:

    string toFile = "";
    read the file until the <Animal> tag
    GetAttribute("id");
    toFile = ReadInnerXml();
    write toFile to a file ;)
    doc.ReadToFollowing("Animal");

It may need a few small tweaks, since I didn't check what I wrote with a compiler...
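One caveat with that procedure: ReadToFollowing() advances before it looks at the node name, so if ReadInnerXml() has already left the reader on the next <Animal> (which is exactly what happens with the unformatted stream), ReadToFollowing() skips that Animal. Below is a sketch of the same procedure with a guard for that case. It assumes every Animal carries an id attribute, and instead of saving files it returns file name to XML text; the AnimalSplitter class and Split method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class AnimalSplitter
{
    // Splits every <Animal> element into its own small document.
    // Returns "Animal_<id>.xml" -> XML text; the caller decides where to save it.
    public static Dictionary<string, string> Split(TextReader input)
    {
        var result = new Dictionary<string, string>();
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Move to the first <Animal>; false means there are none.
            bool onAnimal = reader.ReadToFollowing("Animal");
            while (onAnimal)
            {
                // Read the attribute BEFORE ReadInnerXml(), which moves the reader on.
                string id = reader.GetAttribute("id");
                string inner = reader.ReadInnerXml();

                var outdoc = new XmlDocument();
                outdoc.AppendChild(outdoc.CreateXmlDeclaration("1.0", "utf-8", null));
                XmlElement root = outdoc.CreateElement("Animal");
                root.InnerXml = inner;
                outdoc.AppendChild(root);
                result["Animal_" + id + ".xml"] = outdoc.OuterXml;

                // Without whitespace between elements the reader may already be on
                // the next <Animal>; only search forward if it is not.
                onAnimal = (reader.NodeType == XmlNodeType.Element && reader.Name == "Animal")
                           || reader.ReadToFollowing("Animal");
            }
        }
        return result;
    }
}
```

The caller can then write each entry with File.WriteAllText. For input with thousands of Animals you may prefer to save each document inside the loop rather than collecting them all in memory first.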

You need to create the root node on outdoc. Use this code:

    static void SplitXMLTextReader()
    {
        string strFileName;
        string strSeq = "0";
        XmlTextReader doc = new XmlTextReader("C:\\AnimalBatch.xml");
        // Drop whitespace nodes so formatted and unformatted input are read the same way
        doc.WhitespaceHandling = WhitespaceHandling.None;
        doc.Read();
        while (!doc.EOF)
        {
            if (doc.NodeType == XmlNodeType.Element && doc.Name == "Animal")
            {
                // Read the id while the reader is still on <Animal ...>;
                // ReadInnerXml() below moves the reader past this element
                string id = doc.GetAttribute("id");
                if (id != null)
                {
                    strSeq = id;
                }

                XmlDocument outdoc = new XmlDocument();
                XmlDeclaration xmlDeclaration = outdoc.CreateXmlDeclaration("1.0", "utf-8", null);
                XmlElement rootNode = outdoc.CreateElement(doc.Name);
                // Leaves the reader on the node after </Animal> (often the next <Animal>),
                // so this iteration must not call Read() again
                rootNode.InnerXml = doc.ReadInnerXml();
                outdoc.InsertBefore(xmlDeclaration, outdoc.DocumentElement);
                outdoc.AppendChild(rootNode);

                strFileName = "Animal_" + strSeq + ".xml";
                outdoc.Save("C:\\" + strFileName);
            }
            else
            {
                doc.Read();
            }
        }
        doc.Close();
    }

