How do I ignore phantom XML attributes that cause an infinite loop when I try to rename nodes?



My task is to convert the results of a RESTful web service into an XML document with a new format.

Sample of the HTML/XHTML to convert:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <title>OvidWS Result Set Resource</title>
    </head>
    <body>
        <table id="results">
            <tr>
                <td class="_index">
                  <a class="uri" href="REDACTED">1</a>
                </td>
                <td class="au">
                  <span>GILLESPIE JB</span>
                  <span>KUKES RE</span>
                </td>
                <td class="so">A.M.A. American Journal of Diseases of Children</td>
                <td class="ti">Acetylsalicylic acid poisoning with recovery.</td>
                <td class="ui">20267726</td>
                <td class="yr">1947</td>
              </tr>
              <tr>
                <td class="_index">
                  <a class="uri" href="REDACTED">2</a>
                </td>
                <td class="au">BASS MH</td>
                <td class="so">Journal of the Mount Sinai Hospital, New York</td>
                <td class="ti">Aspirin poisoning in infants.</td>
                <td class="ui">20265054</td>
                <td class="yr">1947</td>
              </tr>
        </table>  
    </body>
</html>

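Ideally, I want to take whatever value is given in an element's class attribute and use it as the element name; where there is no 'class' attribute, I just want to name the element item.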

This is the transformation I am looking for:

<results>
    <citation>
        <_index>
            <uri href="REDACTED">1</uri>
        </_index>
        <au>
            <item>GILLESPIE JB</item>
            <item>KUKES RE</item>
        </au>
        <so>A.M.A. American Journal of Diseases of Children</so>
        <ti>Acetylsalicylic acid poisoning with recovery.</ti>
        <ui>20267726</ui>
        <yr>1947</yr>
    </citation>
    <citation>
        <_index>
            <uri href="REDACTED">2</uri>
        </_index>
        <au>BASS MH</au>
        <so>Journal of the Mount Sinai Hospital, New York</so>
        <ti>Aspirin poisoning in infants.</ti>
        <ui>20265054</ui>
        <yr>1947</yr>
    </citation>
</results>  
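
The naming rule implied by this sample could be sketched roughly as follows. `TargetName` is a hypothetical helper, not part of the original code, and the `table` to `results` and `tr` to `citation` mappings are read off the desired output above. It also assumes every class value is a valid XML element name (a class containing spaces would not be).

```vbnet
Imports System
Imports System.Xml

Module NamingRuleSketch
    ' Hypothetical helper: picks the target element name for one source element.
    Function TargetName(ByVal el As XmlElement) As String
        ' Structural elements get fixed names taken from the desired output.
        If el.LocalName = "table" Then Return "results"
        If el.LocalName = "tr" Then Return "citation"

        ' GetAttribute returns an empty string when the attribute is absent.
        Dim cls As String = el.GetAttribute("class")
        If cls <> "" Then Return cls

        ' No class attribute: fall back to the generic name.
        Return "item"
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<td class=""au""><span>GILLESPIE JB</span></td>")
        Dim td As XmlElement = doc.DocumentElement
        Console.WriteLine(TargetName(td))                                    ' prints "au"
        Console.WriteLine(TargetName(DirectCast(td.FirstChild, XmlElement))) ' prints "item"
    End Sub
End Module
```

Keeping the name decision separate from the actual renaming makes the rule easy to check on its own before touching the document.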

I found a small snippet of code here that lets me rename a node:

    Public Shared Function RenameNode(ByVal e As XmlNode, newName As String) As XmlNode
        Dim doc As XmlDocument = e.OwnerDocument
        Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)
        While (e.HasChildNodes)
            newNode.AppendChild(e.FirstChild)
        End While
        Dim ac As XmlAttributeCollection = e.Attributes
        While (ac.Count > 0) 
            newNode.Attributes.Append(ac(0))
        End While
        Dim parent As XmlNode = e.ParentNode
        parent.ReplaceChild(newNode, e)
        Return newNode
    End Function

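But a problem arises when iterating over the XmlAttributeCollection. For some reason, when looking at one of the td nodes, two attributes that do not appear in the source magically show up: rowspan and colspan. These attributes seem to interfere with the loop, because when they are consumed they do not disappear from the attribute list the way the "class" attribute does. Instead, only the attribute's value changes (from "1" to ""). Because the count never reaches zero, the result is an infinite loop.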

I noticed that they are of type 'XmlUnspecifiedAttribute', but when I modify the loop to detect that:

    While (ac.Count > 0) And Not TypeOf (ac(0)) Is System.Xml.XmlUnspecifiedAttribute
        newNode.Attributes.Append(ac(0))
    End While

I get the following error:

System.Xml.XmlUnspecifiedAttribute is not accessible in this context because it is 'friend'

Any idea why this happens or how to work around it?

I think your problem really is your document type declaration.

Since you are converting the nodes into something else entirely, I would say you do not even need it and can safely ignore it.

I did not include it in my first test, and when I did include it the XmlResolver choked on it, so I assume you definitely do not need it here.

You can ignore it by setting the resolver to Nothing:

{xml document object}.XmlResolver = Nothing

Then select the nodes and process them. I did this even with the doctype left in the source file and still had no problems.
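
If the document is loaded through an XmlReader instead, a similar effect can probably be had by telling the reader to skip the DOCTYPE. The sketch below is an alternative to the resolver setting, not what I tested above. It assumes .NET 4 or later (where `XmlReaderSettings.DtdProcessing` is available) and uses a cut-down inline document in place of the real service response. Note that with the DTD ignored, named XHTML entities such as `&nbsp;` would no longer resolve.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module LoadWithoutDtd
    Sub Main()
        ' Cut-down stand-in for the service response (assumption for this sketch).
        Dim xhtml As String =
            "<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Strict//EN"" " &
            """http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"">" &
            "<html xmlns=""http://www.w3.org/1999/xhtml""><body><table><tr>" &
            "<td class=""yr"">1947</td></tr></table></body></html>"

        Dim settings As New XmlReaderSettings()
        ' Skip the DOCTYPE entirely: the DTD is not fetched, so it cannot
        ' contribute default attributes such as rowspan and colspan.
        settings.DtdProcessing = DtdProcessing.Ignore

        Dim doc As New XmlDocument()
        Using reader As XmlReader = XmlReader.Create(New StringReader(xhtml), settings)
            doc.Load(reader)
        End Using

        Dim ns As New XmlNamespaceManager(doc.NameTable)
        ns.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml")
        Dim cell As XmlNode = doc.SelectSingleNode("//xhtml:td", ns)

        ' List whatever attributes the td ended up with after loading.
        For Each attr As XmlAttribute In cell.Attributes
            Console.WriteLine(attr.Name & "=" & attr.Value)
        Next
    End Sub
End Module
```

With the DTD out of the picture, the td should carry only the attributes written in the source, so the original RenameNode loop has nothing to trip over.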

Here is the code I used to test:
Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    Dim USEDoc As New XmlDocument
    Dim theNameManager As System.Xml.XmlNamespaceManager = New System.Xml.XmlNamespaceManager(USEDoc.NameTable)
    theNameManager.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml")
    USEDoc.XmlResolver = Nothing
    USEDoc.Load("RestServ.txt")
    renameNodes(USEDoc.SelectSingleNode("descendant::xhtml:table", theNameManager))
    Dim SaveDoc As New XmlDocument
    SaveDoc.AppendChild(SaveDoc.ImportNode(USEDoc.SelectSingleNode("//results", theNameManager), True))
    SaveDoc.Save("RestServConv.xml")
End Sub
Public Function renameNodes(ByVal TopNode As XmlNode) As Boolean
    Dim UseNode As XmlNode
    If TopNode.Name <> "#text" Then
        If TopNode.Name = "tr" Then
            UseNode = RenameNode(TopNode, "citation")
        ElseIf TopNode.Name = "table" Then
            UseNode = RenameNode(TopNode, "results")
            UseNode.Attributes.RemoveNamedItem("id")
        ElseIf TopNode.Attributes.Count > 0 Then
            For Each oAttribute As XmlAttribute In TopNode.Attributes
                If oAttribute.Name = "class" Then
                    UseNode = RenameNode(TopNode, oAttribute.Value)
                    UseNode.Attributes.RemoveNamedItem("class")
                    Exit For
                End If
            Next oAttribute
        End If
        If UseNode IsNot Nothing Then
            If UseNode.ChildNodes.Count > 0 Then
                Dim x As Integer
                For x = 0 To UseNode.ChildNodes.Count - 1
                    renameNodes(UseNode.ChildNodes(x))
                Next x
            End If
        End If
    End If
    Return True
End Function
Public Shared Function RenameNode(ByVal e As XmlNode, ByVal newName As String) As XmlNode
    Dim doc As XmlDocument = e.OwnerDocument
    Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)
    While (e.HasChildNodes)
        newNode.AppendChild(e.FirstChild)
    End While
    Dim ac As XmlAttributeCollection = e.Attributes
    While (ac.Count > 0)
        newNode.Attributes.Append(ac(0))
    End While
    Dim parent As XmlNode = e.ParentNode
    parent.ReplaceChild(newNode, e)
    Return newNode
End Function

I ran your sample document through it, and the result I got was:

<results>
  <citation>
    <_index>
      <uri href="REDACTED">1</uri>
    </_index>
    <au>
      <span xmlns="http://www.w3.org/1999/xhtml">GILLESPIE JB</span>
      <span xmlns="http://www.w3.org/1999/xhtml">KUKES RE</span>
    </au>
    <so rowspan="1" colspan="1">A.M.A. American Journal of Diseases of Children</so>
    <ti>Acetylsalicylic acid poisoning with recovery.</ti>
    <ui>20267726</ui>
    <yr>1947</yr>
  </citation>
  <citation>
    <_index>
      <uri href="REDACTED">2</uri>
    </_index>
    <au>BASS MH</au>
    <so>Journal of the Mount Sinai Hospital, New York</so>
    <ti>Aspirin poisoning in infants.</ti>
    <ui>20265054</ui>
    <yr>1947</yr>
  </citation>
</results>
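
If you do need to keep the DTD, another option is to make the rename routine itself tolerant of default attributes. As far as I can tell, the loop never ends because appending `ac(0)` to the new node removes it from the old one, and for an attribute that has a DTD default the document immediately puts a fresh default back, so `ac.Count` never drops to zero. The sketch below copies attributes instead of moving them and uses the public `XmlAttribute.Specified` property in place of the inaccessible `XmlUnspecifiedAttribute` type. `RenameElement` is a hypothetical variant of RenameNode, and it assumes the attributes being copied are not namespace-qualified.

```vbnet
Imports System
Imports System.Xml

Module RenameSketch
    ' Hypothetical variant of RenameNode that copies attributes instead of moving them.
    Function RenameElement(ByVal e As XmlElement, ByVal newName As String) As XmlElement
        Dim doc As XmlDocument = e.OwnerDocument
        Dim newNode As XmlElement = doc.CreateElement(newName)

        ' Move the children across; each AppendChild detaches the child from e.
        While e.HasChildNodes
            newNode.AppendChild(e.FirstChild)
        End While

        ' Nothing is removed from e.Attributes here, so this loop always terminates.
        For Each attr As XmlAttribute In e.Attributes
            ' Specified is False for attributes that only exist as DTD defaults.
            If attr.Specified Then
                newNode.SetAttribute(attr.Name, attr.Value)
            End If
        Next

        e.ParentNode.ReplaceChild(newNode, e)
        Return newNode
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<tr><td class=""yr"">1947</td></tr>")
        Dim td As XmlElement = DirectCast(doc.DocumentElement.FirstChild, XmlElement)

        Dim renamed As XmlElement = RenameElement(td, td.GetAttribute("class"))
        renamed.RemoveAttribute("class")

        Console.WriteLine(doc.OuterXml) ' prints <tr><yr>1947</yr></tr>
    End Sub
End Module
```

Because the old element's attribute list is only read, it no longer matters whether the document re-creates defaults behind the scenes.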
