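My task is to take the results from a RESTful web service and transform them into an XML document with a new format.
Sample of the HTML/XHTML to be transformed: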
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>OvidWS Result Set Resource</title>
</head>
<body>
<table id="results">
<tr>
<td class="_index">
<a class="uri" href="REDACTED">1</a>
</td>
<td class="au">
<span>GILLESPIE JB</span>
<span>KUKES RE</span>
</td>
<td class="so">A.M.A. American Journal of Diseases of Children</td>
<td class="ti">Acetylsalicylic acid poisoning with recovery.</td>
<td class="ui">20267726</td>
<td class="yr">1947</td>
</tr>
<tr>
<td class="_index">
<a class="uri" href="REDACTED">2</a>
</td>
<td class="au">BASS MH</td>
<td class="so">Journal of the Mount Sinai Hospital, New York</td>
<td class="ti">Aspirin poisoning in infants.</td>
<td class="ui">20265054</td>
<td class="yr">1947</td>
</tr>
</table>
</body>
</html>
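Ideally, I want to take any column that has a class attribute and use the attribute's value as the element name. Where there is no 'class' attribute, I just want to label the element as item.
This is the transformation I am looking for: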
<results>
<citation>
<_index>
<uri href="REDACTED">1</uri>
</_index>
<au>
<item>GILLESPIE JB</item>
<item>KUKES RE</item>
</au>
<so>A.M.A. American Journal of Diseases of Children</so>
<ti>Acetylsalicylic acid poisoning with recovery.</ti>
<ui>20267726</ui>
<yr>1947</yr>
</citation>
<citation>
<_index>
<uri href="REDACTED">2</uri>
</_index>
<au>BASS MH</au>
<so>Journal of the Mount Sinai Hospital, New York</so>
<ti>Aspirin poisoning in infants.</ti>
<ui>20265054</ui>
<yr>1947</yr>
</citation>
</results>
I found a small piece of code here that lets me rename a node:
Public Shared Function RenameNode(ByVal e As XmlNode, newName As String) As XmlNode
    Dim doc As XmlDocument = e.OwnerDocument
    Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)
    While (e.HasChildNodes)
        newNode.AppendChild(e.FirstChild)
    End While
    Dim ac As XmlAttributeCollection = e.Attributes
    While (ac.Count > 0)
        newNode.Attributes.Append(ac(0))
    End While
    Dim parent As XmlNode = e.ParentNode
    parent.ReplaceChild(newNode, e)
    Return newNode
End Function
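However, a problem shows up while iterating over the XmlAttributeCollection. For some reason, when I look at one of the td nodes, two attributes that do not appear in the source magically appear: rowspan and colspan. These attributes seem to interfere with the loop, because when they are appended they do not disappear from the attribute list the way the "class" attribute does. Instead, only the attribute's value is consumed (it goes from "1" to ""). This results in an infinite loop.
I noticed that they are of type 'XmlUnspecifiedAttribute', but when I modified the loop to detect that:
While (ac.Count > 0) And Not TypeOf (ac(0)) Is System.Xml.XmlUnspecifiedAttribute
    newNode.Attributes.Append(ac(0))
End While
I get the following error: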
System.Xml.XmlUnspecifiedAttribute is not accessible in this context because it is 'friend'
Any idea why this is happening, or how to work around it?
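I think your problem really is the document type declaration.
Since you are converting the nodes into something else entirely, I would say you don't even need it and can safely ignore it.
I left it out of my first test, and when I did include it the XmlResolver choked on it, so I assume you definitely don't need it here.
You can ignore it by setting the resolver to Nothing:
{xml document object}.XmlResolver = Nothing
Then select your nodes and process them. I did this even with the DOCTYPE present in the source file and still had no problems.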
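If the DTD does need to stay resolvable, another option is to skip the defaulted attributes instead of avoiding them. The type check from the question cannot compile because `XmlUnspecifiedAttribute` is internal to System.Xml, but the public `XmlAttribute.Specified` property gives the same information: it is False for attributes supplied by DTD defaults, such as rowspan and colspan on an XHTML td. A minimal sketch, assuming a hypothetical helper named `CopySpecifiedAttributes` that copies attributes rather than moving them (it is not part of the code in this thread, and it does not try to preserve attribute namespaces):

```vbnet
Imports System.Xml

Module AttributeCopySketch
    ' Hypothetical helper: copies only the attributes that were explicitly
    ' written in the source document onto the target element.
    Public Sub CopySpecifiedAttributes(ByVal source As XmlElement, ByVal target As XmlElement)
        For Each attr As XmlAttribute In source.Attributes
            ' Specified is False for attributes that exist only because the
            ' DTD declares a default value for them.
            If attr.Specified Then
                target.SetAttribute(attr.Name, attr.Value)
            End If
        Next
    End Sub
End Module
```

Because the source collection is only read, never emptied, the loop ends after one pass regardless of how the defaulted attributes behave when removed.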
Here is the code I used to test:
Imports System.Xml

Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    Dim USEDoc As New XmlDocument
    Dim theNameManager As System.Xml.XmlNamespaceManager = New System.Xml.XmlNamespaceManager(USEDoc.NameTable)
    theNameManager.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml")
    ' Do not resolve the external DTD named in the DOCTYPE.
    USEDoc.XmlResolver = Nothing
    USEDoc.Load("RestServ.txt")
    ' Rename the table and everything below it in place.
    renameNodes(USEDoc.SelectSingleNode("descendant::xhtml:table", theNameManager))
    ' Copy the renamed subtree into a fresh document and save it.
    Dim SaveDoc As New XmlDocument
    SaveDoc.AppendChild(SaveDoc.ImportNode(USEDoc.SelectSingleNode("//results", theNameManager), True))
    SaveDoc.Save("RestServConv.xml")
End Sub

Public Function renameNodes(ByVal TopNode As XmlNode) As Boolean
    Dim UseNode As XmlNode = Nothing
    If TopNode.Name <> "#text" Then
        If TopNode.Name = "tr" Then
            UseNode = RenameNode(TopNode, "citation")
        ElseIf TopNode.Name = "table" Then
            UseNode = RenameNode(TopNode, "results")
            UseNode.Attributes.RemoveNamedItem("id")
        ElseIf TopNode.Attributes.Count > 0 Then
            ' Use the value of the class attribute as the new element name.
            For Each oAttribute As XmlAttribute In TopNode.Attributes
                If oAttribute.Name = "class" Then
                    UseNode = RenameNode(TopNode, oAttribute.Value)
                    UseNode.Attributes.RemoveNamedItem("class")
                    Exit For
                End If
            Next oAttribute
        End If
        ' Recurse only into nodes that were actually renamed.
        If UseNode IsNot Nothing Then
            If UseNode.ChildNodes.Count > 0 Then
                Dim x As Integer
                For x = 0 To UseNode.ChildNodes.Count - 1
                    renameNodes(UseNode.ChildNodes(x))
                Next x
            End If
        End If
    End If
    Return True
End Function

Public Shared Function RenameNode(ByVal e As XmlNode, ByVal newName As String) As XmlNode
    Dim doc As XmlDocument = e.OwnerDocument
    Dim newNode As XmlNode = doc.CreateNode(e.NodeType, newName, Nothing)
    ' AppendChild moves each child out of the old node.
    While (e.HasChildNodes)
        newNode.AppendChild(e.FirstChild)
    End While
    ' Append moves each attribute out of the old node.
    Dim ac As XmlAttributeCollection = e.Attributes
    While (ac.Count > 0)
        newNode.Attributes.Append(ac(0))
    End While
    Dim parent As XmlNode = e.ParentNode
    parent.ReplaceChild(newNode, e)
    Return newNode
End Function
I ran your sample document through it, and this is the result I got:
<results>
<citation>
<_index>
<uri href="REDACTED">1</uri>
</_index>
<au>
<span xmlns="http://www.w3.org/1999/xhtml">GILLESPIE JB</span>
<span xmlns="http://www.w3.org/1999/xhtml">KUKES RE</span>
</au>
<so rowspan="1" colspan="1">A.M.A. American Journal of Diseases of Children</so>
<ti>Acetylsalicylic acid poisoning with recovery.</ti>
<ui>20267726</ui>
<yr>1947</yr>
</citation>
<citation>
<_index>
<uri href="REDACTED">2</uri>
</_index>
<au>BASS MH</au>
<so>Journal of the Mount Sinai Hospital, New York</so>
<ti>Aspirin poisoning in infants.</ti>
<ui>20265054</ui>
<yr>1947</yr>
</citation>
</results>
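The output above still contains span elements in the XHTML namespace, whereas the question asked for elements without a class attribute to become item. One way to cover that case is to build the new tree in a separate document instead of renaming nodes in place. The following is only a sketch under stated assumptions: `TargetName` and `ConvertElement` are hypothetical names that do not appear in the code above, id and class attributes are dropped, and only element and text children are carried over.

```vbnet
Imports System.Xml

Module ItemFallbackSketch
    ' Hypothetical helper: chooses the new element name for an XHTML element.
    Public Function TargetName(ByVal node As XmlElement) As String
        If node.LocalName = "table" Then Return "results"
        If node.LocalName = "tr" Then Return "citation"
        ' GetAttribute returns an empty string when the attribute is absent.
        Dim cls As String = node.GetAttribute("class")
        If cls <> "" Then Return cls
        ' No class attribute: fall back to the generic name from the question.
        Return "item"
    End Function

    ' Hypothetical helper: rebuilds source as a namespace-free element
    ' owned by the target document.
    Public Function ConvertElement(ByVal source As XmlElement, ByVal target As XmlDocument) As XmlElement
        Dim result As XmlElement = target.CreateElement(TargetName(source))
        For Each attr As XmlAttribute In source.Attributes
            ' Skip DTD-defaulted attributes (Specified = False) as well as
            ' class and id, which are not wanted in the output.
            If attr.Specified AndAlso attr.LocalName <> "class" AndAlso attr.LocalName <> "id" Then
                result.SetAttribute(attr.Name, attr.Value)
            End If
        Next
        For Each child As XmlNode In source.ChildNodes
            If child.NodeType = XmlNodeType.Element Then
                result.AppendChild(ConvertElement(DirectCast(child, XmlElement), target))
            ElseIf child.NodeType = XmlNodeType.Text Then
                result.AppendChild(target.CreateTextNode(child.Value))
            End If
        Next
        Return result
    End Function
End Module
```

With the variables from Form1_Load, the table element selected by "descendant::xhtml:table" could be cast to XmlElement and passed in as `SaveDoc.AppendChild(ConvertElement(tableElement, SaveDoc))`, where `tableElement` is an assumed local variable. Since the source document is never modified, the attribute loop cannot get stuck on defaulted attributes.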