Object not being serialized to XML as expected (UTF-8)



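I have a helper method that serializes an object. It works until you try to change the encoding: when the consuming web service receives the output, it contains some strange, incorrect characters.

These are the log entries from the application: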

UTF-16 (this works):

2011-08-09 11:16:03,140 DEBUG SomeRestfulService *   xmlData    <?xml version="1.0" encoding="utf-8"?>
<loginRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <UserName>Admin</UserName>
  <Password>Password</Password>
  <MarketCode>GB</MarketCode>
</loginRequest>

UTF-8 (note the strange character):

2011-08-09 11:21:30,687 DEBUG SomeRestfulService *   xmlData    <?xml version="1.0" encoding="utf-8"?><loginRequest xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><UserName>Admin</UserName><Password>Password</Password><MarketCode>GB</MarketCode></loginRequest>

I also don't know why it lost its layout (the indentation and line breaks).

The helper method:

Public Shared Function SerializeObject(ByVal obj As Object, ByVal encoding As Text.Encoding) As String
    Dim serializer As New XmlSerializer(obj.GetType)
    If encoding Is Nothing Then
        Using strWriter As New IO.StringWriter()
            serializer.Serialize(strWriter, obj)
            Return strWriter.ToString
        End Using
    Else
        Using stream As New IO.MemoryStream, xtWriter As New Xml.XmlTextWriter(stream, encoding)
            serializer.Serialize(xtWriter, obj)
            Return encoding.GetString(stream.ToArray())
        End Using
    End If

End Function

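Note: if I pass Nothing as the encoding, the default is UTF-16 and everything is fine. Originally the method had no encoding parameter at all, but it is now a requirement, so it has to stay.

Am I serializing incorrectly when the encoding is UTF-8? How can I fix this?

I tried omitting the BOM, but I still get the same problem: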

Dim utf8 As New Text.UTF8Encoding(True)
Using stream As New IO.MemoryStream, xtWriter As New Xml.XmlTextWriter(stream, utf8)
    serializer.Serialize(xtWriter, obj)
    Return utf8.GetString(stream.ToArray())
End Using

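What you are seeing is the byte order mark (BOM), which is commonly written at the start of a text file or stream to indicate the byte order and the Unicode variant.

Your serialization helper is odd. If you encode a string with a particular encoding (such as UTF-8), you have to return the result as a byte array. By first encoding the XML as UTF-8 and then decoding the UTF-8 stream back into a string, you gain nothing (apart from introducing the troublesome BOM).

Either use UTF-16 only, or return a byte array. As the function stands now, the encoding parameter only causes problems.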
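To make the round trip described above concrete, here is a minimal sketch of where the stray character comes from. It assumes a console program; the module name `BomDemo` and the element name `a` are made up for illustration. The writer emits the UTF-8 preamble into the stream, and `GetString` decodes those bytes as the character U+FEFF instead of dropping them.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module BomDemo   ' hypothetical name, for illustration only
    Sub Main()
        ' True = emit the UTF-8 identifier (BOM) when used by a writer.
        Dim utf8WithBom As New UTF8Encoding(True)

        Using stream As New MemoryStream(), writer As New XmlTextWriter(stream, utf8WithBom)
            writer.WriteStartElement("a")
            writer.WriteEndElement()
            writer.Flush()

            Dim bytes As Byte() = stream.ToArray()

            ' Print the first three bytes; these should be the UTF-8 preamble.
            Console.WriteLine(BitConverter.ToString(bytes, 0, 3))

            ' Decoding keeps the preamble as a character rather than stripping it.
            Dim decoded As String = utf8WithBom.GetString(bytes)
            Console.WriteLine(decoded(0) = ChrW(&HFEFF))
        End Using
    End Sub
End Module
```

A receiver that treats the decoded string as text sees that leading character in front of `<?xml`, which would explain the strange character in the log.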

Update:

Based on the code in the comments below, I see two approaches:

Approach 1: Build a string from the serialized data, then convert it to UTF-8

Public Shared Function SerializeObject(ByVal obj As Object) As String
    Dim serializer As New XmlSerializer(obj.GetType)
    Using strWriter As New IO.StringWriter()
        serializer.Serialize(strWriter, obj)
        Return strWriter.ToString
    End Using
End Function
....
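Dim serialisedObject As String = SerializeObject(obj)
Dim postData As Byte() = New Text.UTF8Encoding(True).GetBytes(serialisedObject)

If you need a different encoding, change the last line. Encoding.GetBytes never prepends the byte order mark, so the constructor argument makes no difference here; it only matters when the encoding is handed to a writer or stream, where passing False to UTF8Encoding() omits the BOM.

Approach 2: Create the correctly encoded data first, then keep working with the byte array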

Public Shared Function SerializeObject(ByVal obj As Object, ByVal encoding As Text.Encoding) As Byte()
    Dim serializer As New XmlSerializer(obj.GetType)
    If encoding Is Nothing Then
        ' Fall back to UTF-16; VB.NET assignment takes no Set keyword.
        encoding = Text.Encoding.Unicode
    End If
    Using stream As New IO.MemoryStream, xtWriter As New Xml.XmlTextWriter(stream, encoding)
        serializer.Serialize(xtWriter, obj)
        Return stream.ToArray()
    End Using
End Function

....
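Dim postData As Byte() = SerializeObject(obj, New Text.UTF8Encoding(False))

In this case the XmlTextWriter encodes the data directly in the correct encoding. Since we already have the byte array, the final step is shorter: we send the data straight to the client.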
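The question also asks why the layout was lost. A likely cause is that an XmlTextWriter constructed directly defaults to Formatting.None, whereas serializing to a StringWriter goes through a writer that indents. The sketch below is a variant of approach 2 that turns indentation back on and uses a BOM-free UTF-8 encoding. The `LoginRequest` class, the `SerializeToBytes` name, and the `SerializeDemo` module are assumptions made up for this example, not part of the original code.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml
Imports System.Xml.Serialization

' Hypothetical payload type, modeled on the XML shown in the question.
<XmlRoot("loginRequest")>
Public Class LoginRequest
    Public UserName As String
    Public Password As String
    Public MarketCode As String
End Class

Module SerializeDemo
    ' Variant of approach 2: returns encoded bytes, indented, in the given encoding.
    Public Function SerializeToBytes(ByVal obj As Object, ByVal enc As Encoding) As Byte()
        Dim serializer As New XmlSerializer(obj.GetType())
        Using stream As New MemoryStream(), xtWriter As New XmlTextWriter(stream, enc)
            ' Restore line breaks and indentation in the output.
            xtWriter.Formatting = Formatting.Indented
            serializer.Serialize(xtWriter, obj)
            ' Make sure buffered output reaches the stream before copying it.
            xtWriter.Flush()
            Return stream.ToArray()
        End Using
    End Function

    Sub Main()
        Dim request As New LoginRequest With {
            .UserName = "Admin", .Password = "Password", .MarketCode = "GB"}

        ' False = do not write a byte order mark.
        Dim postData As Byte() = SerializeToBytes(request, New UTF8Encoding(False))

        ' Decoded here only for display; send postData itself over the wire.
        Console.WriteLine(New UTF8Encoding(False).GetString(postData))
    End Sub
End Module
```

Whether the receiving service cares about indentation is a separate matter; it is purely cosmetic and can be left off for smaller payloads.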