OutOfMemory exception with Streams and BsonWriter in Json.Net



I am having a problem using Json.net to create a large Bson file. I have the following test code:

Imports System.IO
Imports Newtonsoft.Json
Public Class Region
    Public Property Id As Integer
    Public Property Name As String
    Public Property FDS_Id As String
End Class
Public Class Regions
    Inherits List(Of Region)
    Public Sub New(capacity As Integer)
        MyBase.New(capacity)
    End Sub
End Class
Module Module1
    Sub Main()
        Dim writeElapsed2 = CreateFileBson_Stream(GetRegionList(5000000))
        GC.Collect(0)
    End Sub
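    ' Returns Regions (not List(Of Region)) so the result can be passed to
    ' CreateFileBson_Stream without a narrowing conversion under Option Strict On.
    Public Function GetRegionList(count As Integer) As Regions
        Dim regions As New Regions(count)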
        For lp = 0 To count - 1
            regions.Add(New Region With {.Id = lp, .Name = lp.ToString, .FDS_Id = lp.ToString})
        Next
        Return regions
    End Function
    Public Function CreateFileBson_Stream(regions As Regions) As Long
        Dim sw As New Stopwatch
        sw.Start()
        Dim lp = 0
        Using stream = New StreamWriter("c:atlasregionsStream.bson")
            Using writer = New Bson.BsonWriter(stream.BaseStream)
                writer.WriteStartArray()
                For Each item In regions
                    writer.WriteStartObject()
                    writer.WritePropertyName("Id")
                    writer.WriteValue(item.Id)
                    writer.WritePropertyName("Name")
                    writer.WriteValue(item.Name)
                    writer.WritePropertyName("FDS_Id")
                    writer.WriteValue(item.FDS_Id)
                    writer.WriteEndObject()
                    lp += 1
                    If lp Mod 1000000 = 0 Then
                        writer.Flush()
                        stream.Flush()
                        stream.BaseStream.Flush()
                    End If
                Next
                writer.WriteEndArray()
            End Using
        End Using
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function
End Module

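In the first Using statement I also tried a FileStream instead of a StreamWriter, and it made no difference.

CreateFileBson_Stream fails with an OutOfMemory exception at just over 3 million records. The memory profiler in Visual Studio shows memory continuing to climb, even though I am flushing everything I can flush.

The list of 5 million regions takes about 468 MB in memory.

Interestingly, if I use the following code to produce JSON instead, it works and memory stays steady at 500 MB: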

Public Function CreateFileJson_Stream(regions As Regions) As Long
    Dim sw As New Stopwatch
    sw.Start()
    Using stream = New StreamWriter("c:atlasregionsStream.json")
        Using writer = New JsonTextWriter(stream)
            writer.WriteStartArray()
            For Each item In regions
                writer.WriteStartObject()
                writer.WritePropertyName("Id")
                writer.WriteValue(item.Id)
                writer.WritePropertyName("Name")
                writer.WriteValue(item.Name)
                writer.WritePropertyName("FDS_Id")
                writer.WriteValue(item.FDS_Id)
                writer.WriteEndObject()
            Next
            writer.WriteEndArray()
        End Using
    End Using
    sw.Stop()
    Return sw.ElapsedMilliseconds
End Function

I'm pretty sure this is a problem with BsonWriter, but I can't see what else I can do. Any ideas?

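You are running out of memory for the following reason. According to the BSON specification, every object or array (called a document in the standard) must contain, at its beginning, a count of the total number of bytes comprising the document: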

document    ::=     int32 e_list "\x00"     BSON Document. int32 is the total number of bytes comprising the document.
e_list      ::=     element e_list
    |   ""
element     ::=     "\x01" e_name double    64-bit binary floating point
    |   "\x02" e_name string    UTF-8 string
    |   "\x03" e_name document  Embedded document
    |   "\x04" e_name document  Array
    |   ...

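Thus, when writing the root object or array, the total number of bytes to be written to the file must be known before the first byte of content goes out.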
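To make the length prefix concrete, here is a minimal sketch that hand-encodes the one-element document { "Id": 5 } using only the grammar above. The class and method names (BsonLayoutDemo, EncodeIdDocument) are made up for illustration, and the element type byte 0x10 (32-bit integer) comes from the BSON specification rather than from the excerpt quoted here.

```csharp
using System;
using System.IO;
using System.Text;

public static class BsonLayoutDemo
{
    // Hand-encodes { "Id": <id> } as a BSON document (illustrative helper, not part of Json.NET).
    public static byte[] EncodeIdDocument(int id)
    {
        byte[] name = Encoding.UTF8.GetBytes("Id");
        // int32 size + type byte + element name with 0x00 terminator + int32 value + document terminator
        int size = 4 + 1 + (name.Length + 1) + 4 + 1;

        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(size);        // total byte count comes first (little-endian int32)
            writer.Write((byte)0x10);  // element type: 32-bit integer
            writer.Write(name);        // element name bytes...
            writer.Write((byte)0);     // ...terminated by 0x00
            writer.Write(id);          // element value (little-endian int32)
            writer.Write((byte)0);     // end of document
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(EncodeIdDocument(5)));
        // prints 0D-00-00-00-10-49-64-00-05-00-00-00-00
    }
}
```

The leading 0D-00-00-00 is the size (13 bytes), and it cannot be written correctly until everything after it is known. For an array of 5 million objects, "everything after it" is the entire file.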

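Newtonsoft's BsonDataWriter and the underlying BsonBinaryWriter implement this by caching all the tokens to be written in a tree. Once the contents of the root token have been finalized, they recursively calculate the sizes and only then write the tree out. (The alternatives would have been to make the application, i.e. your code, somehow precalculate this information, which is practically impossible, or to seek back and forth in the output stream to write it, which is possible only for streams where Stream.CanSeek == true.) You are getting the OutOfMemory exception because your system has insufficient resources to hold the token tree.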
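The two strategies can be compared in a small library-free sketch. This is not Newtonsoft's code: LengthPrefixDemo, WriteBuffered and WriteSeekable are hypothetical names, and the payload is an arbitrary byte sequence rather than real BSON elements. It only shows that buffering and seek-back patching yield the same length-prefixed bytes while differing in what they hold in memory.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LengthPrefixDemo
{
    // Little-endian int32, independent of the host byte order.
    static byte[] Int32LE(int v)
    {
        return new[] { unchecked((byte)v), unchecked((byte)(v >> 8)), unchecked((byte)(v >> 16)), unchecked((byte)(v >> 24)) };
    }

    // Strategy 1: buffer the whole payload so its size is known up front.
    // Memory use grows with the payload size.
    public static void WriteBuffered(Stream output, Action<Stream> writePayload)
    {
        using (var buffer = new MemoryStream())
        {
            writePayload(buffer);
            int size = checked((int)buffer.Length + 4); // the prefix counts itself
            output.Write(Int32LE(size), 0, 4);
            buffer.Position = 0;
            buffer.CopyTo(output);
        }
    }

    // Strategy 2: write a placeholder, stream the payload, then seek back and patch the size.
    // Memory use stays constant, but the stream must be seekable.
    public static void WriteSeekable(Stream output, Action<Stream> writePayload)
    {
        if (!output.CanSeek)
            throw new ArgumentException("Stream must be seekable.", nameof(output));
        long start = output.Position;
        output.Write(new byte[4], 0, 4);                 // placeholder for the size
        writePayload(output);
        long end = output.Position;
        output.Position = start;
        output.Write(Int32LE(checked((int)(end - start))), 0, 4);
        output.Position = end;                            // leave the stream at the end
    }

    public static void Main()
    {
        Action<Stream> payload = s => s.Write(new byte[] { 1, 2, 3 }, 0, 3);
        var a = new MemoryStream();
        var b = new MemoryStream();
        WriteBuffered(a, payload);
        WriteSeekable(b, payload);
        Console.WriteLine(a.ToArray().SequenceEqual(b.ToArray())); // prints True
    }
}
```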

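By comparison, the JSON standard does not require byte counts or sizes to be written anywhere in the file. Thus JsonTextWriter can stream the serialized array contents out immediately, without needing to cache anything.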

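As a workaround, based on the BSON spec and BsonBinaryWriter, I have created a helper method that incrementally serializes an enumerable to a stream for which Stream.CanSeek == true. It does not need to cache the entire BSON document in memory. Instead, it seeks back to the beginning of the stream to write the final byte count: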

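using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;          // BsonDataWriter (Newtonsoft.Json.Bson package)
using Newtonsoft.Json.Serialization; // JsonObjectContract, JsonDictionaryContract, JsonArrayContract

public static partial class BsonExtensions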
{
    const int BufferSize = 256;
    public static void SerializeEnumerable<TItem>(IEnumerable<TItem> enumerable, Stream stream, JsonSerializerSettings settings = null)
    {
        // Created based on https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonBinaryWriter.cs
        // And http://bsonspec.org/spec.html
        if (enumerable == null || stream == null)
            throw new ArgumentNullException();
        if (!stream.CanSeek || !stream.CanWrite)
            throw new ArgumentException("!stream.CanSeek || !stream.CanWrite");
        var serializer = JsonSerializer.CreateDefault(settings);
        var contract = serializer.ContractResolver.ResolveContract(typeof(TItem));
        BsonType rootType;
        if (contract is JsonObjectContract || contract is JsonDictionaryContract)
            rootType = BsonType.Object;
        else if (contract is JsonArrayContract)
            rootType = BsonType.Array;
        else
            // Arrays of primitives are not implemented yet.
            throw new JsonSerializationException(string.Format("Item type \"{0}\" not implemented.", typeof(TItem)));
        stream.Flush(); // Just in case.
        var initialPosition = stream.Position;
        var buffer = new byte[BufferSize];
        WriteInt(stream, (int)0, buffer); // CALCULATED SIZE TO BE CALCULATED LATER.
        ulong index = 0;
        foreach (var item in enumerable)
        {
            if (item == null)
            {
                stream.WriteByte(unchecked((byte)BsonType.Null));
                WriteString(stream, index.ToString(NumberFormatInfo.InvariantInfo), buffer);
            }
            else
            {
                stream.WriteByte(unchecked((byte)rootType));
                WriteString(stream, index.ToString(NumberFormatInfo.InvariantInfo), buffer);
                using (var bsonWriter = new BsonDataWriter(stream) { CloseOutput = false })
                {
                    serializer.Serialize(bsonWriter, item);
                }
            }
            index++;
        }
        stream.WriteByte((byte)0);
        stream.Flush();
        var finalPosition = stream.Position;
        stream.Position = initialPosition;
        var size = checked((int)(finalPosition - initialPosition));
        WriteInt(stream, size, buffer); // CALCULATED SIZE.
        stream.Position = finalPosition;
    }
    private static readonly Encoding Encoding = new UTF8Encoding(false);
    private static void WriteString(Stream stream, string s, byte[] buffer)
    {
        if (s != null)
        {
            if (s.Length < buffer.Length / Encoding.GetMaxByteCount(1))
            {
                var byteCount = Encoding.GetBytes(s, 0, s.Length, buffer, 0);
                stream.Write(buffer, 0, byteCount);
            }
            else
            {
                byte[] bytes = Encoding.GetBytes(s);
                stream.Write(bytes, 0, bytes.Length);
            }
        }
        stream.WriteByte((byte)0);
    }
    private static void WriteInt(Stream stream, int value, byte[] buffer)
    {
        unchecked
        {
            buffer[0] = (byte)value;
            buffer[1] = (byte)(value >> 8);
            buffer[2] = (byte)(value >> 16);
            buffer[3] = (byte)(value >> 24);
        }
        stream.Write(buffer, 0, 4);
    }
    enum BsonType : sbyte
    {
        // Taken from https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/Bson/BsonType.cs
        // And also http://bsonspec.org/spec.html
        Number = 1,
        String = 2,
        Object = 3,
        Array = 4,
        Binary = 5,
        Undefined = 6,
        Oid = 7,
        Boolean = 8,
        Date = 9,
        Null = 10,
        Regex = 11,
        Reference = 12,
        Code = 13,
        Symbol = 14,
        CodeWScope = 15,
        Integer = 16,
        TimeStamp = 17,
        Long = 18,
        MinKey = -1,
        MaxKey = 127
    }
}

Then call it as follows:

BsonExtensions.SerializeEnumerable(regions, stream)

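Notes:

  • You can use the method above to serialize to a local FileStream or a MemoryStream, but not to, say, a DeflateStream, which cannot be repositioned.

  • Serializing an enumerable of primitives is not implemented, but could be.

  • In release 10.0.1, Newtonsoft moved BSON processing into a separate NuGet package, Newtonsoft.Json.Bson, and replaced BsonWriter with BsonDataWriter. If you are using an earlier version of Newtonsoft, the answer above applies equally to the old BsonWriter.

  • Since Json.NET is written in C# and my primary language is C#, the workaround is also written in C#. If you need it converted to VB.NET, let me know and I can try.

Demo with some simple unit tests here.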

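Found it: BsonWriter is trying to be "smart". Because I am generating the output as an array of regions, it appears to keep the entire array in memory no matter how much flushing you do.

To prove this, I took out the Start and End Array writes and ran the routine. Memory usage stayed at 500 MB and the process ran fine.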
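Without the array wrapper, each object is presumably written as its own root document, so the file becomes a sequence of back-to-back BSON documents rather than one array. Assuming that layout, a reader can walk the file using only the length prefixes. The sketch below uses no Json.NET APIs, and BsonDocumentWalker and CountDocuments are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class BsonDocumentWalker
{
    // Counts back-to-back BSON documents in a seekable stream by reading each
    // int32 length prefix and skipping over the rest of that document.
    public static int CountDocuments(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            int count = 0;
            while (stream.Position < stream.Length)
            {
                long start = stream.Position;
                int size = reader.ReadInt32();   // little-endian; includes the 4 prefix bytes
                // The smallest valid document is 5 bytes: the int32 size plus the 0x00 terminator.
                if (size < 5 || start + size > stream.Length)
                    throw new InvalidDataException("Invalid BSON document length at offset " + start);
                stream.Position = start + size;  // jump to the next document
                count++;
            }
            return count;
        }
    }

    public static void Main()
    {
        // An empty document (5 bytes) followed by { "Id": 5 } (13 bytes).
        var bytes = new byte[]
        {
            5, 0, 0, 0, 0,
            13, 0, 0, 0, 0x10, 0x49, 0x64, 0, 5, 0, 0, 0, 0
        };
        Console.WriteLine(CountDocuments(new MemoryStream(bytes))); // prints 2
    }
}
```

The trade-off is that such a file is no longer a single BSON array, so any consumer has to expect multiple root documents.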

My guess is that this is a bug that was fixed in JsonWriter but not in the less frequently used BsonWriter.
