Out-of-memory error when reading a very large text file in VB.NET



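I have been tasked with processing a 3.2 GB fixed-width delimited text file. Each line is 1563 characters long, and the file contains approximately 2.1 million lines. After reading about 1 million lines, my program crashes with an out-of-memory exception.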

Imports System.IO
Imports Microsoft.VisualBasic.FileIO
Module TestFileCount
    ''' <summary>
    ''' Gets the total number of lines in a text file by reading a line at a time
    ''' </summary>
    ''' <remarks>Crashes when count reaches 1018890</remarks>
    Sub Main()
        Dim inputfile As String = "C:\Split\BIGFILE.txt"
        Dim count As Int32 = 0
        Dim lineoftext As String = ""
        If File.Exists(inputfile) Then
            Dim _read As New StreamReader(inputfile)
            Try
                While (_read.Peek <> -1)
                    lineoftext = _read.ReadLine()
                    count += 1
                End While
                Console.WriteLine("Total Lines in " & inputfile & ": " & count)
            Catch ex As Exception
                Console.WriteLine(ex.Message)
            Finally
                _read.Close()
            End Try
        End If
    End Sub
End Module

This is a very simple program that reads the text file one line at a time, so I would not expect it to use much memory for buffering.

For the life of me, I cannot figure out why it crashes. Does anyone here have any ideas?

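I don't know if this will fix your problem, but don't use Peek. Change your loop to the following (this is C#, but you should be able to translate it to VB):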

while (_read.ReadLine() != null)
{
    count += 1;
}

If you need to use the line of text inside the loop, rather than just counting lines, modify the code to:

while ((lineoftext = _read.ReadLine()) != null)
{
    count += 1;
    // Do something with lineoftext
}
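For readers who want the VB version of the loop above, here is a minimal sketch. VB does not allow assignment inside a condition the way C# does, so the line is read once before the loop and again at the end of each pass. The module name `LineCounter`, the function `CountLines`, and the temporary sample file are hypothetical names chosen for illustration, not part of the original code.

```vbnet
Imports System
Imports System.IO

Module LineCounter
    ' Counts lines by calling ReadLine until it returns Nothing (end of stream).
    ' No Peek call is involved.
    Function CountLines(filePath As String) As Long
        Dim count As Long = 0
        Using reader As New StreamReader(filePath)
            Dim lineOfText As String = reader.ReadLine()
            While lineOfText IsNot Nothing
                count += 1
                ' Do something with lineOfText here.
                lineOfText = reader.ReadLine()
            End While
        End Using ' Closes the reader even if an exception is thrown
        Return count
    End Function

    Sub Main()
        ' Hypothetical three-line sample file, created only for this demo.
        Dim tempFile As String = Path.GetTempFileName()
        File.WriteAllLines(tempFile, {"first", "second", "third"})
        Console.WriteLine("Total lines: {0}", CountLines(tempFile))
        File.Delete(tempFile)
    End Sub
End Module
```

The `Using` block plays the same role as the `Try`/`Finally` with `_read.Close()` in the question.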

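A bit off topic, and a bit of a cheat: if every line really is 1563 characters long (including the line ending) and the file is pure ASCII (so every character takes one byte), you could just do this (again in C#, but you should be able to translate it):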

long bytesPerLine = 1563;
string inputfile = @"C:\Split\BIGFILE.txt"; // The @ symbol is so we don't have to escape the `\`
long length;
using (FileStream stream = File.Open(inputfile, FileMode.Open)) // This is the C# equivalent of the try/finally to close the stream when done.
{
    length = stream.Length;
}
Console.WriteLine("Total Lines in {0}: {1}", inputfile, (length / bytesPerLine));
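A rough VB translation of the length-based shortcut might look like the sketch below. It relies on the same assumptions stated above: every record occupies exactly the same number of bytes, line terminator included, and the file is single-byte ASCII. The names `FixedWidthCounter` and `EstimateLineCount` are hypothetical, and the demo uses a small temporary file with 9-byte records rather than the real 1563-byte ones.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module FixedWidthCounter
    ' Derives the line count from the file size alone; the file is never read.
    ' bytesPerLine must include the line terminator.
    Function EstimateLineCount(filePath As String, bytesPerLine As Long) As Long
        Dim length As Long = New FileInfo(filePath).Length
        Return length \ bytesPerLine ' Integer division
    End Function

    Sub Main()
        ' Hypothetical sample: three records of 8 characters plus one LF = 9 bytes each.
        Dim tempFile As String = Path.GetTempFileName()
        Dim record As String = "ABCDEFGH" & vbLf
        File.WriteAllText(tempFile, record & record & record, Encoding.ASCII)
        Console.WriteLine("Total lines: {0}", EstimateLineCount(tempFile, 9))
        File.Delete(tempFile)
    End Sub
End Module
```

If the line count from this calculation does not divide the file size evenly, that is a hint that the fixed-width assumption does not hold for the file.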

Try using ReadAsync, or you can use DiscardBufferedData (but it is slower):

' Requires Imports System.IO and must run inside an Async method.
Dim inputfile As String = "C:\Example\existingfile.txt"
Dim count As Long = 0
Dim buffer(65535) As Char ' Fixed 64K-character buffer, reused for every read
Try
    Using reader As StreamReader = File.OpenText(inputfile)
        Dim charsRead As Integer = Await reader.ReadAsync(buffer, 0, buffer.Length)
        While charsRead > 0
            ' Count line feeds in the chunk just read
            ' (a final line with no trailing newline is not counted)
            For i As Integer = 0 To charsRead - 1
                If buffer(i) = ControlChars.Lf Then count += 1
            Next
            charsRead = Await reader.ReadAsync(buffer, 0, buffer.Length)
        End While
    End Using
    Console.WriteLine("Total Lines in " & inputfile & ": " & count)
Catch ex As Exception
    Console.WriteLine(ex.Message)
End Try
