I need to parse 250 files, totalling 1 GB of data, and upload the result to a SQL server. Can I do this more efficiently?

My current approach takes about 40 minutes to parse all of this data.

The current logic is:

foreach (var file in files)
{
    using (var input = new StreamReader(file.FullName))
    {
        while (!input.EndOfStream)
        {
            City parsedCity = ParseCity(input.ReadLine());
        }
        SQL.submit(); // one submit per file, after all of its lines have been parsed
    }
}

You may assume the parsing is the quickest possible.

Try something like this. Experiment with maxParallelism, starting with the number of cores in your system:

using System;
using System.IO;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        // Assumption: the directory holding the input files is passed as the first argument.
        var files = new DirectoryInfo(args[0]).GetFiles();
        var maxParallelism = Environment.ProcessorCount;
        Parallel.ForEach(files, new ParallelOptions { MaxDegreeOfParallelism = maxParallelism }, ParseAndPersist);
    }

    public static void ParseAndPersist(FileInfo fileInfo)
    {
        // Load the entire file
        // Parse the file
        // Execute the SQL asynchronously; the goal is to maximize file throughput regardless of SQL execution latency
    }
}

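From what you describe, each file is about 4 MB. That is small enough to read the whole file into memory and then parse it line by line from an in-memory string buffer. You can also use parallel tasks to process several files at once and make full use of your multi-core processor.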
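The whole-file approach could be sketched roughly as follows. This is only an illustration: the `City` record, its `Name` and `Population` fields, and the `name,population` line format are assumptions, since the question does not show the real type or file layout. It also assumes C# 9 or later for the `record` syntax.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the question's City type.
public record City(string Name, int Population);

public static class WholeFileParser
{
    // Hypothetical parser: assumes each line looks like "name,population".
    public static City ParseCity(string line)
    {
        var parts = line.Split(',');
        return new City(parts[0], int.Parse(parts[1]));
    }

    // Reads the whole file with a single call, then parses the in-memory lines.
    public static List<City> ParseFile(string path)
    {
        string[] lines = File.ReadAllLines(path);
        var cities = new List<City>(lines.Length);
        foreach (var line in lines)
        {
            if (line.Length == 0) continue; // skip blank lines
            cities.Add(ParseCity(line));
        }
        return cities;
    }
}
```

Whether this beats the `StreamReader` loop depends on the disk and the parser, so it is worth timing both on a few real files.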

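You could try parsing the files in parallel instead of sequentially. You could also try submitting the SQL only once, after all of the files have been parsed.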
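A minimal sketch of that idea, parsing in parallel and collecting everything for a single submit, might look like this. The `City` record, the `name,population` line format, and the `ParseAll` helper are hypothetical, and the actual submit step is left to the caller because the question does not show it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's City type.
public record City(string Name, int Population);

public static class ParallelParser
{
    // Hypothetical parser: assumes each line looks like "name,population".
    public static City ParseCity(string line)
    {
        var parts = line.Split(',');
        return new City(parts[0], int.Parse(parts[1]));
    }

    // Parses all files concurrently and returns the combined rows.
    public static List<City> ParseAll(IEnumerable<string> paths, int maxParallelism)
    {
        var results = new ConcurrentBag<City>(); // safe for concurrent Add calls
        Parallel.ForEach(
            paths,
            new ParallelOptions { MaxDegreeOfParallelism = maxParallelism },
            path =>
            {
                foreach (var line in File.ReadLines(path))
                {
                    if (line.Length > 0) results.Add(ParseCity(line));
                }
            });
        return results.ToList(); // row order is not preserved
    }
}
```

Note that this keeps every parsed row in memory until the submit, which for about 1 GB of input may or may not be acceptable. Submitting in per-file batches is a middle ground.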

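It is hard to say whether either of these will make a difference, since you haven't given much information about the SQL submission, but I think processing the files in parallel will definitely help.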

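Most likely, your bottleneck is actually the SQL query/insert. Are you sure the problem is with parsing the file[s]? If it is the SQL, I would suggest caching the parsed data first and then doing a bulk data copy.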
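If the target is Microsoft SQL Server, one way to do a bulk copy from .NET is `SqlBulkCopy`. The sketch below is an assumption-heavy illustration: the `City` record, the `dbo.Cities` table, and its `Name` and `Population` columns are hypothetical, and it assumes the `Microsoft.Data.SqlClient` package is referenced (older projects may use `System.Data.SqlClient` instead).

```csharp
using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient; // assumption: NuGet package is installed

// Hypothetical stand-in for the question's City type.
public record City(string Name, int Population);

public static class CityBulkLoader
{
    // Caches the parsed rows in a DataTable whose columns mirror the
    // (hypothetical) destination table.
    public static DataTable ToDataTable(IEnumerable<City> cities)
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Population", typeof(int));
        foreach (var city in cities)
        {
            table.Rows.Add(city.Name, city.Population);
        }
        return table;
    }

    // Sends all cached rows to the server in batches instead of row by row.
    public static void BulkInsert(string connectionString, IEnumerable<City> cities)
    {
        using var bulk = new SqlBulkCopy(connectionString)
        {
            DestinationTableName = "dbo.Cities", // hypothetical table name
            BatchSize = 10_000                   // arbitrary starting point; tune by measurement
        };
        bulk.ColumnMappings.Add("Name", "Name");
        bulk.ColumnMappings.Add("Population", "Population");
        bulk.WriteToServer(ToDataTable(cities));
    }
}
```

Timing the parse and the insert separately before changing anything would confirm which of the two is the real bottleneck.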
