For some context, I am trying to optimize the following code. It reads a file line by line, buffers those lines, and saves them to the database every 100 lines:
using (StreamReader sr = new StreamReader(fileName, Encoding.Default))
{
    IList<string> list = new List<string>();
    int lineCount = 0;
    // ReadLines is an extension method that yield-returns lines based on someEOL
    // while reading character by character.
    foreach (var line in sr.ReadLines((char)someEOL))
    {
        list.Add(line); // Keeping it simple for this example. In the actual code it goes through a bunch of operations.
        if (++lineCount % 100 == 0) // Misses the last batch if the total number of lines is not a multiple of 100
        {
            SaveToDB(list);
            list = new List<string>();
        }
    }
    if (list.Count > 0)
        SaveToDB(list); // I would like to get rid of this. It handles the case where the total number of lines is not a multiple of 100.
}
As you will notice, SaveToDB(list) appears twice in the code above. The second call is needed for the case where total number of lines % 100 != 0 (for example, with 101 lines, if (lineCount % 100 == 0) would miss the last line). It is not a big hassle, but I would like to know whether I can get rid of it.

To do that, if I could read the total number of lines before entering the foreach loop, I could write if (lineCount % 100 == 0) differently. But finding the total line count means walking the file character by character to count occurrences of someEOL, which is a definite no, since the files range from 5 to 20 GB. Is there a way to do the count without a performance penalty (which seems doubtful to me, but maybe there is a solution)? Or another way to rewrite this that gets rid of the extra SaveToDB(list) call?
Your code looks fine, except that it creates a new empty list every 100 lines. In any case, you may want to try this approach:
var enumerator = sr.ReadLines((char)someEOL).GetEnumerator();
var list = new List<string>();
bool isValid = true;
for (int i = 1; isValid; i++)
{
    isValid = enumerator.MoveNext();
    if (isValid)
    {
        list.Add(enumerator.Current);
    }
    if (i % 100 == 0 || (!isValid && list.Count > 0))
    {
        SaveToDB(list);
        // It is better to clear the list than to create a new one for each
        // batch, given that your file is big.
        list.Clear();
    }
}
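The same bookkeeping can also be factored out of the loop entirely. As a sketch, assuming ReadLines yields lines lazily: Batch below is a hypothetical extension method (not part of the framework; .NET 6+ ships a similar built-in, Enumerable.Chunk) that groups any sequence into lists of up to a given size, so the leftover-flush logic is written exactly once, inside the iterator.

```csharp
using System.Collections.Generic;

static class EnumerableBatching
{
    // Hypothetical helper: yields the source in lists of up to `size` items.
    // The final list may be smaller when the total count is not a multiple
    // of `size`, which replaces the trailing SaveToDB call at the call site.
    public static IEnumerable<List<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        var batch = new List<T>(size);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size);
            }
        }
        if (batch.Count > 0)
            yield return batch; // the remainder
    }
}
```

With that in place, the call site collapses to `foreach (var chunk in sr.ReadLines((char)someEOL).Batch(100)) SaveToDB(chunk);` and both the modulo check and the second SaveToDB disappear.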
I think you are looking for StreamReader.Peek(). You can detect the end of the stream with sr.Peek() == -1. Code:
string filepath = "myfile.txt";
int lineCount = 0;
List<string> list = new List<string>();
using (StreamReader sr = File.OpenText(filepath))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        list.Add(line); // buffer the current line
        lineCount++;
        // Peek() returns -1 once the reader is at the end of the stream,
        // so the final partial batch is flushed inside the loop.
        if (lineCount % 100 == 0 || sr.Peek() == -1)
        {
            SaveToDB(list);
            list.Clear();
        }
    }
}
}