Slow file reading using istringstream and getline



I am trying to write a parser that reads a large text file in C++. Similar Python code using the readlines method is about 7 to 8 times faster.

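I would like to know why it runs so slowly in C++. Most of the time is spent using istringstream to parse each line and split out the tab-separated numbers. It would be great if someone could point out problems in the code or suggest an alternative to istringstream. The code is as follows: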

```cpp
#include <chrono>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std::chrono;

int main()
{
    auto start = high_resolution_clock::now();
    std::ifstream inf{ "/Users/***/some.bed" };
    std::istringstream iss;
    int aprox_nlines = 7000000;

    // Reserve up front so push_back does not reallocate while reading
    std::vector<int> start_v;
    start_v.reserve(aprox_nlines);

    std::vector<int> end_v;
    end_v.reserve(aprox_nlines);

    // If we couldn't open the input file stream for reading
    if (!inf)
    {
        // Print an error and exit
        std::cerr << "Uh oh, File could not be opened for reading!" << std::endl;
        return 1;
    }

    int count = 0;
    std::string line;
    int sstart;
    int end_val;
    std::string val;

    while (std::getline(inf, line))
    {
        count += 1;

        // The stream is reused, so reset any eofbit/failbit left by the previous line
        iss.clear();
        iss.str(line);
        iss >> val;      // first column (name), read but not stored
        iss >> sstart;
        start_v.push_back(sstart);
        iss >> end_val;
        end_v.push_back(end_val);
    }
    std::cout << count << "\n";

    inf.close();

    auto stop = high_resolution_clock::now();
    auto duration = duration_cast<microseconds>(stop - start);

    std::cout << "Time taken by function: " << duration.count() << " microseconds" << "\n";

    return 0;
}
```

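It seems that using FILE * = fopen() works much better. It is about 10 times faster than istringstream, and about 33% faster than Python's built-in readlines() function.

```cpp
// Fragment: replaces the read loop in main() above.
// Needs <cstdio> and <cstdlib>; getline() here is the POSIX function, not std::getline.
FILE * ifile = fopen("*/N.bed", "r");
if (!ifile) {
    perror("fopen");
    return 1;
}
char * nline = nullptr;   // POSIX getline() allocates and grows this buffer with malloc/realloc
size_t linesz = 0;
char T[50];
int sn, en;
unsigned int i = 0;
while (getline(&nline, &linesz, ifile) > 0) {
    i++;
    // %49s keeps the name within T; skip lines that lack all three fields
    if (sscanf(nline, "%49s %d %d", T, &sn, &en) != 3) continue;
    start_v.push_back(sn);
    end_v.push_back(en);
}
free(nline);   // the buffer came from malloc, so release it with free, not delete[]
fclose(ifile);
```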
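For comparison, one possible alternative to istringstream that stays within the C++ standard library is `std::from_chars` (C++17). The following is only a minimal sketch, not a measured result. It assumes each line has the form `name start end ...` with fields separated by tabs or spaces. The names `Interval`, `skip_blanks`, `skip_token`, `read_int` and `parse_bed_line` are hypothetical helpers introduced here for illustration.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Hypothetical record type: the start and end columns of one line.
struct Interval {
    int start;
    int end;
};

// Drop leading spaces and tabs.
inline std::string_view skip_blanks(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size() && (s[i] == ' ' || s[i] == '\t')) ++i;
    s.remove_prefix(i);
    return s;
}

// Drop one blank-delimited token (used for the name column).
inline std::string_view skip_token(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size() && s[i] != ' ' && s[i] != '\t') ++i;
    s.remove_prefix(i);
    return s;
}

// Parse one integer from the front of s; on success, advance s past it.
inline bool read_int(std::string_view& s, int& out) {
    s = skip_blanks(s);
    const char* first = s.data();
    const char* last = s.data() + s.size();
    auto [ptr, ec] = std::from_chars(first, last, out);
    if (ec != std::errc{}) return false;
    s.remove_prefix(static_cast<std::size_t>(ptr - first));
    return true;
}

// Parse "name start end [...]" into out. Returns false for a malformed line.
inline bool parse_bed_line(std::string_view line, Interval& out) {
    std::string_view rest = skip_token(skip_blanks(line));
    return read_int(rest, out.start) && read_int(rest, out.end);
}
```

Inside the `while (std::getline(inf, line))` loop, this would be called as `parse_bed_line(line, iv)`, followed by pushing `iv.start` and `iv.end` into `start_v` and `end_v`. Whether it actually beats the `sscanf` version would have to be timed on the real file.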
