How to efficiently load data from a .txt file in C++

I am currently using fstream to load 7.1 GB of data in C++. The .txt file looks like this:

item1  2.87  4.64  ... 
item2  5.89  9.24  ... 
...     ...   ...  ...

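It has 300,000 rows and 201 columns (1 column for the item name and 200 columns for the weights), and every cell holds a value of type double. This is what I am doing at the moment: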

ifstream click_log(R"(1.txt)", ifstream::in);
string line;
unordered_map<string, vector<double>> dict;
while (getline(click_log, line)){
    istringstream record(line);
    string key;
    vector<double> weights;
    double weight;
    record >> key;
    while (record >> weight){
        weights.push_back(weight);
    }
    dict[key] = weights;
}

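However, my PC (AMD 3700X, 8 cores) takes about 30 minutes to load the whole file. Is it slow because of the O(m*n) complexity, or simply because converting strings to doubles is slow? What is the most efficient way to load data from a .txt file?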

You should not recreate your variables in every loop iteration. Create them once, then reassign them when needed.

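If you want to use std::vector instead of std::array<double, 200>, then you should call reserve(200) on every vector, to avoid the many reallocations/copies/deallocations caused by the way std::vector grows.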
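To make the effect of reserve(200) visible, here is a minimal sketch that counts how often a vector's capacity changes while 200 doubles are appended. The function name count_reallocations is made up for this illustration, and the exact count without reserve depends on the standard library's growth policy.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the vector's capacity changes while 200 doubles
// are appended, with or without an up-front reserve(200).
// count_reallocations is a hypothetical helper used only for illustration.
std::size_t count_reallocations(bool use_reserve)
{
    std::vector<double> weights;
    if (use_reserve)
        weights.reserve(200); // one allocation, done before the loop

    std::size_t reallocations = 0;
    std::size_t last_capacity = weights.capacity();

    for (int i = 0; i < 200; ++i)
    {
        weights.push_back(static_cast<double>(i));
        if (weights.capacity() != last_capacity)
        {
            // The vector grew: it allocated a new buffer and moved the elements
            ++reallocations;
            last_capacity = weights.capacity();
        }
    }
    return reallocations;
}
```

With use_reserve set to true the loop should see no capacity change at all; without it, the vector grows several times per row, and that cost is repeated for each of the 300,000 rows.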

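You can do the same for the std::unordered_map.

Finally, write the data directly into the target container; you do not need that many temporaries (this removes the large overhead caused by all those unnecessary copies).

With these guidelines in mind, I have rewritten your code. I bet it will improve your performance: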

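#include <array>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <unordered_map>

int main()
{
    // Forward slashes avoid accidental escape sequences such as \t in the path
    std::ifstream ifs("../tests/data/some_data.txt"); // Replace with your file
    if(!ifs)
        return -1;

    std::unordered_map<std::string, std::array<double, 200>> dict;
    dict.reserve(300000);

    std::string line;
    std::string key;
    double weight;
    std::size_t i;

    while(std::getline(ifs, line))
    {
        std::istringstream record(line);
        i = 0;

        if(!(record >> key))
            continue; // Skip blank lines

        auto& weights = dict[key]; // Look the key up once per line, not once per weight

        while(i < weights.size() && record >> weight)
        {
            weights[i++] = weight;
        }
    }
    ifs.close();
    // The whole file is loaded
    return 0;
}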

Of course, I don't claim this is the most efficient approach. I'm sure there are further improvements that I didn't think of at the time.
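One candidate for such an improvement, offered only as a sketch and not measured on the file from the question, is to skip std::istringstream and parse each line with std::strtod. The helper name parse_line and the constant kWeights are invented for this example, and it assumes the default "C" locale, where '.' is the decimal separator.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <string>

constexpr std::size_t kWeights = 200; // weights per row, as stated in the question

// Parses "key w1 w2 ..." from one line and returns the number of weights read.
// parse_line is a hypothetical helper, not part of the original answer.
std::size_t parse_line(const std::string& line, std::string& key,
                       std::array<double, kWeights>& weights)
{
    key.clear();

    // The key is the first whitespace-delimited token.
    const std::size_t key_begin = line.find_first_not_of(" \t");
    if (key_begin == std::string::npos)
        return 0; // blank line
    std::size_t key_end = line.find_first_of(" \t", key_begin);
    if (key_end == std::string::npos)
        key_end = line.size();
    key.assign(line, key_begin, key_end - key_begin);

    // strtod skips leading whitespace and reports where it stopped.
    const char* cursor = line.c_str() + key_end;
    std::size_t count = 0;
    while (count < kWeights)
    {
        char* next = nullptr;
        const double value = std::strtod(cursor, &next);
        if (next == cursor) // nothing could be parsed: end of line or bad token
            break;
        weights[count++] = value;
        cursor = next;
    }
    return count;
}
```

Inside the reading loop this could be called as parse_line(line, key, buffer) and the buffer then stored under dict[key]. Whether it actually beats the stream-based version has to be checked with a profiler on the real data.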

In any case, keep in mind that you may still run into bottlenecks from hard-drive access, I/O operations, and so on.
