OpenMP: accessing a single file from individual threads

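I have 4 cores == 4 threads (omp_get_max_threads()), so I created 4 sections (OMP sections) to split a huge file into 4 sub-partitions, hand them to 4 different threads, and do further processing in each thread (there are other ways, but I wanted to try this one). In each section I open the same big file independently, on the assumption that each section gets its own independent file descriptor. The problem is that, of the 4 partitions, only 3 get written to their partition files, and 1 partition does not get a single byte written. Which section is affected changes randomly from run to run.

I also tried sharing the same file pointer between the sections, but the problem persisted. Performance also dropped, so I went back to a separate file pointer in each section.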

#include<iostream>
#include<omp.h>
#include<fstream>
#include<string>
#include<sstream>
using namespace std;
int main()
{
int no_of_threads = omp_get_max_threads();
int partition_size = 1000000000/no_of_threads;
long double word_length_counter;
string temp_word_holder;
word_length_counter=0;

cout<<"\nno of threads "<<no_of_threads;
#pragma omp parallel private(word_length_counter,temp_word_holder) shared(partition_size)
{
#pragma omp sections
{
#pragma omp section
{
ifstream main_file;
ofstream temp_file_holder;
main_file.open("generated.txt",ios::in);
temp_file_holder.open("partition0.txt",ios::out);
while(word_length_counter <= partition_size)
{
main_file>>temp_word_holder; 
word_length_counter += temp_word_holder.length();
temp_file_holder<<temp_word_holder<<endl;
}
main_file.close();
temp_file_holder.close();
}
#pragma omp section
{
ifstream main_file1;
ofstream temp_file_holder1;
main_file1.open("generated.txt",ios::in);
temp_file_holder1.open("partition1.txt",ios::out);
main_file1.seekg(partition_size);
while(word_length_counter <= partition_size)
{
main_file1>>temp_word_holder; 
word_length_counter += temp_word_holder.length();
temp_file_holder1<<temp_word_holder<<endl;
}
main_file1.close();
temp_file_holder1.close();
}
#pragma omp section
{
ifstream main_file2;
ofstream temp_file_holder2;
main_file2.open("generated.txt",ios::in);
temp_file_holder2.open("partition2.txt",ios::out);
main_file2.seekg((partition_size*2));
while(word_length_counter <= partition_size)
{
main_file2>>temp_word_holder; 
word_length_counter += temp_word_holder.length();
temp_file_holder2<<temp_word_holder<<endl;
}
main_file2.close();
temp_file_holder2.close();
}
#pragma omp section
{
ifstream main_file3;
ofstream temp_file_holder3;
main_file3.open("generated.txt",ios::in);
temp_file_holder3.open("partition3.txt",ios::out);
main_file3.seekg((partition_size-1)*3);
while(word_length_counter <= partition_size && !main_file3.eof())
{
main_file3>>temp_word_holder; 
word_length_counter += temp_word_holder.length();
temp_file_holder3<<temp_word_holder<<endl;
}
main_file3.close();
temp_file_holder3.close();
}
}

}
#pragma omp barrier
cout<<"\npartitions generated";
}

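The culprit is that word_length_counter is private and is not properly initialized. For every private variable, a thread-private copy is created, and it "... is initialized, or has an undefined initial value, as if it had been locally declared without an initializer" (from Section 2.21.3 of the OpenMP specification). In some threads that value can be greater than partition_size, so the while loop never executes. Here is simple code that reproduces the effect: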

#include <cstdio>
#include <omp.h>
using namespace std;
int main()
{
long double word_length_counter;
word_length_counter=0;

#pragma omp parallel private(word_length_counter)
{
printf("%d %Lf\n", omp_get_thread_num(), word_length_counter);
}
}

Running the code:

$ clang++ -fopenmp -o foo foo.cpp
$ OMP_NUM_THREADS=4 ./foo
0 nan
1 nan
2 -nan
3 nan
$ OMP_NUM_THREADS=4 ./foo
1 nan
2 -nan
0 nan
3 nan
$ OMP_NUM_THREADS=4 ./foo
0 nan
1 nan
2 -nan
3 nan
$ OMP_NUM_THREADS=4 ./foo
2 nan
0 nan
3 nan
1 nan

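As you can see, with my particular version of Clang and 4 threads, the initial value tends to be NaN in every thread except thread 2, where it is usually -NaN. These are random values (whatever happened to be in the memory where the thread's stack was allocated), and these particular values are system artifacts.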
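A NaN start value has the same effect as a value that is already too large: every ordered comparison with NaN is false, so the loop condition fails on the first check. The following is a minimal sketch of that behaviour. The function name count_loop_iterations is hypothetical, and the loop only mimics the shape of the one in the question.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper (not from the question): counts how many times a loop
// with the same shape as the question's `while (counter <= limit)` runs.
int count_loop_iterations(long double start, int limit) {
    assert(limit >= 0);
    int iterations = 0;
    long double counter = start;
    // Any ordered comparison involving NaN is false, so a NaN start value
    // means the body is never entered.
    while (counter <= limit) {
        counter += 1;
        ++iterations;
    }
    return iterations;
}
```

Called with a start value of 0 the loop runs as expected. Called with std::numeric_limits<long double>::quiet_NaN() it runs zero times, which matches a partition file that stays empty.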

A simple solution is to move the initialization word_length_counter=0; inside the parallel region, or to replace

private(word_length_counter,temp_word_holder)

with

firstprivate(word_length_counter) private(temp_word_holder)

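firstprivate(X) initializes the private copy of X with the value the original variable had just before the program encountered the parallel region.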
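As a small illustration of that rule, the sketch below records the value each private copy holds when it is first read. The function name read_firstprivate_copies and the start value 42 are made up for the example. It uses a parallel for loop instead of sections so that it needs no OpenMP runtime calls and still compiles when OpenMP is disabled.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the question): returns the value each
// iteration observes in its thread's copy of `counter`.
std::vector<long> read_firstprivate_copies(int iterations) {
    assert(iterations >= 0);
    long counter = 42;  // value of the original variable before the region
    std::vector<long> seen(iterations, -1);

    // firstprivate copies the original value into every thread's private copy.
    // With plain private(counter) the copies would start out uninitialized.
    #pragma omp parallel for firstprivate(counter)
    for (int i = 0; i < iterations; ++i) {
        seen[i] = counter;  // each iteration writes a distinct element
    }
    return seen;
}
```

Every element of the returned vector should equal 42, regardless of how many threads the runtime chooses to use.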

Since you do not use word_length_counter outside the parallel region, good programming practice is to move the whole declaration inside it.
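One way to follow that advice is sketched below. It assumes the four near-identical sections can be folded into one loop over partition indices. The helper names write_partition and split_file, the output prefix parameter, and the use of parallel for instead of sections are choices made for this sketch and do not come from the question. As in the original code, partition boundaries are approximate and may fall in the middle of a word.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: copies words from `input_path`, starting at byte offset
// index * partition_size, into "<prefix><index>.txt" until the summed word
// lengths exceed partition_size or the input ends. Returns the word count.
long write_partition(const std::string& input_path, const std::string& prefix,
                     int index, long partition_size) {
    std::ifstream in(input_path);  // each call opens its own stream
    std::ofstream out(prefix + std::to_string(index) + ".txt");
    in.seekg(static_cast<std::streamoff>(index) * partition_size);

    long length_counter = 0;  // declared locally, so it always starts at 0
    long words = 0;
    std::string word;
    // Testing the extraction itself also stops the loop cleanly at end of file.
    while (length_counter <= partition_size && in >> word) {
        length_counter += static_cast<long>(word.length());
        out << word << '\n';
        ++words;
    }
    return words;
}

// Hypothetical driver: one loop iteration per partition; OpenMP distributes
// the iterations over the available threads.
std::vector<long> split_file(const std::string& input_path,
                             const std::string& prefix,
                             int parts, long partition_size) {
    assert(parts > 0 && partition_size > 0);
    std::vector<long> words(parts, 0);
    #pragma omp parallel for
    for (int p = 0; p < parts; ++p) {
        words[p] = write_partition(input_path, prefix, p, partition_size);
    }
    return words;
}
```

For the setup in the question, a call such as split_file("generated.txt", "partition", 4, 1000000000 / 4) would correspond to the four sections. Because every variable the loop body needs is declared inside write_partition, no private or firstprivate clause is required.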
