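I am programming in C with MPI. The root rank reads data from a file and then distributes it to the other ranks. My MPI_Scatter calls appear to work: I print the received values to check them, and they are correct. My problem is that, after allocating the structures, I get an error when I try to access them from any rank other than the root.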
    pr_graph * graph = malloc(sizeof(*graph));
    ....
    MPI_Scatter(verticesCountArray, 1, MPI_INT, &(graph->nvtxs), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
    MPI_Scatter(edgesCountArray, 1, MPI_INT, &(graph->nedges), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
    for(int rank = 0; rank<numProcesses; rank++){
        if (rank == myrank){
            fprintf(stderr, "%d %d \n",graph->nvtxs, graph->nedges );
            graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
            graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
            // graph->xadj[graph->nvtxs] = graph->nedges;
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }
My output is:
2 4
2 4
2 4
This is correct. But when I uncomment the commented-out line, I get:
2 4
2 4
[phi01:07170] *** Process received signal ***
[phi01:07170] Signal: Segmentation fault (11)
[phi01:07170] Signal code: (128)
[phi01:07170] Failing at address: (nil)
[phi01:07170] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f5740503390]
[phi01:07170] [ 1] ./pagerank[0x401188]
[phi01:07170] [ 2] ./pagerank[0x400c73]
[phi01:07170] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f5740149830]
[phi01:07170] [ 4] ./pagerank[0x400ce9]
[phi01:07170] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 7170 on node phi01 exited on signal 11 (Segmentation fault).
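This suggests that only rank 0 can access the structures it allocated. Can anyone point out why? Thanks!
Edit:
Assigning arbitrary values directly to the two receive buffers does not produce a segfault, and the correct values are printed. So the source of the error seems to be the use of MPI_Scatter().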
    graph->nvtxs = 2;
    graph->nedges = 4;
    for(int rank = 0; rank<numProcesses; rank++){
        if (rank == myrank){
            fprintf(stderr, "%d %d \n",graph->nvtxs, graph->nedges );
            graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
            graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
            graph->xadj[graph->nvtxs] = graph->nedges;
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }
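One possible reading of this experiment, offered as a hypothesis rather than a confirmed diagnosis: if pr_int is an 8-byte unsigned long (an assumption, suggested only by MPI_UNSIGNED_LONG) and just the 4 bytes of an int arrive, the upper half of graph->nvtxs keeps whatever malloc left there. A 32-bit print such as %d would then typically show only the low half on a little-endian x86-64 machine, so the output looks right while the real value is huge. The sketch below imitates such a partial write with memcpy. No MPI is involved, and the helper names partial_write and low_half are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: start from a 64-bit slot holding `garbage` (standing in
 * for uninitialized malloc memory), then overwrite only its first 4 bytes with
 * `value`, the way a 4-byte message landing in an 8-byte buffer might. */
static uint64_t partial_write(uint64_t garbage, int32_t value)
{
    uint64_t slot = garbage;
    assert(sizeof(value) < sizeof(slot)); /* the write must be partial */
    memcpy(&slot, &value, sizeof(value));
    return slot;
}

/* The low 32 bits of the slot, i.e. what a 32-bit print would typically
 * show on a little-endian machine. */
static uint32_t low_half(uint64_t slot)
{
    return (uint32_t)(slot & 0xFFFFFFFFu);
}
```

On a little-endian machine, partial_write(0xDEADBEEF00000000, 2) leaves 0xDEADBEEF00000002 in the slot: low_half reports 2, but the full value is far out of range as an array index. Assigning graph->nvtxs = 2 directly writes all 8 bytes, which would be consistent with the segfault disappearing.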
I found a way to work around the problem. I will post it first and then try to understand why it works.
    pr_int * nvtxs = malloc(sizeof(pr_int));
    pr_int * nedges = malloc(sizeof(pr_int));
    MPI_Scatter(verticesCountArray, 1, MPI_INT, &(nvtxs), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
    MPI_Scatter(edgesCountArray, 1, MPI_INT, &(nedges), 1, MPI_UNSIGNED_LONG, 0, MPI_COMM_WORLD);
    graph->nvtxs = nvtxs;
    graph->nedges = nedges;
    for(int rank = 0; rank<numProcesses; rank++){
        if (rank == myrank){
            fprintf(stderr, "%d %d \n",graph->nvtxs, graph->nedges );
            graph->xadj = malloc((graph->nvtxs + 1) * sizeof(*graph->xadj));
            graph->nbrs = malloc(graph->nedges * sizeof(*graph->nbrs));
            graph->xadj[graph->nvtxs] = graph->nedges;
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }
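I think I was not receiving into actual buffers (pointers), just regular variables. During the malloc calls they may have been interpreted as pointers (address values), which could explain why the structure sizes came out crazy. However, I am still not sure why I was able to print the values, or why rank 0 worked correctly. Any ideas would be greatly appreciated! Thanks!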