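I am trying to use local pointers to access the memory that has affinity to the current thread.
Unfortunately, my local pointers do not seem to point where I think they should.
Does anyone know what is going wrong?
Edit: I forgot to mention that the output below was generated by running this code with four threads, i.e. THREADS = 4.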
#include <upc.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
    shared int * T = (shared int *) upc_all_alloc(12, sizeof(int));
    if(!T)
        upc_global_exit(-1);
    int i;
    upc_forall(i=0; i<12; i++; &T[i]) T[i] = i;
    upc_barrier;
    if(MYTHREAD == 0)
        for(i=0; i<12; i++) printf("thread %d, T[%d] = %d\n", MYTHREAD, i, T[i]);
    upc_barrier;
    int my_start = (12/THREADS + 1)*MYTHREAD;
    int my_end = (12/THREADS + 1)*(MYTHREAD+1) - 1;
    int* T_local = (int*)&T[my_start];
    for(i=my_start; i<=my_end; i++)
        printf("thread %d, T_local[%d] = %d, T[%d] = %d\n", MYTHREAD,
               i-my_start, T_local[i-my_start], i, T[i]);
    upc_barrier;
    return 0;
}
Output (THREADS = 4):
thread 0, T[0] = 0
thread 0, T[1] = 1
thread 0, T[2] = 2
thread 0, T[3] = 3
thread 0, T[4] = 4
thread 0, T[5] = 5
thread 0, T[6] = 6
thread 0, T[7] = 7
thread 0, T[8] = 8
thread 0, T[9] = 9
thread 0, T[10] = 10
thread 0, T[11] = 11
thread 0, T_local[0] = 0, T[0] = 0
thread 0, T_local[1] = 4, T[1] = 1
thread 0, T_local[2] = 8, T[2] = 2
thread 0, T_local[3] = 0, T[3] = 3
thread 1, T_local[0] = 4, T[4] = 4
thread 1, T_local[1] = 8, T[5] = 5
thread 1, T_local[2] = 0, T[6] = 6
thread 2, T_local[0] = 8, T[8] = 8
thread 2, T_local[1] = 0, T[9] = 9
thread 2, T_local[2] = 0, T[10] = 10
thread 2, T_local[3] = 0, T[11] = 11
thread 3, T_local[0] = 0, T[12] = 0
thread 3, T_local[1] = 0, T[13] = 0
thread 3, T_local[2] = 0, T[14] = 0
thread 3, T_local[3] = 0, T[15] = 0
thread 1, T_local[3] = 0, T[7] = 7
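Your array T is allocated and declared with a cyclic layout (i.e. blocksize == 1). This means that the first element with affinity to MYTHREAD is T[MYTHREAD]. You should therefore initialize your pointer-to-local like this: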
int* T_local = (int*)&T[MYTHREAD];
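In a cyclic layout, the shared elements are dealt out to the threads round-robin, so each thread has affinity to a non-contiguous set of the distributed array's elements. With 4 threads, for example, thread 0 has affinity to T[0], T[4] and T[8]. A correctly initialized pointer-to-local T_local on thread 0 accesses those elements in the local slice of the shared array (as T_local[0], T_local[1] and T_local[2] respectively).
Your computation of my_start and my_end seems to assume a different (larger) blocking factor than the one T actually uses, which is probably the source of your confusion.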