Different behaviour between OpenCL C99 and C++ with the same implementation of a real-time voxel ray caster



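I am working with OpenCL to develop a voxel ray casting engine. I am trying to do something similar to Crassin's GigaVoxels. In that paper, an octree is used to store the voxel data. For now, I am only trying to descend through the octree until I reach a leaf that contains render data.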

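I have written two implementations: one in OpenCL on the GPU and one in C++ on the CPU. The problem is that on the GPU the algorithm descends through the wrong number of levels before it reaches a leaf in the octree. The CPU version gives the correct result. The algorithm is the same in both versions and the code is almost identical.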

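Do you have any idea what the problem could be? Is it a hardware issue, an OpenCL issue, or am I doing something wrong? I get the same result on three different nVidia GPUs.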

Here is the C++ code:

// Calculate actual ray stepping position
glm::vec4 pos = eyeRay_o + eyeRay_d * t;
uint offset = 0;
//check if root is leaf
uint leafFlag = GetLeafBit(octreeNodes[0]);
//get children address of root
uint childrenAddress = GetChildAddress(octreeNodes[0]);
while (iterations < 30) {  
    iterations++; 
    // Calculate subdivision offset
    offset = (uint)(pos.x * 2) + (uint)(pos.y * 2) * 2 + (uint)(pos.z * 2) * 4;
     
    if (leafFlag == 1) {
         //return some colour and exit the loop
         break;
    }
    else 
    {
         glm::uvec4 off = glm::uvec4(pos.x * 2, pos.y * 2, pos.z * 2, pos.w * 2);
         pos.x = 2 * pos.x - off.x;
         pos.y = 2 * pos.y - off.y;
         pos.z = 2 * pos.z - off.z;
         pos.w = 2 * pos.w - off.w;   
    }
    // Extract node data from the children
    finalAddress = childrenAddress + offset;    
    leafFlag = GetLeafBit(nodes[finalAddress]);
    childrenAddress = GetChildAddress(nodes[finalAddress]);
}   

Here is the OpenCL code:

// Calculate actual ray stepping position
float4 position = rayOrigin + rayDirection * t;
uint offset = 0;
//check if root is leaf
uint leafFlag = extractOctreeNodeLeaf(octreeNodes[0]);
//get children address of root
uint childrenAddress = extractOctreeNodeAddress(octreeNodes[0]);
//position will be in the [0, 1] interval
//size of octree is 1
while (iterations < 30) {  
    iterations++; 
    //calculate the index of the next child based on the position in the current subdivision
    offset = (uint)(position.x * 2) + (uint)(position.y * 2) * 2 + (uint)(position.z * 2) * 4;
     
    if (leafFlag == 1) {
        //return some colour and exit the loop
        break;
    }
    else 
    {
         //transform the position inside the parent 
         //to the position inside the child subdivision
         //size of child will be considered to be 1
         uint4 off; 
         off.x = floor(position.x * 2);
         off.y = floor(position.y * 2);
         off.z = floor(position.z * 2);
         off.w = floor(position.w * 2);
         position = 2 * position - off;  
    }
     
    // Extract node data from the children
    finalAddress = childrenAddress + offset; 
    leafFlag = extractOctreeNodeLeaf(octreeNodes[finalAddress]);
    //each node has an index to an array of 8 children - the index points to the first child
    childrenAddress = extractOctreeNodeAddress(octreeNodes[finalAddress]);
}

Here is extractOctreeNodeAddress, as requested.

Both functions only do some bit operations:

OpenCL version:

inline char extractOctreeNodeLeaf(uint value) {
 value = value >> 1;
 return value & 1;
}
inline uint extractOctreeNodeAddress(uint value) {
 return value >> 2;
}

C++ version:

inline byte GetLeafBit(uint value)
{
 value = value >> 0x1;
 return value & 0x1;
}
inline uint GetChildAddress(uint value)
{
 return value >> 0x2;
}
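
For readers following the bit layout, here is a minimal C++ sketch of how a node word consistent with the two accessors above could be packed and read back. The `packNode` helper and the assumption that bit 0 is unused are hypothetical; the post only shows that the leaf flag sits in bit 1 and the child address in the bits above it.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): builds a node word in
// which bit 1 is the leaf flag and bits 2 and up hold the child address.
// Bit 0 is left at zero here because the post does not say what it stores.
inline std::uint32_t packNode(std::uint32_t childAddress, bool isLeaf) {
    return (childAddress << 2) | (static_cast<std::uint32_t>(isLeaf) << 1);
}

// Same operation as GetLeafBit / extractOctreeNodeLeaf: read bit 1.
inline std::uint32_t leafBit(std::uint32_t value) {
    return (value >> 1) & 1u;
}

// Same operation as GetChildAddress / extractOctreeNodeAddress:
// drop the two low bits.
inline std::uint32_t childAddressOf(std::uint32_t value) {
    return value >> 2;
}
```

Packing and unpacking with the same helpers on both the CPU and GPU side is one way to rule out the accessors as the source of a mismatch.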

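Hi, I found something interesting. I manually tested individual variables, comparing their CPU and GPU values at one exact pixel with a fixed camera position and orientation. If I run the code below as it stands, the pixel is drawn white, meaning the value is greater than 5.5, which is completely wrong compared with the CPU implementation. But if I comment out the last if block and uncomment the first one, the pixel is drawn red. I can't really explain this. Any ideas?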

if ((x == 265) && (y == 209)) {
    /*float epsilon = 0.01f;
    float4 stuff = (float4)(0.7604471f, 0.9088342f, 0.9999924f, 0);
    if(fabs(pos.x - stuff.x) < epsilon)  
        temp = (float4)(1, 0, 0, 1);
    else
        temp = (float4)(1, 1, 1, 1);
    break;*/
    if(pos.x > 5.5)
    {
        temp = (float4)(1, 1, 1, 1);
        break;
    }
}

The main problem was the implicit conversion from float4 to uint4.

Doing the conversion element by element (still implicitly) solved the problem.
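
As an illustration of the element-by-element approach, here is a minimal C++ sketch of one descent step in which every float-to-integer conversion is written out per component. The `Vec4` struct and the function names are hypothetical stand-ins for `float4` / `glm::vec4` and are not from the original code; the sketch assumes each component lies in [0, 1).

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for float4 / glm::vec4.
struct Vec4 { float x, y, z, w; };

// Index (0..7) of the child octant containing p, for p inside the unit cube.
// Each component is converted to an integer explicitly and on its own.
inline std::uint32_t childOffset(const Vec4& p) {
    assert(p.x >= 0.0f && p.x < 1.0f);
    assert(p.y >= 0.0f && p.y < 1.0f);
    assert(p.z >= 0.0f && p.z < 1.0f);
    return static_cast<std::uint32_t>(p.x * 2.0f)
         + static_cast<std::uint32_t>(p.y * 2.0f) * 2u
         + static_cast<std::uint32_t>(p.z * 2.0f) * 4u;
}

// Re-map p from the parent cell to the chosen child cell, which is again
// treated as a unit cube. The offsets stay in floating point, so no
// float-vector / integer-vector mixing occurs.
inline Vec4 toChildSpace(const Vec4& p) {
    const float ox = std::floor(p.x * 2.0f);
    const float oy = std::floor(p.y * 2.0f);
    const float oz = std::floor(p.z * 2.0f);
    const float ow = std::floor(p.w * 2.0f);
    return Vec4{2.0f * p.x - ox, 2.0f * p.y - oy,
                2.0f * p.z - oz, 2.0f * p.w - ow};
}
```

In OpenCL C the same idea could presumably be expressed either by keeping the offsets in a `float4`, or by converting the `uint4` back explicitly with the built-in `convert_float4` before the subtraction; which variant the author ended up using is not stated.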
