Different behavior between OpenCL C99 and C++ with the same real-time voxel ray caster implementation

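I am working with OpenCL on a voxel ray casting engine. I am trying to do something similar to Crassin's Gigavoxels. In that paper, the voxel data is stored in an octree. For now, I am only trying to descend inside the octree until I reach a leaf that contains the rendering data.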

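I have made two implementations: one in OpenCL on the GPU and one in C++ on the CPU. The problem is that on the GPU the algorithm goes through the wrong number of levels before it reaches a leaf inside the octree. The CPU version gives correct results. The algorithm is the same in both versions and the code is almost identical.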

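Do you have any idea what the problem could be? Could it be a hardware problem, an OpenCL problem, or am I doing something wrong? I get the same result on three different nVidia GPUs.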

Here is the C++ code:

// Calculate actual ray stepping position
glm::vec4 pos = eyeRay_o + eyeRay_d * t;
uint offset = 0;
//check if root is leaf
uint leafFlag = GetLeafBit(octreeNodes[0]);
//get children address of root
uint childrenAddress = GetChildAddress(octreeNodes[0]);
while (iterations < 30) {  
    iterations++; 
    // Calculate subdivision offset
    offset = (uint)(pos.x * 2) + (uint)(pos.y * 2) * 2 + (uint)(pos.z * 2) * 4;
     
    if (leafFlag == 1) {
         //return some colour and exit the loop
         break;
    }
    else 
    {
         glm::uvec4 off = glm::uvec4(pos.x * 2, pos.y * 2, pos.z * 2, pos.w * 2);
         pos.x = 2 * pos.x - off.x;
         pos.y = 2 * pos.y - off.y;
         pos.z = 2 * pos.z - off.z;
         pos.w = 2 * pos.w - off.w;   
    }
    // Extract node data from the children
    finalAddress = childrenAddress + offset;    
    leafFlag = GetLeafBit(nodes[finalAddress]);
    childrenAddress = GetChildAddress(nodes[finalAddress]);
}

Here is the OpenCL code:

// Calculate actual ray stepping position
float4 position = rayOrigin + rayDirection * t;
uint offset = 0;
//check if root is leaf
uint leafFlag = extractOctreeNodeLeaf(octreeNodes[0]);
//get children address of root
uint childrenAddress = extractOctreeNodeAddress(octreeNodes[0]);
//position will be in the [0, 1] interval
//size of octree is 1
while (iterations < 30) {  
    iterations++; 
    //calculate the index of the next child based on the position in the current subdivision
    offset = (uint)(position.x * 2) + (uint)(position.y * 2) * 2 + (uint)(position.z * 2) * 4;
     
    if (leafFlag == 1) {
        //return some colour and exit the loop
        break;
    }
    else 
    {
         //transform the position inside the parent 
         //to the position inside the child subdivision
         //size of child will be considered to be 1
         uint4 off; 
         off.x = floor(position.x * 2);
         off.y = floor(position.y * 2);
         off.z = floor(position.z * 2);
         off.w = floor(position.w * 2);
         position = 2 * position - off;  
    }
     
    // Extract node data from the children
    finalAddress = childrenAddress + offset; 
    leafFlag = extractOctreeNodeLeaf(octreeNodes[finalAddress]);
    //each node has an index to an array of 8 children - the index points to the first child
    childrenAddress = extractOctreeNodeAddress(octreeNodes[finalAddress]);
}

Here is extractOctreeNodeAddress, as requested:

Both functions only perform some bit operations:

OpenCL version:

inline char extractOctreeNodeLeaf(uint value) {
 value = value >> 1;
 return value & 1;
}
inline uint extractOctreeNodeAddress(uint value) {
 return value >> 2;
}

C++ version:

inline byte GetLeafBit(uint value)
{
 value = value >> 0x1;
 return value & 0x1;
}
inline uint GetChildAddress(uint value)
{
 return value >> 0x2;
}
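
For readers who want to build test nodes by hand, the two extractors above imply a node layout in which bit 1 is the leaf flag and the remaining upper bits hold the index of the first child. The sketch below assumes exactly that layout; `PackNode` is a hypothetical helper that does not appear in the original post, and bit 0 is left at zero because the post does not say what it stores.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the original post): builds a node word using
// the layout implied by the extractors: bit 1 = leaf flag, bits 2..31 = index
// of the first child. Bit 0 is left at zero; its meaning is not given.
inline std::uint32_t PackNode(std::uint32_t childAddress, bool isLeaf) {
    assert(childAddress < (1u << 30)); // the address must fit in 30 bits
    return (childAddress << 2) | (static_cast<std::uint32_t>(isLeaf) << 1);
}

// Same bit operations as the C++ extractors in the post.
inline std::uint32_t GetLeafBit(std::uint32_t value) { return (value >> 1) & 1u; }
inline std::uint32_t GetChildAddress(std::uint32_t value) { return value >> 2; }
```

Packing a node and reading it back with the extractors should return the original leaf flag and child address, which is a quick way to confirm that the CPU and GPU sides agree on the layout.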

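Hi, I found something interesting. I tried testing individual variables by hand, comparing their CPU and GPU values at one exact pixel with a fixed camera position and direction. With the code below as it stands, the pixel is drawn white, which means pos.x is greater than 5.5 (completely wrong compared with the CPU implementation). But if I comment out the last if block and uncomment the first one, the pixel is drawn red, which means pos.x is within epsilon of the expected value. I cannot explain this. Any ideas?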

if ((x == 265) && (y == 209)) {
    /*float epsilon = 0.01f;
    float4 stuff = (float4)(0.7604471f, 0.9088342f, 0.9999924f, 0);
    if(fabs(pos.x - stuff.x) < epsilon)  
        temp = (float4)(1, 0, 0, 1);
    else
        temp = (float4)(1, 1, 1, 1);
    break;*/
    if(pos.x > 5.5)
    {
        temp = (float4)(1, 1, 1, 1);
        break;
    }
}
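
For this kind of single-pixel comparison, it can help to have a small CPU reference that reports how many levels were descended, so the number can be checked against what the kernel writes out. The following is only a sketch: `CountLevelsToLeaf` is a hypothetical name, the node layout (bit 1 = leaf flag, upper bits = first-child index) is inferred from the extractors in the post, and the position is assumed to lie in [0, 1) on every axis.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference routine (not from the original post).
// Returns the number of levels descended before a leaf is reached,
// or -1 if no leaf is found within maxIterations.
inline int CountLevelsToLeaf(const std::vector<std::uint32_t>& nodes,
                             float x, float y, float z,
                             int maxIterations = 30) {
    assert(!nodes.empty());
    std::uint32_t node = nodes[0];
    for (int level = 0; level < maxIterations; ++level) {
        if ((node >> 1) & 1u) {
            return level; // leaf flag set: stop here
        }
        // Octant index of the child that contains the position.
        unsigned ox = static_cast<unsigned>(x * 2.0f);
        unsigned oy = static_cast<unsigned>(y * 2.0f);
        unsigned oz = static_cast<unsigned>(z * 2.0f);
        std::uint32_t index = (node >> 2) + ox + oy * 2u + oz * 4u;
        assert(index < nodes.size());
        // Re-express the position inside the child's unit cube,
        // converting each component explicitly.
        x = 2.0f * x - static_cast<float>(ox);
        y = 2.0f * y - static_cast<float>(oy);
        z = 2.0f * z - static_cast<float>(oz);
        node = nodes[index];
    }
    return -1;
}
```

Running it on a tiny hand-built tree whose depth is known in advance gives a baseline to compare with the value the GPU produces for the same position.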

The main problem was the implicit conversion from float4 to uint4.

Doing the conversion element by element (still implicitly) solved the problem.
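
The post does not show the corrected kernel, so the following is only a sketch of what a per-component descent step looks like, written in C++ so it can be checked on the CPU. `Vec3`, `ChildStep` and `StepIntoChild` are hypothetical names, and the position is assumed to lie in [0, 1) on every axis. In OpenCL C the analogous change would be to compute each of `position.x`, `position.y`, `position.z` separately instead of mixing `float4` and `uint4` in one vector expression; an explicit `convert_float4(off)` is another option, though it is not what the answer describes and is untested here.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

struct ChildStep {
    unsigned offset; // index of the child octant, 0..7
    Vec3 local;      // position re-expressed inside the child's unit cube
};

// One descent step with every float <-> unsigned conversion written out
// per component, so no vector-wide conversion is involved.
inline ChildStep StepIntoChild(Vec3 p) {
    assert(p.x >= 0.0f && p.x < 1.0f);
    assert(p.y >= 0.0f && p.y < 1.0f);
    assert(p.z >= 0.0f && p.z < 1.0f);
    unsigned ox = static_cast<unsigned>(p.x * 2.0f);
    unsigned oy = static_cast<unsigned>(p.y * 2.0f);
    unsigned oz = static_cast<unsigned>(p.z * 2.0f);
    ChildStep step;
    step.offset = ox + oy * 2u + oz * 4u;
    step.local = Vec3{2.0f * p.x - static_cast<float>(ox),
                      2.0f * p.y - static_cast<float>(oy),
                      2.0f * p.z - static_cast<float>(oz)};
    return step;
}
```

For example, the position (0.75, 0.25, 0.5) falls in octant 1 + 0 * 2 + 1 * 4 = 5 and becomes (0.5, 0.5, 0.0) inside that child.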
