What would cause clSetKernelArg to return CL_INVALID_MEM_OBJECT for a valid memory object?



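I'm developing an OpenCL project in C++ to run a physics simulation on my graphics card. As the title says, when I call clSetKernelArg I get CL_INVALID_MEM_OBJECT as the error. I know the memory object is actually valid, because I can query it right before the call without getting an error, and the buffer size it reports is correct. The other thing I suspected was that I create the context and device objects on a different thread; that is something I should fix anyway, but I ruled it out as the cause by getting rid of the extra thread. At the time of the call everything appears to be valid, and I haven't found another explanation. Here is the relevant code...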

The kernel wrapper class...

//This sets a kernel arg of basic types and runs without error
template <typename T>
inline bool setKernelArg (uint32_t pos, const T* data)
{
    int err = clSetKernelArg(m_kernel, pos, sizeof(T), data);
    LOG_ERROR_IF("Could not set kernel argument! " + std::to_string(err), err != CL_SUCCESS );
    return err != CL_SUCCESS;
}
//This sets a kernel arg for memorybuffers and gives the error
template<typename T>
inline bool setKernelMemBufferArg (uint32_t pos, const MemBuffer<T>* data)
{
    int err = clSetKernelArg (m_kernel, pos, sizeof (cl_mem), data->getMemBuffer());
    LOG_ERROR_IF ("Could not set kernel argument! " + std::to_string (err), err != CL_SUCCESS);
    return err != CL_SUCCESS;
}

The getter for the memory buffer...

inline const cl_mem getMemBuffer() const {return m_memBuffer;}

And the calling method...

void NBodySim::init ()
{
    p_cmdq = std::make_shared<CommandQueue> (p_con.get (), p_device.get ());
    p_program = std::unique_ptr<Program> (Program::createProgram ("src/Compute/Kernels/PhysKernel.cl",
                                                                  *p_con, *p_device));
    p_kern = std::unique_ptr<Kernel> (Kernel::createKernel (p_program.get (), p_cmdq, "calcPos"));
    p_starMassBuffer = std::shared_ptr<MemBuffer<float>> (MemBuffer<float>::createMemBufferWithData (
        p_con, p_cmdq, m_numStars, m_starMassArr,
        BufferFlags::Read_Only | BufferFlags::Copy_Host_Ptr));
    p_starPosInBuffer = std::shared_ptr<MemBuffer<cl_float3>> (MemBuffer<cl_float3>::createMemBufferWithData (
        p_con, p_cmdq, m_numStars, m_starPosArr,
        BufferFlags::Read_Only | BufferFlags::Copy_Host_Ptr));
    p_starVelInBuffer = std::shared_ptr<MemBuffer<cl_float3>> (MemBuffer<cl_float3>::createMemBufferWithData (
        p_con, p_cmdq, m_numStars, m_starVelArr,
        BufferFlags::Read_Write | BufferFlags::Copy_Host_Ptr));
    p_starPosOutBuffer = std::shared_ptr<MemBuffer<cl_float3>> (MemBuffer<cl_float3>::createMemBuffer (
        p_con, p_cmdq, m_numStars,
        BufferFlags::Read_Write));
    p_starVelOutBuffer = std::shared_ptr<MemBuffer<cl_float3>> (MemBuffer<cl_float3>::createMemBuffer (
        p_con, p_cmdq, m_numStars,
        BufferFlags::Read_Write));
}

void NBodySim::run ()
{
    m_simThread = std::thread([=]()
    {
        init();
        float softeningFactor = 100.0f;
        float timeStep = 75000 * 3.154e+07; // 75,000 years in seconds
        std::shared_ptr<IO::StarFileMT> file (IO::StarFileMT::createFile (m_filePath, m_numStars, m_numTimesteps));
        size_t globalWorkSize = m_numStars;
        size_t localWorkSize = p_kern->getPreferredWorkGroupSize(p_device.get());
        for ( int i = 0; i < m_numTimesteps; i++ )
        {
            p_kern->setKernelMemBufferArg(0, p_starPosInBuffer.get());
            p_kern->setKernelMemBufferArg(1, p_starMassBuffer.get());
            p_kern->setKernelArg<float>(2, &softeningFactor);
            p_kern->setKernelArg<float>(3, &timeStep);
            p_kern->setKernelMemBufferArg(4, p_starVelInBuffer.get());
            p_kern->setKernelMemBufferArg(5, p_starPosOutBuffer.get());
            p_kern->setKernelMemBufferArg(6, p_starVelOutBuffer.get ());

            p_kern->runKernel(1, globalWorkSize, localWorkSize);
            std::vector<cl_float3> result = p_starPosOutBuffer->pullFromBuffer(m_numStars);
            file->writeTimeStep (Application::clFloatArrToVec3f (result.data(), m_numStars));
            p_starPosInBuffer->copyFromBuffer (p_starPosOutBuffer.get (), m_numStars);
            p_starVelInBuffer->copyFromBuffer(p_starVelOutBuffer.get(), m_numStars);
            incrementTimestepsDone();
        }
    }
    );
}

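As @jan-gerd mentioned in the comments above, the problem is that the memory object itself is passed to clSetKernelArg instead of its address. clSetKernelArg expects a pointer to the cl_mem handle and copies sizeof(cl_mem) bytes from that location. There is no need to write a function that returns void**, though; you can change the getter to return a reference: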

inline const cl_mem& getMemBuffer() const {return m_memBuffer;}

and use it with clSetKernelArg like this:

int err = clSetKernelArg (m_kernel, pos, sizeof (cl_mem), &(data->getMemBuffer()));

Or you can keep the existing getter and add a second one that returns a reference, for example:

inline const cl_mem getMemBuffer() const {return m_memBuffer;}
inline const cl_mem& getMemBufferRef() const {return m_memBuffer;}
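To see why the handle itself is rejected while its address is accepted, here is a minimal sketch that needs no OpenCL installation. Everything in it (FakeMemObject, fake_mem, FakeRuntime, the FAKE_* codes) is a hypothetical stand-in invented for illustration, not part of the OpenCL API. It only assumes what the fix above relies on: the runtime copies sizeof(handle) bytes from the pointer it is given and then checks whether those bytes form a known handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <set>

// Hypothetical stand-ins for illustration only; none of this is the OpenCL API.
struct FakeMemObject {
    std::uintptr_t tag = 0;   // first bytes of the object itself, not a handle
    std::size_t size = 0;
};
using fake_mem = FakeMemObject*;   // opaque pointer-sized handle, in the spirit of cl_mem

constexpr int FAKE_SUCCESS = 0;             // arbitrary illustrative codes
constexpr int FAKE_INVALID_MEM_OBJECT = 1;
constexpr int FAKE_INVALID_ARG_SIZE = 2;

static_assert(sizeof(fake_mem) == sizeof(std::uintptr_t),
              "sketch assumes pointer-sized handles");

class FakeRuntime {
public:
    ~FakeRuntime() {
        for (std::uintptr_t bits : m_live)
            delete reinterpret_cast<fake_mem>(bits);
    }

    fake_mem createBuffer(std::size_t size) {
        fake_mem handle = new FakeMemObject{};
        handle->tag = 0xBADC0DE;
        handle->size = size;
        m_live.insert(reinterpret_cast<std::uintptr_t>(handle));
        return handle;
    }

    // Same calling convention as the fix above: argValue must point AT the handle.
    int setArg(std::size_t argSize, const void* argValue) const {
        if (argSize != sizeof(fake_mem))
            return FAKE_INVALID_ARG_SIZE;
        std::uintptr_t bits = 0;
        std::memcpy(&bits, argValue, sizeof(bits));   // read the handle through the pointer
        return m_live.count(bits) ? FAKE_SUCCESS : FAKE_INVALID_MEM_OBJECT;
    }

private:
    std::set<std::uintptr_t> m_live;   // bit patterns of all live handles
};

// Wrapper mirroring the two getters discussed above.
struct FakeBuffer {
    fake_mem m_memBuffer;
    fake_mem getMemBuffer() const { return m_memBuffer; }            // copy of the handle
    const fake_mem& getMemBufferRef() const { return m_memBuffer; }  // addressable handle
};
```

With a FakeRuntime rt and FakeBuffer b{rt.createBuffer(64)}, the call rt.setArg(sizeof(fake_mem), b.getMemBuffer()) compiles, because the handle converts implicitly to const void*, but the runtime then reads the first bytes of the object rather than the handle and reports FAKE_INVALID_MEM_OBJECT. The call rt.setArg(sizeof(fake_mem), &b.getMemBufferRef()) succeeds. The reference-returning getter matters because taking the address of a by-value return does not compile.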
