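Random exit code for larger array sizes in DPC++ vector addition

I'm trying to run oneAPI's hello-world DPC++ sample, which adds two 1-D arrays on both the CPU and the GPU and verifies the results. The code is shown below: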

/*
DataParallel Addition of two Vectors
*/
#include <CL/sycl.hpp>
#include <array>
#include <iostream>
using namespace sycl;
constexpr size_t array_size = 100000;
typedef std::array<int, array_size> IntArray;
// Initialize array with the same value as its index
void InitializeArray(IntArray& a) { for (size_t i = 0; i < a.size(); i++) a[i] = i; }
/*
Create an asynchronous Exception Handler for sycl
*/
static auto exception_handler = [](cl::sycl::exception_list eList) {
for (std::exception_ptr const& e : eList) {
try {
std::rethrow_exception(e);
}
catch (std::exception const& e) {
std::cout << "Failure" << std::endl;
std::terminate();
}
}
};
void VectorAddParallel(queue &q, const IntArray& x, const IntArray& y, IntArray& parallel_sum) {
range<1> num_items{ x.size() };

buffer x_buf(x);
buffer y_buf(y);
buffer sum_buf(parallel_sum.data(), num_items);
/*
Submit a command group to the queue by a lambda
which contains data access permissions and device computation
*/
q.submit([&](handler& h) {
auto xa = x_buf.get_access<access::mode::read>(h);
auto ya = y_buf.get_access<access::mode::read>(h);
auto sa = sum_buf.get_access<access::mode::write>(h);
std::cout << "Adding on GPU (Parallel)\n";
h.parallel_for(num_items, [=](id<1> i) { sa[i] = xa[i] + ya[i]; });
std::cout << "Done on GPU (Parallel)\n";
});
/*
queue runs the kernel asynchronously. Once beyond the scope,
buffers' data is copied back to the host.
*/
}
int main() {
default_selector d_selector;
IntArray a, b, sequential, parallel;
InitializeArray(a);
InitializeArray(b);
try {
// Queue needs: Device and Exception handler
queue q(d_selector, exception_handler);

std::cout << "Accelerator: " 
<< q.get_device().get_info<info::device::name>() << "\n";
std::cout << "Vector size: " << a.size() << "\n";
VectorAddParallel(q, a, b, parallel);
}
catch (std::exception const& e) {
std::cout << "Exception while creating Queue. Terminating...\n";
std::terminate();
}

/*
Do the sequential, which is supposed to be slow
*/
std::cout << "Adding on CPU (Scalar)\n";
for (size_t i = 0; i < sequential.size(); i++) {
sequential[i] = a[i] + b[i];
}
std::cout << "Done on CPU (Scalar)\n";

/*
Verify results, the old-school way
*/
for (size_t i = 0; i < parallel.size(); i++) {
if (parallel[i] != sequential[i]) {
std::cout << "Fail: " << parallel[i] << " != " << sequential[i] << std::endl;
std::cout << "Failed. Results do not match.\n";
return -1;
}
}
std::cout << "Success!\n";
return 0;
}

For relatively small values of array_size (I tested 100 to 50k elements), the computation works fine. Sample output:

Accelerator: Intel(R) Gen9
Vector size: 50000
Adding on GPU (Parallel)
Done on GPU (Parallel)
Adding on CPU (Scalar)
Done on CPU (Scalar)
Success!

Note that the computation finishes in barely a second on both the CPU and the GPU. But when I increase array_size to, say, 100000, I get a seemingly clueless error:

C:\Users\myuser\source\repos\dpcpp-iotas\x64\Debug\dpcpp-iotas.exe (process 24472) exited with code -1073741571.

I'm not sure of the exact value at which the error starts, but it seems to happen at around 70000. I can't figure out why this happens or what is going wrong.

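It turns out this is caused by the stack size limit enforced by Visual Studio. The std::array objects in main are local variables, so their elements live on the stack, and contiguous arrays with too many elements overflow it.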
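To see why the failure shows up somewhere around 70000 elements, the stack footprint of main can be estimated by hand. The sketch below is only a rough model: it assumes a 4-byte int, takes the 1 MB default as 1 MiB, counts only the four IntArray locals (a, b, sequential, parallel), and ignores everything else on the stack. The names kDefaultStackBytes, kArraysInMain, StackBytesForArrays and FitsInDefaultStack are made up for illustration.

```cpp
#include <cstddef>

// Assumption: the default stack reserve of 1 MB, taken here as 1 MiB.
constexpr std::size_t kDefaultStackBytes = 1024 * 1024;

// main declares four IntArray locals: a, b, sequential, parallel.
constexpr std::size_t kArraysInMain = 4;

// Approximate number of stack bytes the four arrays need for a given array_size.
constexpr std::size_t StackBytesForArrays(std::size_t array_size) {
    return kArraysInMain * array_size * sizeof(int);
}

// True if the arrays alone stay below the assumed default stack reserve.
constexpr bool FitsInDefaultStack(std::size_t array_size) {
    return StackBytesForArrays(array_size) < kDefaultStackBytes;
}
```

Under these assumptions, 50000 elements need about 800 KB and fit, while 100000 elements need about 1.6 MB and do not. The arrays alone reach 1 MiB at 65536 elements, which is consistent with the crash appearing near 70000.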

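As @user4581301 pointed out, the error code -1073741571 is C00000FD in hexadecimal, which is the signed representation of the "stack exhaustion/overflow" status reported in Visual Studio.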
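The conversion from the signed exit code to its hexadecimal form can be reproduced with a small helper. This is a minimal sketch; the function name ExitCodeToHex is hypothetical and not part of any Windows or oneAPI API.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Reinterpret a signed 32-bit exit code as an unsigned value and
// format it as "0x" followed by 8 uppercase hexadecimal digits.
std::string ExitCodeToHex(std::int32_t exit_code) {
    char buf[11];  // "0x" + 8 hex digits + terminating null
    std::snprintf(buf, sizeof buf, "0x%08X",
                  static_cast<unsigned int>(static_cast<std::uint32_t>(exit_code)));
    return std::string(buf);
}
```

Calling ExitCodeToHex(-1073741571) yields 0xC00000FD, the value quoted above.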

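Workarounds:

Increase the /STACK reserve to a value above 1 MB (the default) under Project Properties > Linker > System > Stack Reserve Size / Stack Commit Size.
Use a binary editor (editbin.exe and dumpbin.exe) to edit /STACK:reserve.
Use std::vector instead, which allocates its storage dynamically (suggested by @Retired Ninja).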

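I couldn't find the option to change /STACK in the oneAPI project, which is normally set in the Linker properties.

I decided to go with dynamic allocation.

Related: https://stackoverflow.com/a/26311584/9230398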
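A minimal sketch of what the switch to dynamic allocation might look like on the host side is shown below. It deliberately leaves out the SYCL parts so that it compiles with any C++ compiler. IntVector, MakeIndexVector and AddSequential are hypothetical names, not taken from the original code. VectorAddParallel would also have to take IntVector references; buffers can still be built from data() and size(), as the original code already does for the sum buffer.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t array_size = 100000;

// Hypothetical replacement for IntArray: the elements now live on the heap,
// so only the small vector object itself occupies stack space.
using IntVector = std::vector<int>;

// Build a vector whose elements equal their index, like InitializeArray.
IntVector MakeIndexVector(std::size_t n) {
    IntVector v(n);
    for (std::size_t i = 0; i < v.size(); i++) v[i] = static_cast<int>(i);
    return v;
}

// Host-side reference sum, mirroring the sequential loop in main.
IntVector AddSequential(const IntVector& x, const IntVector& y) {
    IntVector sum(x.size());
    for (std::size_t i = 0; i < sum.size(); i++) sum[i] = x[i] + y[i];
    return sum;
}
```

In main, the four arrays would then be created as, for example, IntVector a = MakeIndexVector(array_size), and the result vector as IntVector parallel(array_size).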

When I program large applications, I always run

ulimit -s unlimited

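to explain to the shell that I'm a grown-up and really do want some room on the stack.

This is bash syntax, but you can obviously adapt it to other shells.

I suppose there may be an equivalent for non-UNIX operating systems?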
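To check from inside a program which stack limit the shell actually handed down, the POSIX call getrlimit can be queried with RLIMIT_STACK. This is a sketch for UNIX-like systems only and does not apply to the Windows setup in the question. The StackLimit struct and QueryStackLimit function are hypothetical names.

```cpp
#include <sys/resource.h>  // POSIX only: getrlimit, RLIMIT_STACK

// Result of the query: whether it succeeded, whether the soft limit is
// unlimited, and otherwise the soft limit in bytes.
struct StackLimit {
    bool ok;
    bool unlimited;
    unsigned long long bytes;
};

// Ask the OS for the current soft stack limit of this process.
StackLimit QueryStackLimit() {
    rlimit rl{};
    if (getrlimit(RLIMIT_STACK, &rl) != 0) return {false, false, 0};
    if (rl.rlim_cur == RLIM_INFINITY) return {true, true, 0};
    return {true, false, static_cast<unsigned long long>(rl.rlim_cur)};
}
```

After ulimit -s unlimited in the launching shell, the unlimited flag would be expected to be set, provided the system permits raising the limit.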
