Fastest way to convert a multidimensional std::vector into a single array

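I want to copy as little as possible. At the moment I am using num_t* array = new num_t[..] and then copying each value of the multidimensional vector into array in a for loop.

I would like to find a better way to do this.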

For arithmetic types (more generally, any trivially copyable type) you can use the function std::memcpy. For example:

#include <iostream>
#include <vector>
#include <cstring>
int main()
{
    std::vector<std::vector<int>> v =
    {
        { 1 },
        { 1, 2 },
        { 1, 2, 3 },
        { 1, 2, 3, 4 }
    };
    for ( const auto &row : v )
    {
        for ( int x : row ) std::cout << x << ' ';
        std::cout << std::endl;
    }
    std::cout << std::endl;
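    // Total number of elements across all rows.
    std::size_t n = 0;
    for ( const auto &row : v ) n += row.size();
    int *a = new int[n];
    int *p = a;
    // Copy each row as one contiguous block instead of element by element.
    for ( const auto &row : v )
    {
        std::memcpy( p, row.data(), row.size() * sizeof( int ) );
        p += row.size();
    }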
    for ( p = a; p != a + n; ++p ) std::cout << *p << ' ';
    std::cout << std::endl;
    delete []a;
}

The program output is

1 
1 2 
1 2 3 
1 2 3 4 
1 1 2 1 2 3 1 2 3 4
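
If a raw new[] buffer is not strictly required, the same flattening can be sketched with a std::vector as the destination, which removes the manual delete []. The helper name flatten is hypothetical and not part of the original answer; it assumes only that the element type is copyable.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original answer): flattens a vector of
// rows into one contiguous std::vector, preserving row order.
template <typename T>
std::vector<T> flatten( const std::vector<std::vector<T>> &rows )
{
    // Count the elements first so the destination is allocated only once.
    std::size_t total = 0;
    for ( const auto &row : rows ) total += row.size();

    std::vector<T> flat;
    flat.reserve( total );

    // Append each row as a range; rows may have different lengths.
    for ( const auto &row : rows )
    {
        flat.insert( flat.end(), row.begin(), row.end() );
    }
    return flat;
}
```

The contiguous storage is then available through flat.data() wherever a plain pointer is expected.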

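As you stated in the comments, the inner vectors of your vector<vector<T>> structure all have the same size. So what you are actually trying to do is store an m x n matrix.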

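Usually such matrices are not stored in a multidimensional structure but in linear memory. The position of a given element (row, column) is then derived from an indexing scheme, the most common being row-major and column-major order.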
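
As a minimal sketch of the row-major scheme, assuming int elements: the Matrix type and its at member below are hypothetical names chosen for illustration. The element at (r, c) lives at offset r * cols + c in a single buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix type: one contiguous buffer plus its dimensions.
struct Matrix
{
    std::size_t rows;
    std::size_t cols;
    std::vector<int> data; // rows * cols elements, stored row by row

    Matrix( std::size_t r, std::size_t c ) : rows( r ), cols( c ), data( r * c ) {}

    // Row-major indexing: skip r full rows, then move c elements into the row.
    int &at( std::size_t r, std::size_t c ) { return data[r * cols + c]; }
    const int &at( std::size_t r, std::size_t c ) const { return data[r * cols + c]; }
};
```

With this layout, data.data() already is the flat array, so no conversion step is needed before handing it to another API. For column-major order the offset would be c * rows + r instead.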

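Since you have stated that you want to copy this data to the GPU, that copy is then done by simply copying the linear vector as a whole. You then use the same indexing scheme on the GPU and on the host.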

If you are using CUDA, have a look at Thrust. It provides thrust::host_vector<T> and thrust::device_vector<T> and simplifies the copy even further:

thrust::host_vector<int> hostVec(100); // 10 x 10 matrix
thrust::device_vector<int> deviceVec = hostVec; // copies hostVec to GPU
