Using std::vector<double> to access data managed by std::unique_ptr<double[2]>

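I have a complex class that contains a large block of data of type double[2], managed by a smart pointer, e.g. std::unique_ptr<double[2]> m_data;. I cannot change the type of this data structure.

I am using a library that gives me a function with the following signature: bool func_in_lib(std::vector<double>& data, double& res). I cannot change the signature of this function.

I want to pass the data managed by the unique_ptr to the function expecting a vector<double>& without breaking the connection to my complex class. I want the function to work directly on my m_data rather than copying the data into a std::vector<double> and then copying it back into my complex class, because I have to do this many times.

Is there a way to do this?

Here is some code that covers the semantics I would like to have. The line of code I care about is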

vector<double> access_vec = /* give access to my_data via vector interface */;

#include <iostream>
#include <memory>
#include <vector>
using namespace std;
//--------------------------------------------------------------------------//
//--- This function is given, I cannot change its signature.
bool
func_in_lib(std::vector<double>& data, double& res) {
  //--- check some properties of the vector
  if (data.size() < 10)
    return false;
  //--- do something magical with the data
  for (auto& d : data)
    d *= 2.0;
  res = 42.0;
  return true;
}
//--------------------------------------------------------------------------//
struct DataType {
  double a = 1.0;
  double b = 2.0;
  double c = 3.0;
};
//--------------------------------------------------------------------------//
ostream&
operator<<(ostream& out, const DataType& d) {
  out << d.a << " " << d.b << " " << d.c << endl;
  return out;
}
//--------------------------------------------------------------------------//
int
main(int argc, char const* argv[]) {
  int count = 20;
  //--- init and print my data
  unique_ptr<DataType[]> my_data = make_unique<DataType[]>(count);
  for (int i = 0; i < count; ++i)
    cout << my_data.get()[i];
  //---
  double         result     = 0.0;
  vector<double> access_vec = /* give access to my_data via vector interface */;
  func_in_lib(access_vec, result);
  return 0;
}

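tl;dr: It is not possible in a standard-compliant way.

Actually, it is almost possible, but the limitations of std::allocator stand in your way. Let me explain.

A std::vector "owns" the memory used for element storage: the vector has the right to free the memory (e.g. on destruction, on destruction after being moved from, on .resize() or push_back, etc.) and to reallocate it elsewhere. If you want to keep the unique_ptr's ownership, you cannot allow that to happen. While it is true that the mock implementation of func_in_lib() does none of this, your code cannot make that assumption, because it must cater to the function's declaration, not its body.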
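To make the ownership point concrete, here is a minimal sketch. The helper name reallocates_on_growth is made up for this illustration; it only shows that growing a vector past its capacity moves the elements to a new block, so an outside pointer to the old storage would be left dangling.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the question's code): returns true if
// growing the vector beyond its capacity moved the elements to a new block.
bool reallocates_on_growth(std::size_t initial_size) {
    std::vector<double> v(initial_size, 1.0);
    // Keep the old address as an integer so no dangling pointer is ever used.
    const auto before = reinterpret_cast<std::uintptr_t>(v.data());
    // Asking for more than the current capacity forces a reallocation.
    v.reserve(v.capacity() + 1);
    const auto after = reinterpret_cast<std::uintptr_t>(v.data());
    return before != after;
}
```

Any function that receives a non-const std::vector<double>& is allowed to trigger exactly this, which is why the declaration, not the body, decides what is safe.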

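But suppose you are willing to bend the rules a bit and assume that the vector will not replace its allocated memory while the function runs. This is legitimate in a sense: if you were somehow able to pass memory for the vector to use and it replaced the memory region, you could detect that when func_in_lib() returns, and then either fix up what is in the unique_ptr or throw an exception (depending on whether other places in your code hold pointers to the discarded memory). Or, let us assume func_in_lib() took a const std::vector<double>& instead of a non-const reference. Our path would still be blocked. Why?

std::vector manages memory through an allocator object. The allocator is a template parameter, so in theory you could use a vector whose allocator does whatever you want, e.g. starts with the pre-allocated memory from unique_ptr::get() and refuses to ever reallocate, for instance by throwing an exception. And since one of the std::vector constructors takes an allocator of the appropriate type, you could construct your desired allocator, create a vector with it, and pass a reference to that vector.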
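A rough sketch of such an allocator could look like the following. Everything here (BorrowedBufferAllocator, double_in_place) is a hypothetical name invented for illustration, and the allocator is deliberately minimal rather than a complete, production-quality one. It lends out a caller-owned buffer once, refuses any further allocation, and leaves existing values untouched when the vector default-inserts its elements.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical allocator (illustration only): lends out one caller-owned
// buffer exactly once and never frees it.
template <typename T>
struct BorrowedBufferAllocator {
    using value_type = T;

    T*          buffer;
    std::size_t capacity;
    bool        lent = false;

    BorrowedBufferAllocator(T* buf, std::size_t cap) noexcept
        : buffer(buf), capacity(cap) {}

    // The first request gets the borrowed buffer; any later request
    // (that is, any attempt to grow) is refused.
    T* allocate(std::size_t n) {
        if (lent || n > capacity) throw std::bad_alloc{};
        lent = true;
        return buffer;
    }

    // The memory still belongs to its original owner, so nothing is freed.
    void deallocate(T*, std::size_t) noexcept {}

    // Default-insertion keeps the value already stored in the buffer.
    template <typename U> void construct(U*) noexcept {}
    // Destruction must not end the lifetime of the owner's objects.
    template <typename U> void destroy(U*) noexcept {}

    friend bool operator==(const BorrowedBufferAllocator& a,
                           const BorrowedBufferAllocator& b) noexcept {
        return a.buffer == b.buffer;
    }
    friend bool operator!=(const BorrowedBufferAllocator& a,
                           const BorrowedBufferAllocator& b) noexcept {
        return !(a == b);
    }
};

// Hypothetical helper: doubles `n` values in place through a vector interface.
bool double_in_place(double* data, std::size_t n) {
    using Alloc = BorrowedBufferAllocator<double>;
    std::vector<double, Alloc> view(n, Alloc(data, n));
    if (view.data() != data) return false;  // the vector must sit on our buffer
    for (auto& d : view) d *= 2.0;
    return true;
}
```

With this sketch, a push_back on the vector would ask the allocator for a second block and get std::bad_alloc, which is the "refuse to reallocate" behavior described above.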

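But alas, your library is cruel. func_in_lib is not templated and can only take the default template argument for its allocator: std::allocator.

The default allocator used for std::vector and other standard library containers is std::allocator. Now, allocators are in my opinion a poorly conceived idea in general, but std::allocator is particularly annoying. Specifically, it cannot be constructed with a pre-existing memory region to use; it only ever holds memory it has allocated itself, never memory you have given it.

So, you will never get a std::vector to use the memory you want.

So what can you do?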

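Option 1: Your hack:
- Figure out the concrete layout of std::vector on your system
- Manually set the field values to something useful
- Use reinterpret_cast<std::vector>() on the raw data
Option 2: malloc() and free() hooks (if you are on a Unix-like system and/or using a compiler that uses libc)
- See: Using Malloc Hooks
  
  The idea is to detect the new[] call from the std::vector you create and hand it your own unique_ptr-controlled memory instead of actually allocating anything. When the vector asks to free the memory (e.g. on destruction), you do nothing.
Option 3: Switch libraries. The library exposing func_in_lib is poorly written. Unless it is a very niche library, I am sure there are better alternatives. In fact, perhaps you could write a better one yourself.
Option 4: Do not use that particular function in the library; stick to the lower-level, simpler primitives in the library and implement func_in_lib() using those. Not always feasible, but may be worth a shot.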
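For the last suggestion (reimplementing the function on top of simpler primitives), the result might look roughly like this. func_on_range is a hypothetical name, and its body simply mirrors the mock func_in_lib() from the question; a real library function would of course need its real logic ported.

```cpp
#include <cstddef>

// Hypothetical replacement for func_in_lib(): same logic as the mock
// implementation in the question, but operating on a pointer/length pair,
// so it can work directly on memory owned by a unique_ptr.
bool func_on_range(double* data, std::size_t size, double& res) {
    //--- check some properties of the range
    if (data == nullptr || size < 10)
        return false;
    //--- do something magical with the data
    for (std::size_t i = 0; i < size; ++i)
        data[i] *= 2.0;
    res = 42.0;
    return true;
}
```

Because this version never sees a container, it cannot reallocate or free anything, and ownership stays with the unique_ptr.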

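Together with a colleague, I found two solutions that solve my problem.

Solution 1 - The hacky one

The idea is to use the structure of the underlying implementation of std::vector<double>, which in my case consists of 3 members holding 3 pointers into the vector's data:

- the start address of the data section
- the end address of the data section
- the address of the current maximum capacity of the data section

So I build a struct containing these three addresses and reinterpret_cast it to a std::vector. This works with the current implementation of std::vector on my machine. The implementation may differ depending on the installed version of the STL.

The nice thing here is that I can use the interface of std::vector without creating one. I also do not have to copy the data into a std::vector. I could also take just a part of the initial data stored in my complex class; I control the manipulated part through the pointers I put into the struct.

This solves my problem, but it is a hack. I can use it because the code is only relevant to myself. I still post it because it might be of interest to others.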

#include <iostream>
#include <memory>
#include <vector>
using namespace std;
//--------------------------------------------------------------------------//
//--- This function is given, I cannot change its signature.
bool
func_in_lib(std::vector<double>& data, double& res) {
  //--- check some properties of the vector
  if (data.size() < 10)
    return false;
  //--- do something magical with the data
  for (auto& d : data)
    d *= 2.0;
  res = 42.0;
  return true;
}
//--------------------------------------------------------------------------//
struct DataType {
  double a = 1.0;
  double b = 2.0;
  double c = 3.0;
};
//--------------------------------------------------------------------------//
ostream&
operator<<(ostream& out, const DataType& d) {
  out << d.a << " " << d.b << " " << d.c << endl;
  return out;
}
//--------------------------------------------------------------------------//
int
main(int argc, char const* argv[]) {
  int count = 20;
  //--- init and print my data
  unique_ptr<DataType[]> my_data = make_unique<DataType[]>(count);
  for (int i = 0; i < count; ++i)
    cout << my_data.get()[i];

  //--------------------------------------------------------------------------//
  // HERE STARTS THE UGLY HACK, THAT CAN BE ERROR-PRONE BECAUSE IT DEPENDS ON
  // THE UNDERLYING IMPLEMENTATION OF std::vector<T>
  //--------------------------------------------------------------------------//
  struct VecAccess {
    double* start = nullptr; // address to the start of the data
    double* stop0 = nullptr; // address to the end of the data
    double* stop1 = nullptr; // address to the capacity of the vector
  };
  //---
  DataType*       p_data = my_data.get();
  VecAccess       va{ &(p_data[0].a),                //points at the 'front' of the vector
                      &(p_data[count - 1].c) + 1,    //points at the 'end' of the vector
                      &(p_data[count - 1].c) + 1 };
  vector<double>* p_vec_access = reinterpret_cast<vector<double>*>(&va);
  //--------------------------------------------------------------------------//
  // HERE ENDS THE UGLY HACK.
  //--------------------------------------------------------------------------//
  //---
  double dummy = 0.0;   // this is only relevant for the code used as minimum example
  func_in_lib(*p_vec_access, dummy);
  //--- print the modified data
  for (int i = 0; i < count; ++i)
    cout << my_data.get()[i];
  return 0;
}
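Since this hack silently depends on layout details, it may be worth at least checking the assumptions that can be checked at compile time. The sketch below is one possible way to do that; the two constant names are invented for illustration. Note that these checks are necessary but not sufficient: matching sizes do not prove that the three members really are begin, end, and capacity pointers in that order.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

struct DataType {
    double a = 1.0;
    double b = 2.0;
    double c = 3.0;
};

struct VecAccess {
    double* start = nullptr;  // start of the data
    double* stop0 = nullptr;  // end of the data
    double* stop1 = nullptr;  // end of the capacity
};

// Hypothetical guard: DataType must be exactly three doubles without padding,
// otherwise walking over it as a flat double sequence is off.
constexpr bool datatype_is_three_packed_doubles =
    std::is_standard_layout<DataType>::value &&
    sizeof(DataType) == 3 * sizeof(double);

// Hypothetical guard: the vector object must at least have the same size and
// alignment as three raw pointers. This says nothing about member order.
constexpr bool vector_could_match_three_pointers =
    sizeof(std::vector<double>) == sizeof(VecAccess) &&
    alignof(std::vector<double>) == alignof(VecAccess);
```

Turning both constants into static_assert conditions next to the reinterpret_cast would make the build fail on a standard library where the size assumption does not hold, instead of corrupting memory at run time.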

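Update: Analyzing the assembly code of the second solution shows that the contents are copied, even though the copy constructor of the data objects is not called. The copying happens at the machine-code level.

Solution 2 - Move semantics

For this solution I have to mark the move constructor of DataType as noexcept. The key idea is not to treat the DataType[] array as a std::vector<double>. Instead, we treat the std::vector<double> as a std::vector<DataType>. We can then move the data into this vector (without copying), pass it to the function, and move it back afterwards.

The data is not copied but moved into the std::vector, which is faster. Also relevant for my case: I can again take just a part of the initial data stored in my complex class. The drawback of this solution is that I have to reserve storage of the correct size for the moved data.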

#include <algorithm>   // std::move (range overload)
#include <iostream>
#include <iterator>    // std::back_inserter
#include <memory>
#include <utility>
#include <vector>
using namespace std;
//--------------------------------------------------------------------------//
//--- This function is given, I cannot change its signature.
bool
func_in_lib(std::vector<double>& data, double& res) {
  //--- check some properties of the vector
  if (data.size() < 10)
    return false;
  //--- do something magical with the data
  for (auto& d : data)
    d *= 2.0;
  res = 42.0;
  return true;
}
//--------------------------------------------------------------------------//
class DataType {
public:
  double a = 1.0;
  double b = 2.0;
  double c = 3.0;
  // clang-format off
  DataType() = default;
  DataType(DataType const&) = default;
  DataType(DataType&&) noexcept = default;
  DataType& operator=(DataType const&) = default;
  DataType& operator=(DataType&&) noexcept  = default;
  ~DataType()  = default;
  // clang-format on
};
//--------------------------------------------------------------------------//
ostream&
operator<<(ostream& out, const DataType& d) {
  out << d.a << " " << d.b << " " << d.c << endl;
  return out;
}
//--------------------------------------------------------------------------//
int
main(int argc, char const* argv[]) {
  int count = 20;
  //--- init and print my data
  unique_ptr<DataType[]> my_data = make_unique<DataType[]>(count);
  for (int i = 0; i < count; ++i)
    cout << my_data.get()[i];
  //---
  vector<double> double_vec;
  double_vec.reserve(count * 3);
  //--- here starts the magic stuff
  //--- NOTE: this cast is formally undefined behavior and relies on the
  //--- layout of std::vector in the standard library being used
  auto& vec_as_datatype = *reinterpret_cast<vector<DataType>*>(&double_vec);
  auto* start_mv        = my_data.get();
  auto* stop_mv         = my_data.get() + count;  // one past the last element
  //--- move the content to the vec
  move(start_mv, stop_mv, back_inserter(vec_as_datatype));
  //--- call the external func in the lib
  double dummy = 0.0; // is only needed for the code of the example
  func_in_lib(double_vec, dummy);
  //--- move the content back
  move(begin(vec_as_datatype), end(vec_as_datatype), start_mv);
  //--- print the modified data
  for (int i = 0; i < count; ++i)
    cout << my_data.get()[i];
}
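Given the update above (the contents end up being copied at the machine level anyway), a plain round trip through a reusable scratch vector may be worth considering, since it avoids the reinterpret_cast between vector types. This is only a sketch under the assumption that DataType is trivially copyable and consists of three packed doubles; call_with_round_trip is a made-up name, and func_in_lib here is a stand-in copied from the question's mock.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

struct DataType {
    double a = 1.0;
    double b = 2.0;
    double c = 3.0;
};

static_assert(std::is_trivially_copyable<DataType>::value,
              "memcpy requires a trivially copyable type");
static_assert(sizeof(DataType) == 3 * sizeof(double),
              "DataType must be three packed doubles");

// Stand-in with the same signature and body as the question's mock.
bool func_in_lib(std::vector<double>& data, double& res) {
    if (data.size() < 10)
        return false;
    for (auto& d : data)
        d *= 2.0;
    res = 42.0;
    return true;
}

// Hypothetical wrapper: copy in, call the library, copy back. Passing the
// same `scratch` vector on every call avoids repeated allocations.
bool call_with_round_trip(DataType* items, std::size_t count,
                          std::vector<double>& scratch, double& res) {
    const std::size_t n = count * 3;
    scratch.resize(n);
    if (n > 0)
        std::memcpy(scratch.data(), items, n * sizeof(double));
    const bool ok = func_in_lib(scratch, res);
    // Copy back only if the library left the element count unchanged.
    if (scratch.size() != n)
        return false;
    if (n > 0)
        std::memcpy(items, scratch.data(), n * sizeof(double));
    return ok;
}
```

This does copy, which the question wanted to avoid, but it stays within defined behavior and lets the caller detect a library call that resized the vector.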

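This is not a proper answer, but nobody has mentioned (since it certainly does not answer your question directly) the C++17 polymorphic allocators with std::pmr::vector, which can easily do half of the job.

But unfortunately, there is no way to get back to a regular std::vector.

I also came across a post on Bartek's coding blog, from which I stole the following snippet: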

#include <algorithm>         // std::fill_n
#include <cctype>
#include <iostream>
#include <iterator>          // std::begin, std::size, std::data
#include <memory_resource>   // pmr core types
#include <vector>            // pmr::vector
template <typename T> void MyToUpper(T& vec)    {
  for(auto & cr:vec)
    cr = std::toupper(cr);
}
//https://www.bfilipek.com/2020/06/pmr-hacking.html
int main() {
  char buffer[64] = {}; // a small buffer on the stack
  std::fill_n(std::begin(buffer), std::size(buffer) - 1, '_');
  std::cout << buffer << "\n\n";
  std::pmr::monotonic_buffer_resource pool{std::data(buffer), std::size(buffer)};
  std::pmr::vector<char> vec{ &pool };
  for (char ch = 'a'; ch <= 'z'; ++ch)
    vec.push_back(ch);

  std::cout << buffer << "\n\n";

  MyToUpper(vec);

  std::cout << buffer << '\n';
}

Potential output under Coliru (note: C++17):

_______________________________________________________________

aababcdabcdefghabcdefghijklmnopabcdefghijklmnopqrstuvwxyz______

aababcdabcdefghabcdefghijklmnopABCDEFGHIJKLMNOPQRSTUVWXYZ______

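The post mentions that the garbage part (aababcdabcdefghabcdefghijklmnop) is due to the vector reallocating its data as it grows.

But what is interesting here is that the operation performed on the vector was indeed done on the original buffer (abcdefghijklmnopqrstuvwxyz => ABCDEFGHIJKLMNOPQRSTUVWXYZ).

Unfortunately, std::pmr::vector does not fit your function func_in_lib(std::vector<double>& data, double& res).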
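The mismatch can be shown in a few lines. The sketch below assumes C++17; pmr_vector_uses_buffer is a hypothetical helper written for this illustration. It places a std::pmr::vector<double> inside caller-provided storage and checks at compile time that this type is not std::vector<double>.

```cpp
#include <cstddef>
#include <memory_resource>
#include <type_traits>
#include <vector>

// The two vector types differ in their allocator parameter, so a
// std::pmr::vector<double> cannot bind to a std::vector<double>& parameter.
static_assert(!std::is_same_v<std::pmr::vector<double>, std::vector<double>>);

// Hypothetical helper: builds a pmr vector inside caller-provided storage and
// reports whether its elements really live in that storage.
bool pmr_vector_uses_buffer(double* storage, std::size_t n) {
    // null_memory_resource() as upstream: running out of buffer throws
    // std::bad_alloc instead of silently falling back to the heap.
    std::pmr::monotonic_buffer_resource pool{
        storage, n * sizeof(double), std::pmr::null_memory_resource()};
    std::pmr::vector<double> vec{&pool};
    vec.reserve(n);  // a single allocation, taken from `storage`
    for (std::size_t i = 0; i < n; ++i)
        vec.push_back(static_cast<double>(i));
    return vec.data() == storage;
}
```

Reserving the full size up front is what avoids the "garbage" prefix seen in the blog example, because the vector never has to grow inside the buffer.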

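I assume you bought this library and have no access to its code and cannot recompile it; otherwise you could use templates, or just ask your vendor to add using std::pmr::vector; at the beginning of their code...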
