Writing an Apache Arrow table to a string in C++



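I am trying to write an Apache Arrow table to a string. My large example has problems, and I can't get this small example to work either. The small example segfaults inside Arrow during the WriteTable call. My large example does not appear to serialize correctly.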

#include <arrow/api.h>
#include <arrow/io/memory.h>
#include <arrow/ipc/api.h>

std::shared_ptr<arrow::Table> makeSimpleFakeArrowTable() {
    std::vector<std::shared_ptr<arrow::Field>> arrowFields;
    arrowFields.emplace_back(std::make_shared<arrow::Field>("Field1", arrow::int64()));
    arrowFields.emplace_back(std::make_shared<arrow::Field>("Field2", arrow::float64()));
    auto schema = std::make_shared<arrow::Schema>(arrowFields);
    std::vector<std::shared_ptr<arrow::Array>> columns(schema->num_fields());
    arrow::Int64Builder longBuilder;
    longBuilder.Append(20);
    longBuilder.Finish(&(columns.at(0)));
    arrow::DoubleBuilder doubleBuilder;
    doubleBuilder.Append(10.0);
    longBuilder.Finish(&(columns.at(1)));
    return arrow::Table::Make(schema, columns);
}

std::shared_ptr<arrow::RecordBatch>
getArrowBatchFromBytes(const std::string& bytes) {
    arrow::io::BufferReader arrowBufferReader{bytes};
    auto streamReader =
        arrow::ipc::RecordBatchStreamReader::Open(&arrowBufferReader).ValueOrDie();
    auto batch = streamReader->Next().ValueOrDie();
    return batch;
}

std::string arrowTableToByteString(const std::shared_ptr<arrow::Table>& table) {
    auto stream = arrow::io::BufferOutputStream::Create().ValueOrDie();
    auto batchWriter = arrow::ipc::MakeStreamWriter(stream, table->schema()).ValueOrDie();
    auto status = batchWriter->WriteTable(*table);
    if (not status.ok()) {
        throw std::runtime_error(
            "Couldn't write Arrow Table to byte string. Arrow status was: '" +
            status.ToString() + "'.");
    }
    std::shared_ptr<arrow::Buffer> buffer = stream->Finish().ValueOrDie();
    return buffer->ToHexString();
}

int main(int argc, char** argv) {
    auto simpleFakeArrowTable = makeSimpleFakeArrowTable();
    std::string tableAsByteString = arrowTableToByteString(simpleFakeArrowTable);
    auto batch = getArrowBatchFromBytes(tableAsByteString);
    assert(batch != nullptr);
}

Two things come to mind. First, I think this is a typo:

longBuilder.Finish(&(columns.at(0)));
arrow::DoubleBuilder doubleBuilder;
doubleBuilder.Append(10.0);
longBuilder.Finish(&(columns.at(1))); // Shouldn't this be doubleBuilder?

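When you build an Arrow table yourself, it is a good idea to call arrow::Table::ValidateFull. That helps catch mistakes like this one (in this case, the returned status will report that the input arrays do not match the schema).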

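Second, once that is fixed, we still get an error because you return buffer->ToHexString();, which turns your byte array into a hexadecimal string with two characters per byte (for example, the bytes [10, 20, 30] become the bytes [48, 65, 49, 52, 49, 69], more conventionally written as 0A141E).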
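To make the effect of hex encoding concrete, here is a small standard-library-only sketch. toHex is a hypothetical stand-in written for this example, assuming two uppercase hex digits per byte; it is not Arrow's own implementation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a hex encoder: emits two uppercase hex digits
// per input byte, so the output is text and twice as long as the input.
std::string toHex(const std::vector<std::uint8_t>& bytes) {
    static const char kDigits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        out.push_back(kDigits[b >> 4]);    // high nibble
        out.push_back(kDigits[b & 0x0F]);  // low nibble
    }
    return out;
}
```

For instance, toHex({0x01, 0xAB, 0xFF}) yields the six-character string "01ABFF". A reader that expects the original three bytes sees six ASCII characters instead, which is why an IPC reader cannot parse the result.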

You then turn around and try to read those hex characters back as a table with arrow::io::BufferReader arrowBufferReader{bytes};. If I change ToHexString to ToString, your example runs and returns 0.
