How to understand base64 encoding in the VTK binary file format



I have a problem understanding binary DataArrays. The problem comes from the base64 encoding.

The manual says that if the format of a DataArray is binary:

The data are encoded in base64 and listed contiguously inside the
DataArray element. Data may also be compressed before encoding in base64. The byte-
order of the data matches that specified by the byte_order attribute of the VTKFile element.

I could not fully understand this, so I generated an ASCII file and a binary file for the same model.

ASCII file

<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
<UnstructuredGrid>
<Piece NumberOfPoints="4" NumberOfCells="1">
<PointData>
</PointData>
<CellData>
</CellData>
<Points>
<DataArray type="Float32" Name="Points" NumberOfComponents="3" format="ascii" RangeMin="0" RangeMax="1.4142135624">
0 0 0 1 0 0
1 1 0 0 1 1
</DataArray>
</Points>
<Cells>
<DataArray type="Int64" Name="connectivity" format="ascii" RangeMin="0" RangeMax="3">
0 1 2 3
</DataArray>
<DataArray type="Int64" Name="offsets" format="ascii" RangeMin="4" RangeMax="4">
4
</DataArray>
<DataArray type="UInt8" Name="types" format="ascii" RangeMin="10" RangeMax="10">
10
</DataArray>
</Cells>
</Piece>
</UnstructuredGrid>
</VTKFile>

Binary file

<?xml version="1.0"?>
<VTKFile type="UnstructuredGrid" version="0.1" byte_order="LittleEndian" header_type="UInt32" compressor="vtkZLibDataCompressor">
<UnstructuredGrid>
<Piece NumberOfPoints="4" NumberOfCells="1">
<PointData>
</PointData>
<CellData>
</CellData>
<Points>
<DataArray type="Float32" Name="Points" NumberOfComponents="3" format="binary" RangeMin="0" RangeMax="1.4142135624">
AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=
</DataArray>
</Points>
<Cells>
<DataArray type="Int64" Name="connectivity" format="binary" RangeMin="0" RangeMax="3">
AQAAAACAAAAgAAAAEwAAAA==eJxjYIAARijNBKWZoTQAAHAABw==
</DataArray>
<DataArray type="Int64" Name="offsets" format="binary" RangeMin="4" RangeMax="4">
AQAAAACAAAAIAAAACwAAAA==eJxjYYAAAAAoAAU=
</DataArray>
<DataArray type="UInt8" Name="types" format="binary" RangeMin="10" RangeMax="10">
AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL
</DataArray>
</Cells>
</Piece>
</UnstructuredGrid>
</VTKFile>

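When I look at the DataArrays, taking the last one as an example, I cannot work out the relationship between AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL and 10.

My understanding can be expressed in the code below, but it produces CggAAA==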

#include "base64.h" // https://github.com/superwills/NibbleAndAHalf/blob/master/NibbleAndAHalf/base64.h
#include <cstdlib>  // free
#include <iostream>

int main()
{
    int x = 10;
    int len;
    // first arg: binary buffer
    // second arg: length of binary buffer
    // third arg: receives the length of the ascii buffer
    char *ascii = base64((char *)&x, sizeof(int), &len);

    std::cout << ascii << std::endl;
    std::cout << len << std::endl;
    free(ascii);
    return 0;
}

Can someone explain to me how the conversion works? Another related topic can be found at:

  • https://discourse.vtk.org/t/error-when-writing-binary-vtk-files/4487/7

Thank you for your time.

The solution can be found in this discussion:

https://discourse.vtk.org/t/how-to-understand-binary-dataarray-in-xml-vtk-output/4489

The long extra data comes from the compressor header.
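To make the role of the compressor header concrete, here is a minimal Python sketch that decodes the `types` array from the binary file above. It assumes what the VTKFile element declares (header_type="UInt32", little-endian byte order, zlib compression). It also assumes that the header is laid out as [number of blocks, block size, last block size, compressed size of each block] and that header and payload are base64-encoded separately. The function name `decode_vtk_compressed` is hypothetical and not part of VTK.

```python
import base64
import struct
import zlib


def b64_len(nbytes: int) -> int:
    """Number of base64 characters (padding included) used for nbytes bytes."""
    return 4 * ((nbytes + 2) // 3)


def decode_vtk_compressed(text: str) -> bytes:
    """Decode one inline binary DataArray (assumed: UInt32 header, zlib)."""
    # The first UInt32 is taken to be the number of blocks; 8 characters
    # decode to 6 bytes, of which only the first 4 are needed here.
    (num_blocks,) = struct.unpack("<I", base64.b64decode(text[:b64_len(4)])[:4])
    # Assumed header layout: 3 fixed entries plus one compressed size per block.
    header_chars = b64_len(4 * (3 + num_blocks))
    header = struct.unpack(f"<{3 + num_blocks}I",
                           base64.b64decode(text[:header_chars]))
    compressed_sizes = header[3:]
    # The payload is a second, independent base64 string.
    payload = base64.b64decode(text[header_chars:])
    chunks, pos = [], 0
    for size in compressed_sizes:
        chunks.append(zlib.decompress(payload[pos:pos + size]))
        pos += size
    return b"".join(chunks)


types_b64 = "AQAAAACAAAABAAAACQAAAA==eJzjAgAACwAL"
print(list(decode_vtk_compressed(types_b64)))  # [10]
```

The first 24 characters are the header (1, 32768, 1, 9); only the remaining `eJzjAgAACwAL` carries the zlib-compressed byte 10, which is why encoding 10 directly with base64 does not reproduce the file contents.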

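I have found the solution and written the answer in the VTK support issue, but I am writing it here as well in case someone comes here with the same problem as the two of us.

Note that I program in Python, but I believe there are base64 and zlib functions in C++. Also, I use numpy to define arrays, but I believe std::vector can be used equivalently in C++.

So, suppose we want to write the single-precision (float32) array named "Points" from your example. Assuming a header type of "UInt32", in Python we would do: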

import numpy as np
import zlib
import base64

# define the float array
arr = np.array([0, 0, 0, 1, 0, 0,
                1, 1, 0, 0, 1, 1], dtype='float32')
# compress the array with zlib; this returns a Python bytes object
arr_comp = zlib.compress(arr)
# build the (uncompressed) header
header = np.array([1,               # number of blocks; a single block here
                   2**15,           # block size before compression, 32768 in the files above
                   arr.nbytes,      # size in bytes of the array `arr`
                   len(arr_comp)],  # size in bytes of the compressed array
                  dtype='uint32')   # because of header_type="UInt32"
# header and compressed data are base64-encoded separately, then concatenated
# `.decode("utf-8")` turns the Python bytes object into a string
print((base64.b64encode(header) + base64.b64encode(arr_comp)).decode("utf-8"))

The output is as expected:

AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w=
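As a quick check of what the leading characters mean, the following sketch unpacks the header from that string. It assumes the four-entry UInt32 header used in the code above; the variable names are only illustrative.

```python
import base64
import struct

encoded = "AQAAAACAAAAwAAAAEQAAAA==eJxjYEAGDfaobEw+ADwjA7w="

# Four UInt32 values occupy 16 bytes, which base64 pads to 24 characters.
header_b64, payload_b64 = encoded[:24], encoded[24:]

num_blocks, block_size, last_block_size, compressed_size = struct.unpack(
    "<4I", base64.b64decode(header_b64))
print(num_blocks, block_size, last_block_size, compressed_size)  # 1 32768 48 17

# The rest of the string is the zlib stream, whose length matches the header.
payload = base64.b64decode(payload_b64)
print(len(payload) == compressed_size)  # True
```

The 48 is the size of twelve float32 values, and the 17 is the length of the zlib stream that follows the `==` padding.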

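According to the Python zlib documentation, 2**15 is the value of the parameter that controls the size of the history buffer (or "window size") used when compressing data. I am not sure what that means here, though; in the VTK header this entry appears to act as the uncompressed block size.


Edit: the code above only works when the size of the array in bytes is less than or equal to 2**15. In the VTK support issue, I have extended it to the case of larger arrays. You have to split the data into chunks.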
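The chunking could look roughly like the sketch below. It is not the code from the VTK support issue. It assumes that every block is compressed independently with zlib, that the header lists one compressed size per block, and that the third header entry is the size of a partial last block (0 when the data divides evenly into blocks). The function name `encode_vtk_compressed` is hypothetical, and `struct` is used instead of numpy to keep the example dependency-free.

```python
import base64
import struct
import zlib


def encode_vtk_compressed(raw: bytes, block_size: int = 2**15) -> str:
    """Encode raw array bytes as header + payload (assumed: UInt32 header, zlib)."""
    # Split the data into blocks of at most block_size bytes.
    blocks = [raw[i:i + block_size] for i in range(0, len(raw), block_size)]
    compressed = [zlib.compress(block) for block in blocks]
    # Assumption: a last block that happens to be full is reported as 0.
    last_block_size = len(raw) % block_size
    header = struct.pack(f"<{3 + len(blocks)}I",
                         len(blocks), block_size, last_block_size,
                         *(len(c) for c in compressed))
    # Header and payload are base64-encoded separately, then concatenated.
    return (base64.b64encode(header)
            + base64.b64encode(b"".join(compressed))).decode("ascii")


# 100000 zero bytes need four blocks: three full ones and one of 1696 bytes.
encoded = encode_vtk_compressed(bytes(100_000))
print(struct.unpack("<3I", base64.b64decode(encoded[:16])))  # (4, 32768, 1696)
```

For an array of 48 bytes this reduces to the single-block case above, with a header of (1, 32768, 48, compressed size).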
