Reading a binary file into a struct (translating instructions)



Binary files and structs are new territory for me.

I understand how to read in a file, and I have tried various ways of reading the raw data, but it seems I need to use struct.

I'm trying to translate these instructions into Python code:

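The beginning of the binary merge file contains an array of GWI_file_header_struct structs (defined in the INET_INT.H file), one per channel, followed by interlaced 32-bit floating-point data. The first 4 bytes of the header hold the length, in bytes, of one channel's header (i.e. 516 = 0x0204). To find the number of channels stored in the file, read the 'channelsPerFile' field of the first struct (i.e. to see how many headers there are). After the headers, the data is saved in interlaced form, where points are stored in the order in which they were acquired in time.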
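Taken together, those instructions reduce to simple offset arithmetic. The sketch below assumes every channel header has the same size (headerSizeInBytes, e.g. 516), that every data point is a 4-byte float32 as the instructions state, and that all channels share one sample rate so the interlacing is strictly round-robin. The helper names data_start and sample_offset are illustrative and do not come from INET_INT.H.

```python
def data_start(header_size, channels):
    """Offset of the first data byte: one header per channel precedes the data."""
    return header_size * channels


def sample_offset(header_size, channels, channel, index, bytes_per_point=4):
    """Offset of point `index` (0-based) of `channel` (0-based).

    Interlaced order: A[0], B[0], C[0], A[1], B[1], C[1], ...
    """
    if not 0 <= channel < channels:
        raise ValueError("channel out of range")
    position = index * channels + channel  # position in the interlaced stream
    return data_start(header_size, channels) + position * bytes_per_point


print(data_start(516, 3))           # 1548
print(sample_offset(516, 3, 1, 0))  # 1552, i.e. B[0]
print(sample_offset(516, 3, 0, 1))  # 1560, i.e. A[1]
```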

My main confusion is how to turn this into:

struct.unpack(...)
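struct.unpack(...) needs one format character per C field. The iNet* type names are not standard C types; judging by their names and the byte offsets in the header comments, this sketch assumes iNetINT16 is 2 bytes, iNetINT32/iNetUINT32/iNetFLT32 are 4 bytes, and iNetStr31/iNetStr255 are 32- and 256-byte Pascal strings. The TYPE_CODES dictionary is an illustration, not part of INET_INT.H.

```python
import struct

# Assumed mapping from INET_INT.H type names to struct format codes
# (sizes inferred from the byte offsets in the header comments).
TYPE_CODES = {
    "iNetINT16": "h",      # signed 16-bit integer
    "iNetINT32": "i",      # signed 32-bit integer
    "iNetUINT32": "I",     # unsigned 32-bit integer
    "iNetFLT32": "f",      # IEEE 754 single-precision float
    "iNetStr31": "32p",    # Pascal string: length byte + up to 31 chars
    "iNetStr255": "256p",  # Pascal string: length byte + up to 255 chars
}

# A '<' (little-endian) or '>' (big-endian) prefix uses standard sizes
# and disables native alignment padding, matching the packed C struct.
for name, code in TYPE_CODES.items():
    print(f"{name:<11} {code:<5} {struct.calcsize('<' + code)} bytes")

# 'p' reads the leading length byte and returns only the valid characters.
raw = b"\x09Ch11 Vin+" + b"\x00" * 22  # 32 bytes, as stored in chanName
print(struct.unpack("<32p", raw)[0])   # b'Ch11 Vin+'
```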

INET_INT.H struct:

typedef struct GWI_file_header_struct{  //  This struct is at the beginning of GWI iNet BINARY files that contain waves.
//
//  Macintosh:
//
//      file type:      'GWID'
//      creator type:   'ioNe'      NETWORK_DATA_CREATOR   

//  ----------------------------------
//  HEADER INFORMATION

iNetINT32 headerSizeInBytes;        //  contains length, in bytes, of this header (this does not include any data) { bytes 0..3, base 0 }
//  ----------------------------------
//  FILE INFORMATION

iNetINT32 int32key;                 //  32bit key that should contain 0x12345678 (this will help you make sure your byte lanes are ok). 
//  { bytes 4..7, base 0 }
iNetINT32 file_endian;              //  endian mode of stored data on disk: 0 = bigEndian_ion, 1 = littleEndian_ion
//  { bytes 8..11, base 0 }
iNetINT16 int16key;                 //  16bit key that should contain 0x55b4; (this field should consume 2 bytes
//   in the struct -- no padding) (i.e. INET_INT16_KEY = 0x55b4)
//  { bytes 12..13, base 0 }
iNetINT16 zero;                     //  set to 0 (this field should consume 2 bytes in the struct -- no padding)
//  { bytes 14..15, base 0 }
//  # of seconds since Jan 1, Midnight, 1904 that the acquisition started (this is used to compute the
//  date of acquisition). This overflows in 2030.
//  Strip Chart: 1st digitized point in entire stream (i.e. 1st pt of 1st scan)
//  Osc Mode:    1st point in current scan, secsSince1904_Int64 units 
//  { bytes 16..19, base 0 }
iNetUINT32 acquisition_SecsSince1904_FixedUint32_OverflowIn2030;

//  ----------------------------------
//  # OF POINTS STORED
//
//  This file contains a set of scans.  Each scan is 1 to .5billion points long.  For example,
//  we might have 100 scans, each 1000 points long. In this example:
//
//      pointsPerScanThisChannel_LSW = 1000
//      pointsPerScanThisChannel_MSW = 0
//
//      numScansStoredBeforeLastScan = 99
//
//      numPointsInLastPartialScan_LSW = 1000
//      numPointsInLastPartialScan_MSW = 0
//
//  Each channel can have a different number of points per scan due to the sampleRateChanMULTiplier
iNetUINT32 pointsPerScanThisChannel_LSW;    
iNetUINT32 pointsPerScanThisChannel_MSW;    
//  # points per scan =  (pointsPerScanThisChannel_MSW * 2^32) + pointsPerScanThisChannel_LSW
//  { bytes 20..23, base 0 }
//  { bytes 24..27, base 0 }
iNetUINT32 numScansStoredBeforeLastScan_LSW;            
//  # of complete scans stored in file 
//  { bytes 28..31, base 0 }
//  iNetUINT32 numScansStoredBeforeLastScan_MSW;    
//  this is defined below, at the end of the struct
iNetUINT32 numPointsInLastPartialScan_LSW;  
iNetUINT32 numPointsInLastPartialScan_MSW;  
//  # points stored in last scan if it is partially complete = (numPointsInLastPartialScan_MSW * 2^32) + numPointsInLastPartialScan_LSW
//  { bytes 32..35, base 0 }
//  { bytes 36..39, base 0 }
//  ----------------------------------
//  TIME INFORMATION
iNetFLT32 firstPoint_Time_Secs;     //  time of 1st point, units are seconds
//  { bytes 40..43, base 0 }
iNetFLT32 endUser_channel_samplePeriod_Secs;
//  time between points for this channel,
//  units are seconds.  Notice that channels
//  can have different sample rates, which
//  is the master_endUser_SampleRate / sampleRate_Divider,
//  where 'sampleRate_Divider' is an integer.
//  { bytes 44..47, base 0 }
//  ----------------------------------
//  TYPE OF DATA STORED
iNetINT32 arrayDataType;            //  Type of src array data. iNetDataType:
//
//  0   iNetDT_INT16:   16bit integer, signed
//  2   iNetDT_UINT16:  16bit integer, unsigned
//  3   iNetDT_INT32:   32bit integer, signed
//  4   iNetDT_UINT32:  32bit integer, unsigned
//  5   iNetDT_FLT32:   32bit float (IEEE flt32 format)
//  6   iNetDT_Double:  'double', as determined by the compiler
//                      (e.g. flt64, flt80, flt96, flt128)
//                      see 'bytesPerDataPoint' field to see
//                      how many bytes
//  { bytes 48..51, base 0 }

iNetINT32 bytesPerDataPoint;        //  # of bytes for each datapoint (e.g. 4 for 32bit signed integer)
//  { bytes 52..55, base 0 }
iNetStr31 verticalUnitsLabel;       //  pascal string of vertical units label (e.g. "Volts")
//  { bytes 56..87, base 0 }
iNetStr31 horizontalUnitsLabel;     //  horizontal units label, e.g. "Secs", pascal string (0th char is the # of valid chars)   
//  { bytes 88..119, base 0 }
iNetStr31 userName;                 //  user named set by user, e.g. "Pressure 1" , pascal string (0th char is the # of valid chars)   
//  { bytes 120..151, base 0 }
iNetStr31 chanName;                 //  name of channel, e.g. "Ch1 Vin+", pascal string (0th char is the # of valid chars)   
//  { bytes 152..183, base 0 }
//  ----------------------------------
//  DATA MAPPING
//
iNetINT32 minCode;                  //  if data is stored in integer format, this contains the mapping from integer 
iNetINT32 maxCode;                  //  to engineering units (e.g. +/-2048 A/D data is mapped to +/- 10V, minCode = -2048,
iNetFLT32 minEU;                    //  maxCode = +2047, minEU = -10.000, maxEU = +9.995.
iNetFLT32 maxEU;                    //  
//  { bytes 184..187, base 0 }
//  { bytes 188..191, base 0 }
//  { bytes 192..195, base 0 }
//  { bytes 196..199, base 0 }
//  ----------------------------------
//  iNet NETWORK ADDRESS (this does not need
//  to be filled in, 0L's are ok)
iNetINT32 netNum;                   //  channel network # (this pertains to iNet only; use 0 otherwise)
//  { bytes 200..203, base 0 }
iNetINT32 devNum;                   //  channel device # (this pertains to iNet only; use 0 otherwise)
//  { bytes 204..207, base 0 }
iNetINT32 modNum;                   //  channel module # (this pertains to iNet only; use 0 otherwise)
//  { bytes 208..211, base 0 }
iNetINT32 chNum;                    //  channel channel # (this pertains to iNet only; use 0 otherwise)
//  { bytes 212..215, base 0 }

//  ----------------------------------
//  END USER NOTES
iNetStr255 notes;                   //  pascal string that contains notes about the data stored.
//  { bytes 216..471, base 0 }
//  ----------------------------------
//  MAPPING
iNetFLT32 /* must remain flt32 */ internal1;    //  Mapping from internal engineering units (e.g. Volts) to external engineering                     
iNetFLT32 /* must remain flt32 */ external1;    //  units (e.g. mmHg).  This is used for 2 point linear mapping/calibration to  
iNetFLT32 /* must remain flt32 */ internal2;    //  a new, user defined, coordinate system.  instruNet World does not read these values
iNetFLT32 /* must remain flt32 */ external2;    //  from the wave files, yet instead reads them from the instrNet.prf file -- they
//  are only stored for the benefit of other software that might read this file. gsw 12/1/96
//  { bytes 472..475, base 0 }
//  { bytes 476..479, base 0 }
//  { bytes 480..483, base 0 }
//  { bytes 484..487, base 0 }
iNetFLT32 flt32key;                 //  flt32 key set to 1234.56 (i.e. INET_FLT32_KEY), Used to test floating point code. gsw 12/1/96
//  { bytes 488..491, base 0 }
iNetINT32 sampleRate_Divider;       //  this channel is digitized at the master_endUser_SampleRate divided 
//  this 'sampleRate_Divider' (i.e. sampleRateChanMULT_integerRatio_N_int64)
//  (helpful with FileType Binary Merge), gsw 1/29/97. Note: This field was introduced 1/29/97 and
//  files saved before that time set it to 0.
//  { bytes 492..495, base 0 }

iNetINT32 channelsPerFile;          //  # of channels per file (i.e. interlaced after array of headers) (helpful with FileType Binary Merge), gsw 1/29/97
//  Note: This field was introduced 1/29/97 and files saved before that time set it to 0.
//  { bytes 496..499, base 0 }

//  ----------------------------------
//  EXPANSION FIELDS
#if 1   //  gsw 12/23/09
//  # of complete scans stored in file, MS 32bits
//  { bytes 500..503, base 0 }
iNetUINT32 numScansStoredBeforeLastScan_MSW;    
#else
iNetINT32 expansion8;               //  expansion fields that are preset to 
#endif
iNetINT32 expansion9;               //  0 and then ignored
iNetINT32 expansion10;              //  { bytes 500..503, base 0 }
//  { bytes 504..507, base 0 }
//  { bytes 508..511, base 0 }
//  ----------------------------------
//  KEY TO TEST STRUCT PACKING
iNetINT32 int32key_StructTest;      //  32bit key that should contain 0x12345678; (i.e. INET_INT32_KEY)
//  { bytes 512..515, base 0 }

//  ----------------------------------
//  ACTUAL DATA
/* iNetFLT32 *data[1]; */           //  contains array of data of type 'arrayDataType'
} GWI_file_header_struct;
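The whole struct can be described by one format string. This is a sketch that assumes the type sizes implied by the byte-offset comments (4-byte integers and floats, 2-byte int16 fields, 32- and 256-byte Pascal strings) and relies on a '<' or '>' prefix so struct adds no alignment padding. GWIHeader, BODY, parse_header, points_per_scan and acquisition_start are hypothetical names; the field names follow the header. The 1904-epoch conversion returns a naive datetime because the header does not say which time zone the seconds count refers to.

```python
import struct
from collections import namedtuple
from datetime import datetime, timedelta

# Field names in the order they appear in GWI_file_header_struct (INET_INT.H).
GWIHeader = namedtuple("GWIHeader", [
    "headerSizeInBytes", "int32key", "file_endian", "int16key", "zero",
    "acquisition_SecsSince1904_FixedUint32_OverflowIn2030",
    "pointsPerScanThisChannel_LSW", "pointsPerScanThisChannel_MSW",
    "numScansStoredBeforeLastScan_LSW",
    "numPointsInLastPartialScan_LSW", "numPointsInLastPartialScan_MSW",
    "firstPoint_Time_Secs", "endUser_channel_samplePeriod_Secs",
    "arrayDataType", "bytesPerDataPoint",
    "verticalUnitsLabel", "horizontalUnitsLabel", "userName", "chanName",
    "minCode", "maxCode", "minEU", "maxEU",
    "netNum", "devNum", "modNum", "chNum", "notes",
    "internal1", "external1", "internal2", "external2", "flt32key",
    "sampleRate_Divider", "channelsPerFile",
    "numScansStoredBeforeLastScan_MSW", "expansion9", "expansion10",
    "int32key_StructTest",
])

# One format code per field, without the byte-order prefix:
#   3i 2h I     -> bytes 0..19
#   5I 2f 2i    -> bytes 20..55
#   4 x 32p     -> bytes 56..183 (Pascal strings)
#   2i 2f 4i    -> bytes 184..215
#   256p        -> bytes 216..471 (notes)
#   5f 2i I 3i  -> bytes 472..515
BODY = "3i2hI5I2f2i32p32p32p32p2i2f4i256p5f2iI3i"
INT32_KEY = 0x12345678


def parse_header(buf, offset=0):
    """Unpack one header starting at `offset`, choosing the byte order via int32key."""
    for order in ("<", ">"):
        hdr = GWIHeader._make(struct.unpack_from(order + BODY, buf, offset))
        if hdr.int32key == INT32_KEY:
            return hdr
    raise ValueError("int32key != 0x12345678: not a GWI header?")


def points_per_scan(h):
    """64-bit count rebuilt from the MSW/LSW pair: MSW * 2**32 + LSW."""
    return (h.pointsPerScanThisChannel_MSW << 32) + h.pointsPerScanThisChannel_LSW


def acquisition_start(h):
    """Seconds since 1904-01-01 00:00 as a naive datetime (time zone not specified)."""
    return datetime(1904, 1, 1) + timedelta(
        seconds=h.acquisition_SecsSince1904_FixedUint32_OverflowIn2030)
```

With the file open in binary mode, parse_header(f.read(516)) would give the first header, and its channelsPerFile field says how many headers to parse (at offsets 0, 516, 1032, ...) before the data begins.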

Final code and results:


import struct

# Currently 3 channels: Ch11 Vin+, Ch13 Vin+ and Ch15 Vin+
# Header info extracted using the provided header struct (INET_INT.H)
# After the headers, the data is saved in interlaced form,
# where points are stored in the order that they are acquired in time.
# 3 channels: A[0], B[0], C[0], A[1], B[1], C[1]...
# Headers: 516-byte header size x 3 channels = 1,548 bytes
# Start of data at byte 1,548?
with open(file, "rb") as f:  # `file` holds the path to the binary merge file
    header_size, int32key, file_endian = struct.unpack('<3i', f.read(12))
    # channel name 1: chanName is a 32-byte Pascal string at bytes 152..183;
    # '32p' uses the leading length byte to return only the valid characters
    f.seek(152)
    chan = struct.unpack("<32p", f.read(32))[0]
    # channel name 2: the same field in the second header
    f.seek(152 + header_size)
    chan2 = struct.unpack("<32p", f.read(32))[0]
print(header_size, int32key, file_endian)
print("channel 1: {}".format(chan))
print("channel 2: {}".format(chan2))

Result:

516 305419896 1
channel 1: b'Ch11 Vin+'
channel 2: b'Ch13 Vin+'
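The code above hard-codes two channels. A minimal sketch that instead reads channelsPerFile (bytes 496..499 of the first header) and loops over every header; it defaults to little-endian, which the 305419896 / 1 output above indicates for this file. The function name channel_names is hypothetical. Note that files saved before 1/29/97 store 0 in channelsPerFile, in which case this returns an empty list.

```python
import io
import struct


def channel_names(f, byte_order="<"):
    """Return chanName from every channel header of an open binary merge file."""
    f.seek(0)
    (header_size,) = struct.unpack(byte_order + "i", f.read(4))
    f.seek(496)  # channelsPerFile, bytes 496..499 of the first header
    (channels,) = struct.unpack(byte_order + "i", f.read(4))
    names = []
    for ch in range(channels):
        f.seek(ch * header_size + 152)  # chanName, bytes 152..183 of header `ch`
        names.append(struct.unpack(byte_order + "32p", f.read(32))[0])
    return names


# Demo on an in-memory file with two 516-byte headers
demo = bytearray(2 * 516)
struct.pack_into("<i", demo, 0, 516)
struct.pack_into("<i", demo, 496, 2)
struct.pack_into("<32p", demo, 152, b"Ch11 Vin+")
struct.pack_into("<32p", demo, 516 + 152, b"Ch13 Vin+")
print(channel_names(io.BytesIO(bytes(demo))))  # [b'Ch11 Vin+', b'Ch13 Vin+']
```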

OK, this isn't a complete answer, but I find code in the comments here really hard to read.

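The first step is to read the first 12 bytes (three 4-byte integers) and unpack them so that we can check the byte order. Let's try big-endian first: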

from struct import unpack
with open(file, "rb") as f:
    byte = f.read(12)
    header_size, int32key, file_endian = unpack('>3i', byte)

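We expect int32key to be 305419896 (= 0x12345678). If we get a different value (a little-endian file read as big-endian gives 2018915346 = 0x78563412), switch to little-endian, i.e. change the unpack format string to <3i.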
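The check described above can be wrapped in a small function. This is a sketch; the name detect_byte_order is hypothetical, and it tries big-endian first to mirror the answer's steps.

```python
import struct

INT32_KEY = 0x12345678  # expected value of int32key (bytes 4..7)


def detect_byte_order(first12):
    """Return '>' or '<', whichever makes int32key read as 0x12345678."""
    for order in (">", "<"):  # big-endian first, as in the answer
        _, int32key, _ = struct.unpack(order + "3i", first12)
        if int32key == INT32_KEY:
            return order
    raise ValueError("int32key does not match 0x12345678 in either byte order")


little = struct.pack("<3i", 516, INT32_KEY, 1)  # like the question's file
print(struct.unpack(">3i", little)[1])  # 2018915346, i.e. 0x78563412
print(detect_byte_order(little))        # <
```

Once the order is known, file_endian should agree with it (1 for '<', 0 for '>'), which makes a convenient second check.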

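From there, we can read the rest of the header with the same logic and get all the information needed to read the first channel's data. I hope this gives you a good start.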
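Reading the data itself then means seeking past all the headers and splitting the interlaced stream. A minimal sketch, assuming float32 samples (arrayDataType 5, bytesPerDataPoint 4) as the instructions state, every channel sampled at the same rate (equal sampleRate_Divider) so the interlacing is strictly round-robin, and little-endian data by default as in the question's file. The function name read_channels is hypothetical.

```python
import io
import struct


def read_channels(f, byte_order="<"):
    """Split the interlaced float32 data that follows the headers into per-channel lists."""
    f.seek(0)
    (header_size,) = struct.unpack(byte_order + "i", f.read(4))
    f.seek(496)  # channelsPerFile, bytes 496..499 of the first header
    (channels,) = struct.unpack(byte_order + "i", f.read(4))
    if channels <= 0:
        raise ValueError("channelsPerFile is 0 (file saved before 1/29/97?)")
    f.seek(header_size * channels)  # the data starts after all the headers
    raw = f.read()
    count = len(raw) // 4  # number of whole float32 values
    samples = struct.unpack(f"{byte_order}{count}f", raw[:count * 4])
    # A[0], B[0], C[0], A[1], ... -> [A[0], A[1], ...], [B[0], ...], [C[0], ...]
    return [list(samples[c::channels]) for c in range(channels)]


# Demo: two channels, 516-byte headers, three interlaced frames
headers = bytearray(2 * 516)
struct.pack_into("<i", headers, 0, 516)
struct.pack_into("<i", headers, 496, 2)
data = struct.pack("<6f", 1.0, 10.0, 2.0, 20.0, 3.0, 30.0)
print(read_channels(io.BytesIO(bytes(headers) + data)))
# [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
```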
