Importing a .dat file in Python without knowing its structure



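I am trying to load and view the contents of the data that can be downloaded from here. After that I need to analyze it. I have already asked a question about this, but did not get a solution.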

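Now I have gone through their label file, located here. It says:

"Will write useful Python-based letters to describe each object
// see http://docs.python.org/library/struct.html for codes
// formats will be comma separated, with "RJW" as a key
// {NAME}, {FORMAT}, {Number of dims}, {Size Dim 1}, {Size Dim 2},
// where {FORMAT} is the Python code for the type, i.e. I for uint32
// and there are as many Size Dims as the number of dimensions."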

So I figured one could try Python, and I do have a working knowledge of it. I started with this program (for simplicity, the Python file and the data file are in the same folder):

import numpy as np
data = np.genfromtxt('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.dat')
print(data)

I get the error "UnicodeDecodeError: 'cp949' codec can't decode byte 0xff in position 65: illegal multibyte sequence".

If I change the code to (as mentioned here):

data=open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', encoding='utf-8')
print(data)

The error message goes away, but all I get is:

<_io.TextIOWrapper name='JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT' mode='r' encoding='utf-8'>

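I checked other answers on Stack Overflow but did not find a solution. My question may be very similar to the one posted here.

I need to see the contents of this .dat file first and then export it to another format, such as .csv.

Any help would be greatly appreciated.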

You need to open the file in binary mode.

with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', 'rb') as f:
    while True:
        chunk = f.read(160036)  # record size as per the LBL file
        if not chunk:  # stop at end of file
            break
        # the file is huge, so wait for Enter before
        # displaying the next chunk. Use Ctrl+C to interrupt
        print(chunk)
        input('Hit Enter...')

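Note that you can parse the LBL file, construct a format string for the struct module, and parse each chunk into meaningful fields. That is what the comment you quoted is saying.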

"""Example of reading NASA JUNO JADE CALIBRATED SCIENCE DATA
https://pds-ppi.igpp.ucla.edu/search/view/?f=yes&id=pds://PPI/JNO-J_SW-JAD-3-CALIBRATED-V1.0/DATA/2018/2018091/ELECTRONS/JAD_L30_LRS_ELC_ANY_CNT_2018091_V03&o=1
https://stackoverflow.com/a/66687113/4046632
"""
import struct
from functools import reduce
from operator import mul
from collections import namedtuple
__author__ = "Boyan Kolev, https://stackoverflow.com/users/4046632/buran"
with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.LBL') as f:
    rjws = [line.strip('\n/* ') for line in f if line.startswith('/* RJW')]
# create the format string for struct
rjws = rjws[2:] # exclude first 2 RJW comments related to file itself
names = []
FMT = '='
print(f'Number of objects: {len(rjws)}')
for idx, rjw in enumerate(rjws):
    _, name, fmt, num_dim, *dims = rjw.split(', ')
    fstr = f'{reduce(mul, map(int, dims))}{fmt}'
    FMT = f'{FMT} {fstr}'
    names.append(name)
    print(f'{idx}:{name}, {fstr}')
FMT = FMT.replace('c', 's') # for convenience treat 21c as a single 21-byte string (char[])
print(f"Format string: {repr(FMT)}")
# parse DAT file
s = struct.Struct(FMT)
print(f'Struct size:{s.size}')
with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', 'rb') as f:
    n = 0
    while True: # in python3.8+ this loop can be simplified with walrus operator
        chunk = f.read(s.size)
        if not chunk:
            break
        data = s.unpack_from(chunk)
        # process data further, e.g. split data in 2D containers where appropriate
        n += 1
print(f'Number of records: {n}')
# make a named tuple to represent first 10 fields
# for nice display. This basic use of namedtuple works only
# for first 23 objects, which have single item.
num_fields = 10
Record = namedtuple('Record', names[:num_fields])
record = Record(*data[:num_fields])
print('\n----------------------\n')
print(f'First {num_fields} fields of the last record.')
print(record)

Output:

Number of objects: 49
0:DIM0_UTC, 21c
1:PACKETID, 1B
2:DIM0_UTC_UPPER, 21c
--- omitted for sake of brevity ---
46:DIM2_AZIMUTH_DESPUN_LOWER, 3072f
47:MAG_VECTOR, 3f
48:ESENSOR, 1H
Format string: '= 21s 1B 21s 1b 21s 1b 1H 1B 1B 1B 1B 1h 1h 1f 1f 1f 1f 1f 1f 1f 1f 1f 1f 3f 3f 3f 1f 9f 9f 9f 1f 1I 1I 1H 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3f 1H'
Struct size:160036
Number of records: 1101
----------------------
First 10 fields of the last record.
Record(DIM0_UTC=b'2018-091T23:56:08.925', PACKETID=106, DIM0_UTC_UPPER=b'2018-092T00:01:08.925', PACKET_MODE=1, DIM0_UTC_LOWER=b'2018-091T23:51:08.925', PACKET_SPECIES=-1, ACCUMULATION_TIME=600, DATA_UNITS=2, SOURCE_BACKGROUND=3, SOURCE_DEAD_TIME=0)
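
To get from here to the .csv export mentioned in the question, one option is to write only the single-valued objects of each record as a CSV row. The following is a minimal sketch and not part of the script above: the shortened FMT, names and counts lists and the records_to_csv helper are hypothetical stand-ins, and the demo packs synthetic bytes instead of reading the real DAT file. It assumes Python 3.8+ for the walrus operator.

```python
import csv
import io
import struct

# Hypothetical, shortened layout. In practice FMT and names would be
# built from the RJW comments in the LBL file, as in the script above.
FMT = '= 21s 1B 1H 3f'
names = ['DIM0_UTC', 'PACKETID', 'ACCUMULATION_TIME', 'MAG_VECTOR']
# Number of unpacked values per object ('21s' unpacks to one bytes value).
counts = [1, 1, 1, 3]


def records_to_csv(binary_file, text_file, fmt, names, counts):
    """Write the single-valued objects of each fixed-size record as a CSV row."""
    s = struct.Struct(fmt)
    # Find where each single-valued object sits in the unpacked tuple.
    scalar_positions = []
    pos = 0
    for name, count in zip(names, counts):
        if count == 1:
            scalar_positions.append((name, pos))
        pos += count
    writer = csv.writer(text_file)
    writer.writerow([name for name, _ in scalar_positions])
    n = 0
    while chunk := binary_file.read(s.size):
        if len(chunk) < s.size:
            break  # ignore a truncated trailing record
        data = s.unpack(chunk)
        row = []
        for _, p in scalar_positions:
            value = data[p]
            if isinstance(value, bytes):
                value = value.decode('ascii')  # timestamps are stored as text
            row.append(value)
        writer.writerow(row)
        n += 1
    return n


# Demo on two synthetic records (made-up values, not real JADE data).
raw = struct.pack(FMT, b'2018-091T23:56:08.925', 106, 600, 1.0, 2.0, 3.0) * 2
out = io.StringIO()
count = records_to_csv(io.BytesIO(raw), out, FMT, names, counts)
print(count)
print(out.getvalue())
```

For the real file you would pass the DAT file opened with 'rb' and a CSV file opened with 'w' and newline='', together with the full FMT and names. The multi-valued objects such as the 3072f arrays are skipped here because they do not fit a flat row, so they would need their own files or a different format.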

Link to the GitHub gist
