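We are using the Python dlisio library to extract data from .dlis files (well log files). For most companies' files the data has the same structure, but one company's data is stored in nested numpy arrays.
Normally it looks like this: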
selected_curves_data[0:3]
Output
array([(172600., 1318.3775, 1130.0346, -1130.0301),
(172590., 1331.5 , 1130.0346, -1130.0301),
(172580., 1343.5 , 1130.046 , -1130.001 )],
dtype={'names':['TDEP','A','B','C'], 'formats':['<f4','<f4','<f4','<f4'], 'offsets':[4,8,12,16], 'titles':['T.CHANNEL-I.TDEP','T.CHANNEL-I.A','T.CHANNEL-I.B','T.CHANNEL-C'], 'itemsize':20})
The file I am working with has a different structure: every value is nested inside two sub-lists, as shown below.
selected_curves_data[0:3]
Output
array([([[6860. ]], [[7.887773]], [[65.23707 ]], [[83.41805]], [[98.60489 ]], [[76.93024]], [[305.9046]], [[ 1.435147 ]], [[0.]]),
([[6859.9]], [[7.594969]], [[65.16657 ]], [[83.31693]], [[98.35259 ]], [[76.18296]], [[305.8163]], [[-10.156202 ]], [[0.]]),
([[6859.8]], [[7.539917]], [[65.115074]], [[83.21918]], [[98.084015]], [[75.37859]], [[305.7146]], [[ 2.4681084]], [[0.]])],
dtype={'names':['DEPTH','A','B','C','D','E','F','G','H'], 'formats':[('<f8', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1))], 'offsets':[4,12,16,20,24,62,66,1070,1074], 'titles':['T.CHANNEL-I.DEPTH','T.CHANNEL-I.A','T.CHANNEL-I.B','T.CHANNEL-I.C','T.CHANNEL-I.D','T.CHANNEL-I.E','T.CHANNEL-I.F','T.CHANNEL-I.G','T.CHANNEL-I.H'], 'itemsize':1078})
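I tried the following to convert it to the required original structure, but the processed data is not accepted, because the original data type is numpy.dtype[void] while mine is numpy.dtype[float64].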
a = np.empty_like(selected_curves_data)
selected_curves_data[0:3]
ithRowList = []
i = 0
for item in selected_curves_data:
    if i < 3:
        print("Type: ", type(item), " shape: ", item.shape)
        print("----------------------------- Row ", i, " Original ----------------------------- \n", item)
        #ithRowList = np.empty_like(selected_curves_data)
        for subItem in item:
            ithRowList.append(float(subItem))
        print("Type: ", type(ithRowList))
        print("----------------------------- Row ", i, " After ----------------------------- \n", ithRowList)
        arr = np.array(ithRowList)
        print("Type: ", type(arr), " shape: ", item.shape)
        print("----------------------------- Row ", i, " Array ----------------------------- \n", arr)
        print("\n")
        np.append(a, ithRowList)
    i += 1
    ithRowList = []
a[0:3]
The output is:
Type: <class 'numpy.void'> shape: ()
----------------------------- Row 0 Original -----------------------------
([[6860.]], [[7.887773]], [[65.23707]], [[83.41805]], [[98.60489]], [[76.93024]], [[305.9046]], [[1.435147]], [[0.]])
Type: <class 'list'>
----------------------------- Row 0 After -----------------------------
[6860.0, 7.887773036956787, 65.23706817626953, 83.41805267333984, 98.60488891601562, 76.93023681640625, 305.90460205078125, 1.4351470470428467, 0.0]
Type: <class 'numpy.ndarray'> shape: ()
----------------------------- Row 0 Array -----------------------------
[6.86000000e+03 7.88777304e+00 6.52370682e+01 8.34180527e+01
9.86048889e+01 7.69302368e+01 3.05904602e+02 1.43514705e+00
0.00000000e+00]
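I would appreciate your help. At the moment it gives me the following error:
TypeError: The DTypes <class 'numpy.dtype[void]'> and <class 'numpy.dtype[float64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.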
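My goal is a final array with the same structure and dimensions as the original array, but holding only the values, without the nested arrays around them.
So I would like each row of the final output to look like this: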
(6860., 7.887773, 65.23707, 83.41805, 98.60489,76.93024, 305.9046, 1.435147, 0.)
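This is because further processing will be based on this structure, and I would also like to keep the column details in the numpy array.

Copy from the 2nd style of dtype to the 1st: make a target array and copy field by field.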
In [287]: dt1 = np.dtype([('A','f4'),('B','f4')])
In [288]: dt2 = np.dtype([('A','f4',(1,1)),('B','f4',(1,1))])
Create an array of type 2:
In [290]: x=np.zeros(3, dt2)
In [291]: x['A']=[[[1]],[[.2]],[[100]]]; x['B']=[[[.0]],[[2]],[[200]]]
In [292]: x
Out[292]:
array([([[ 1. ]], [[ 0.]]), ([[ 0.2]], [[ 2.]]),
([[100. ]], [[200.]])],
dtype=[('A', '<f4', (1, 1)), ('B', '<f4', (1, 1))])
Target and copy:
In [293]: y = np.zeros(x.shape, dt1)
In [294]: for name in dt2.names:
...: y[name]=np.squeeze(x[name])
...:
In [295]: y
Out[295]:
array([( 1. , 0.), ( 0.2, 2.), (100. , 200.)],
dtype=[('A', '<f4'), ('B', '<f4')])
I had to use squeeze because each field of x has shape (3,1,1), while each field of y has shape (3,).
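
The same field-by-field copy can be applied to the array from the question without typing the target dtype by hand. What follows is only a minimal sketch, assuming every field holds exactly one value per row (as the (1, 1) shapes in the question suggest). The helper name flatten_unit_fields and the two-field sample array are made up for illustration; they are not part of NumPy or dlisio. It reads each field's base type and title from the source dtype, so the column names and titles are kept.

```python
import numpy as np

def flatten_unit_fields(arr):
    """Copy a structured array, turning single-value subarray fields into scalars.

    Hypothetical helper: assumes each field stores exactly one value per row.
    """
    src = arr.dtype
    spec = []
    for name in src.names:
        info = src.fields[name]      # (dtype, offset) or (dtype, offset, title)
        base = info[0].base          # '<f4' for a ('<f4', (1, 1)) field
        key = (info[2], name) if len(info) == 3 else name  # keep the title if present
        spec.append((key, base))
    out = np.zeros(arr.shape, dtype=np.dtype(spec))
    for name in src.names:
        # reshape drops the extra (1, 1) axes and raises if a field
        # holds more than one value per row
        out[name] = arr[name].reshape(arr.shape)
    return out

# Small stand-in for the nested array in the question (values copied from it)
nested = np.dtype([(('T.CHANNEL-I.DEPTH', 'DEPTH'), '<f8', (1, 1)),
                   (('T.CHANNEL-I.A', 'A'), '<f4', (1, 1))])
x = np.zeros(3, nested)
x['DEPTH'] = np.array([6860.0, 6859.9, 6859.8]).reshape(3, 1, 1)
x['A'] = np.array([7.887773, 7.594969, 7.539917]).reshape(3, 1, 1)

flat = flatten_unit_fields(x)
print(flat.dtype)
print(flat)
```

The result is a packed dtype, so the original offsets and itemsize are not reproduced; only names, titles and base formats carry over. reshape is used here instead of squeeze so that a field with more than one value per row fails loudly rather than being copied in an unexpected shape.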