How to convert a 2-level nested numpy array to a flat array



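We are using the Python dlisio library to extract data from .dlis files (well log files). For most companies the file data has the same structure, but one company stores its data in nested numpy arrays.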

The normal structure looks like this:

selected_curves_data[0:3]
Output
array([(172600., 1318.3775, 1130.0346, -1130.0301),
(172590., 1331.5   , 1130.0346, -1130.0301),
(172580., 1343.5   , 1130.046 , -1130.001 )],
dtype={'names':['TDEP','A','B','C'], 'formats':['<f4','<f4','<f4','<f4'], 'offsets':[4,8,12,16], 'titles':['T.CHANNEL-I.TDEP','T.CHANNEL-I.A','T.CHANNEL-I.B','T.CHANNEL-C'], 'itemsize':20})

The file I am working with has a different structure. Each value is nested inside two sublists, as shown below:

selected_curves_data[0:3]
Output
array([([[6860. ]], [[7.887773]], [[65.23707 ]], [[83.41805]], [[98.60489 ]], [[76.93024]], [[305.9046]], [[  1.435147 ]], [[0.]]),
([[6859.9]], [[7.594969]], [[65.16657 ]], [[83.31693]], [[98.35259 ]], [[76.18296]], [[305.8163]], [[-10.156202 ]], [[0.]]),
([[6859.8]], [[7.539917]], [[65.115074]], [[83.21918]], [[98.084015]], [[75.37859]], [[305.7146]], [[  2.4681084]], [[0.]])],
dtype={'names':['DEPTH','A','B','C','D','E','F','G','H'], 'formats':[('<f8', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1)),('<f4', (1, 1))], 'offsets':[4,12,16,20,24,62,66,1070,1074], 'titles':['T.CHANNEL-I.DEPTH','T.CHANNEL-I.A','T.CHANNEL-I.B','T.CHANNEL-I.C','T.CHANNEL-I.D','T.CHANNEL-I.E','T.CHANNEL-I.F','T.CHANNEL-I.G','T.CHANNEL-I.H'], 'itemsize':1078}) 
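The nesting comes from the dtype itself: every field is declared with a `(1, 1)` subarray shape (for example `('<f4', (1, 1))`). A small sketch for inspecting this, assuming a hypothetical two-field array `data` (named `nested_dt` here) that mimics the structure above rather than the real dlisio output:

```python
import numpy as np

# Hypothetical two-field stand-in for the nested dlisio output: every field
# is a (1, 1) subarray, as in the 'formats' list shown above.
nested_dt = np.dtype([("DEPTH", "<f8", (1, 1)), ("A", "<f4", (1, 1))])
data = np.zeros(3, nested_dt)
data["DEPTH"] = [[[6860.0]], [[6859.9]], [[6859.8]]]

for name in nested_dt.names:
    field_dt = nested_dt[name]  # dtype of a single field
    # .shape is the per-record subarray shape, .base the scalar type inside it
    print(name, field_dt.shape, field_dt.base)

# Selecting a field gives one (1, 1) block per record
print(data["DEPTH"].shape)
```

Each field's `.base` is the scalar dtype the flat version of the array should use.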

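I tried the following to convert it to the desired original structure, but it does not accept the processed data, because the original data type is numpy.dtype[void] while mine is numpy.dtype[float64]: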

import numpy as np

a = np.empty_like(selected_curves_data)
selected_curves_data[0:3]
ithRowList = []
i = 0
for item in selected_curves_data:
    if i < 3:
        print("Type: ", type(item), " shape: ", item.shape)
        print("----------------------------- Row ", i, " Original ----------------------------- \n", item)
        #ithRowList = np.empty_like(selected_curves_data)
        for subItem in item:
            ithRowList.append(float(subItem))  # subItem is a (1, 1) array

        print("Type: ", type(ithRowList))
        print("----------------------------- Row ", i, " After ----------------------------- \n", ithRowList)

        arr = np.array(ithRowList)
        print("Type: ", type(arr), " shape: ", item.shape)
        print("----------------------------- Row ", i, " Array ----------------------------- \n", arr)
        print("\n")

        # a is structured (void) while ithRowList holds floats (float64);
        # np.append also returns a new array instead of updating a
        np.append(a, ithRowList)
        i += 1
        ithRowList = []

a[0:3]

The output is:

Type:  <class 'numpy.void'>  shape:  ()
----------------------------- Row  0  Original ----------------------------- 
([[6860.]], [[7.887773]], [[65.23707]], [[83.41805]], [[98.60489]], [[76.93024]], [[305.9046]], [[1.435147]], [[0.]])
Type:  <class 'list'>
----------------------------- Row  0  After ----------------------------- 
[6860.0, 7.887773036956787, 65.23706817626953, 83.41805267333984, 98.60488891601562, 76.93023681640625, 305.90460205078125, 1.4351470470428467, 0.0]
Type:  <class 'numpy.ndarray'>  shape:  ()
----------------------------- Row  0  Array ----------------------------- 
[6.86000000e+03 7.88777304e+00 6.52370682e+01 8.34180527e+01
9.86048889e+01 7.69302368e+01 3.05904602e+02 1.43514705e+00
0.00000000e+00]
I would appreciate your help. Right now it gives me the following error:

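TypeError: The DTypes <class 'numpy.dtype[void]'> and <class 'numpy.dtype[float64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.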
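A likely source of this error is the `np.append(a, ithRowList)` call: `a` is a structured (void) array and `ithRowList` is a list of floats, so NumPy cannot find a common dtype to concatenate them. `np.append` also returns a new array rather than modifying `a`. A minimal row-by-row sketch that keeps the shape of the loop, assuming a small hypothetical stand-in for `selected_curves_data` with only three of the real field names, preallocates a flat structured target (`flat`, `flat_dt` are illustrative names) and assigns each row as a tuple:

```python
import numpy as np

# Hypothetical stand-in for selected_curves_data: (1, 1) subarray per field.
nested_dt = np.dtype([("DEPTH", "<f8", (1, 1)),
                      ("A", "<f4", (1, 1)),
                      ("B", "<f4", (1, 1))])
selected_curves_data = np.zeros(3, nested_dt)
selected_curves_data["DEPTH"] = [[[6860.0]], [[6859.9]], [[6859.8]]]
selected_curves_data["A"] = [[[7.887773]], [[7.594969]], [[7.539917]]]

# Flat target dtype: same field names, scalar fields instead of (1, 1) blocks.
flat_dt = np.dtype([(name, nested_dt[name].base) for name in nested_dt.names])

# Preallocate once instead of calling np.append inside the loop.
flat = np.empty(len(selected_curves_data), dtype=flat_dt)
for i, item in enumerate(selected_curves_data):
    # item is a numpy.void record; item[name] is a (1, 1) array and .item()
    # extracts its single value. A tuple is how one structured record is assigned.
    flat[i] = tuple(item[name].item() for name in nested_dt.names)

print(flat)
```

The loop works but is slow on long logs; the field-by-field copy in the answer below avoids the per-row Python work.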

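So my goal is a final array with the same structure/dimensions as the original array, but containing only the values, without the nested arrays around them.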

So I want the final output to look like this:

(6860., 7.887773, 65.23707, 83.41805, 98.60489, 76.93024, 305.9046, 1.435147, 0.)

This is because further processing will be based on this structure, and I also want to keep the column details in the numpy array.
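To keep the column details, the flat dtype can be derived from the nested one so that field names and the dlisio-style titles (such as `T.CHANNEL-I.DEPTH`) carry over. A sketch under stated assumptions: `flatten_dtype` is a hypothetical helper, and the titles are copied only when every field has one, to avoid relying on mixed titled/untitled specs:

```python
import numpy as np

# Hypothetical nested dtype with dlisio-style titles (only two fields shown).
nested_dt = np.dtype({
    "names": ["DEPTH", "A"],
    "formats": [("<f8", (1, 1)), ("<f4", (1, 1))],
    "titles": ["T.CHANNEL-I.DEPTH", "T.CHANNEL-I.A"],
})

def flatten_dtype(dt):
    """Packed dtype with the same names (and titles) but scalar fields."""
    names = list(dt.names)
    # fields[n] is (dtype, offset) or (dtype, offset, title)
    formats = [dt.fields[n][0].base for n in names]
    titles = [dt.fields[n][2] if len(dt.fields[n]) == 3 else None for n in names]
    spec = {"names": names, "formats": formats}
    if all(t is not None for t in titles):
        spec["titles"] = titles
    return np.dtype(spec)

flat_dt = flatten_dtype(nested_dt)
print(flat_dt)
```

Because no `offsets` are given, the result is packed, so the padding implied by the original `itemsize` of 1078 disappears.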

To go from the 2nd style of dtype to the 1st, make a target array with the flat dtype and copy field by field.

In [287]: dt1 = np.dtype([('A','f4'),('B','f4')])
In [288]: dt2 = np.dtype([('A','f4',(1,1)),('B','f4',(1,1))])

Create an array with the 2nd dtype:

In [290]: x=np.zeros(3, dt2)
In [291]: x['A']=[[[1]],[[.2]],[[100]]]; x['B']=[[[.0]],[[2]],[[200]]]
In [292]: x
Out[292]: 
array([([[  1. ]], [[  0.]]), ([[  0.2]], [[  2.]]),
([[100. ]], [[200.]])],
dtype=[('A', '<f4', (1, 1)), ('B', '<f4', (1, 1))])

Make the target and copy:

In [293]: y = np.zeros(x.shape, dt1)
In [294]: for name in dt2.names:
...:     y[name]=np.squeeze(x[name])
...: 
In [295]: y
Out[295]: 
array([(  1. ,   0.), (  0.2,   2.), (100. , 200.)],
dtype=[('A', '<f4'), ('B', '<f4')])

I had to use squeeze because `x[name]` has shape (3,1,1), while `y[name]` has shape (3,).
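A vectorized alternative, not part of the original answer, uses `numpy.lib.recfunctions`: `structured_to_unstructured` treats each element of a subarray field as its own column, so the `(1, 1)` fields collapse into a plain 2-D array, and `unstructured_to_structured` maps those columns back onto named fields. A sketch using the same toy `dt1`, `dt2`, and `x` as above:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same toy dtypes and data as in the answer above
dt1 = np.dtype([("A", "f4"), ("B", "f4")])
dt2 = np.dtype([("A", "f4", (1, 1)), ("B", "f4", (1, 1))])
x = np.zeros(3, dt2)
x["A"] = [[[1]], [[.2]], [[100]]]
x["B"] = [[[.0]], [[2]], [[200]]]

# Each (1, 1) subarray element counts as one column -> a plain (3, 2) array
plain = rfn.structured_to_unstructured(x)
# Map the columns back onto the named scalar fields of dt1
y = rfn.unstructured_to_structured(plain, dtype=dt1)
print(y)
```

With the real data, which mixes `<f8` and `<f4` fields, the intermediate array is promoted to a common type (float64) before being cast back to each field's type.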
