Structured array from an array and an np.dtype object

The following code constructs a NumPy array using a dtype object:

import numpy as np

dt = np.dtype([
    ("index", np.int32),
    ("timestamp", np.int32),
    ("volume", np.float32),
])
arr = np.array([
    [0, 20, 3],
    [1, 21, 2],
    [2, 23, 8],
    [3, 26, 5],
    [4, 31, 9],
], dtype=dt)


I expected the following output:

>>> arr
array([[ 0, 20,  3.],
       [ 1, 21,  2.],
       [ 2, 23,  8.],
       [ 3, 26,  5.],
       [ 4, 31,  9.]])
>>> arr[0]
array([ 0, 20,  3.])


Instead, I get:

>>> arr
array([[( 0,  0,  0.), (20, 20, 20.), ( 3,  3,  3.)],
       [( 1,  1,  1.), (21, 21, 21.), ( 2,  2,  2.)],
       [( 2,  2,  2.), (23, 23, 23.), ( 8,  8,  8.)],
       [( 3,  3,  3.), (26, 26, 26.), ( 5,  5,  5.)],
       [( 4,  4,  4.), (31, 31, 31.), ( 9,  9,  9.)]],
      dtype=[('index', '<i4'), ('timestamp', '<i4'), ('volume', '<f4')])
>>> arr[0]
array([( 0,  0,  0.), (20, 20, 20.), ( 3,  3,  3.)],
      dtype=[('index', '<i4'), ('timestamp', '<i4'), ('volume', '<f4')])

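Why does NumPy create a copy of every value for every field, instead of mapping each column to its own data type (and only that column)? I assume I am doing something wrong here. Is there a way to get the result I expected?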



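One option is np.fromiter, fed with one tuple per row. Here arr is the plain 2-D array, that is, the same nested list passed to np.array without a structured dtype:

np.fromiter(zip(*arr.T), dtype=dt)
array([(0, 20, 3.), (1, 21, 2.), (2, 23, 8.), (3, 26, 5.), (4, 31, 9.)],
      dtype=[('index', '<i4'), ('timestamp', '<i4'), ('volume', '<f4')])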
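
The behaviour in the question comes down to how NumPy reads nested Python sequences when it is given a structured dtype: lists are treated as array dimensions and only tuples are treated as records, so each bare number ends up copied into every field. A minimal sketch of the difference (the names `rows`, `broadcasted`, and `records` are illustrative and not from the original post):

```python
import numpy as np

dt = np.dtype([
    ("index", np.int32),
    ("timestamp", np.int32),
    ("volume", np.float32),
])
rows = [[0, 20, 3], [1, 21, 2], [2, 23, 8]]

# List of lists: every number becomes its own element,
# and that single value is copied into all three fields.
broadcasted = np.array(rows, dtype=dt)
print(broadcasted.shape)  # (3, 3)

# List of tuples: each tuple is read as one record,
# so each column lands in exactly one field.
records = np.array([tuple(row) for row in rows], dtype=dt)
print(records.shape)      # (3,)
print(records["volume"])  # [3. 2. 8.]
```

Converting each row to a tuple is effectively what `zip(*arr.T)` does in the np.fromiter call.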

Another (little-known) approach, mentioned by @hpaulj in the comments, is np.lib.recfunctions.unstructured_to_structured, which builds the structured array directly from arr and the dtype object:

import numpy.lib.recfunctions  # the submodule is not loaded by a bare `import numpy` in many NumPy versions

np.lib.recfunctions.unstructured_to_structured(arr, dt)
array([(0, 20, 3.), (1, 21, 2.), (2, 23, 8.), (3, 26, 5.), (4, 31, 9.)],
      dtype=[('index', '<i4'), ('timestamp', '<i4'), ('volume', '<f4')])
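
As a rough illustration of what the structured result offers, the sketch below reads fields by name and converts back with structured_to_unstructured (also in numpy.lib.recfunctions). The variable names `structured` and `plain` are illustrative, and a three-row array is used for brevity:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([
    ("index", np.int32),
    ("timestamp", np.int32),
    ("volume", np.float32),
])
arr = np.array([[0, 20, 3], [1, 21, 2], [2, 23, 8]])

# One record per row; the last axis of `arr` supplies the fields.
structured = rfn.unstructured_to_structured(arr, dt)
print(structured.shape)         # (3,)
print(structured["timestamp"])  # [20 21 23]

# Round trip back to a plain 2-D array with a single common dtype.
plain = rfn.structured_to_unstructured(structured)
print(plain.shape)              # (3, 3)
```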


A record array can also be built column by column with np.rec.fromarrays:

np.rec.fromarrays(arr.T,
                  names='index, timestamp, volume',
                  formats='<i4, <i4, <f4')
rec.array([(0, 20, 3.), (1, 21, 2.), (2, 23, 8.), (3, 26, 5.),
           (4, 31, 9.)],
          dtype=[('index', '<i4'), ('timestamp', '<i4'), ('volume', '<f4')])


The names and formats strings can also be derived from the dtype object:

names, dtypes = list(zip(*dt.descr))
np.rec.fromarrays(arr.T,
                  names=', '.join(names),
                  formats=', '.join(dtypes))
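
When the dtype object is already at hand, it may be simpler to pass it through the `dtype` parameter of np.rec.fromarrays instead of rebuilding the names and formats strings. A minimal sketch under that assumption (the variable name `rec` is illustrative):

```python
import numpy as np

dt = np.dtype([
    ("index", np.int32),
    ("timestamp", np.int32),
    ("volume", np.float32),
])
arr = np.array([
    [0, 20, 3],
    [1, 21, 2],
    [2, 23, 8],
    [3, 26, 5],
    [4, 31, 9],
])

# arr.T has one row per column of arr, so each row becomes one field.
rec = np.rec.fromarrays(arr.T, dtype=dt)

print(rec.shape)        # (5,)
print(rec.dtype.names)  # ('index', 'timestamp', 'volume')
print(rec.volume)       # record arrays also expose fields as attributes
```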


Timings on a 5000-row array (run in IPython):

a = np.concatenate([arr]*1000, axis=0)

%timeit np.rec.fromarrays(a.T, names='index, timestamp, volume', formats='<i4, <i4, <f4')
# 57.9 µs ± 1.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.lib.recfunctions.unstructured_to_structured(a, dt)
# 79.6 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit np.fromiter(zip(*a.T), dtype=dt)
# 2.1 ms ± 69.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.fromiter(map(tuple, a), dtype=dt)
# 6.34 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.array(list(zip(*a.T)), dtype=dt)
# 2.17 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
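
`%timeit` is an IPython magic, so the lines above do not run in a plain Python script. The sketch below shows one way to repeat the comparison with the standard-library timeit module. The `candidates` and `timings` names and the choice of 20 repetitions are illustrative assumptions, and absolute numbers will differ by machine and NumPy version:

```python
import timeit

import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([
    ("index", np.int32),
    ("timestamp", np.int32),
    ("volume", np.float32),
])
arr = np.array([
    [0, 20, 3],
    [1, 21, 2],
    [2, 23, 8],
    [3, 26, 5],
    [4, 31, 9],
])
a = np.concatenate([arr] * 1000, axis=0)  # 5000 rows, 3 columns

candidates = {
    "rec.fromarrays": lambda: np.rec.fromarrays(a.T, dtype=dt),
    "unstructured_to_structured": lambda: rfn.unstructured_to_structured(a, dt),
    "fromiter(zip)": lambda: np.fromiter(zip(*a.T), dtype=dt),
    "array(list(zip))": lambda: np.array(list(zip(*a.T)), dtype=dt),
}

# Check that every approach yields the same records before timing it.
reference = candidates["fromiter(zip)"]().tolist()
timings = {}
for name, build in candidates.items():
    assert build().tolist() == reference
    timings[name] = timeit.timeit(build, number=20) / 20

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```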
