One-hot encoding a numpy array with >2 dims



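I have a numpy array of shape (192, 224, 192, 1). The last dimension holds integer class labels that I want to one-hot encode. For example, if I have 12 classes, I want the resulting array to have shape (192, 224, 192, 12), where the last dimension is all zeros except for a 1 at the index given by the original value.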

I could do this naively with a lot of for loops, but I'd like to know whether there is a better way.

If you know the maximum, you can do this in a single indexing operation. Given an array a and m = a.max() + 1:

out = np.zeros(a.shape[:-1] + (m,), dtype=bool)
out[(*np.indices(a.shape[:-1], sparse=True), a[..., 0])] = True
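A self-contained sketch of the single-indexing approach on a small array, assuming the labels are non-negative integers starting at 0. The (4, 3, 1) shape and the random generator are illustrative stand-ins for the real (192, 224, 192, 1) data:

```python
import numpy as np

# Small stand-in for the (192, 224, 192, 1) label array, labels in 0..4
rng = np.random.default_rng(0)
a = rng.integers(0, 5, size=(4, 3, 1))

m = a.max() + 1  # number of classes, assuming labels start at 0
out = np.zeros(a.shape[:-1] + (m,), dtype=bool)
# Sparse open grids address every leading position; a[..., 0] picks the class slot
out[(*np.indices(a.shape[:-1], sparse=True), a[..., 0])] = True

# Every position should now hold exactly one True
assert (out.sum(axis=-1) == 1).all()
```

The open grids from `np.indices(..., sparse=True)` broadcast against `a[..., 0]`, so no full-size index arrays are materialized.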

It's even simpler if you drop the unnecessary trailing dimension:

a = np.squeeze(a, axis=-1)  # drop only the trailing length-1 axis, keep any other size-1 dims
out = np.zeros(a.shape + (m,), bool)
out[(*np.indices(a.shape, sparse=True), a)] = True

The explicit tuple in the index is required for the star expansion.
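To illustrate the point about the explicit tuple: before Python 3.11, a star expression written directly inside square brackets (`out[*idx, a]`) is a syntax error, so the index key has to be built as a tuple first. This sketch shows two equivalent ways to build it; the variable names are illustrative only:

```python
import numpy as np

a = np.array([[2, 3], [3, 1], [4, 0]])
m = a.max() + 1
idx = np.indices(a.shape, sparse=True)  # tuple of open-grid index arrays

# Form 1: star-unpacking inside an explicit tuple display (Python 3.5+)
key1 = (*idx, a)
# Form 2: plain tuple concatenation, no star syntax at all
key2 = tuple(idx) + (a,)

out1 = np.zeros(a.shape + (m,), dtype=bool)
out2 = np.zeros(a.shape + (m,), dtype=bool)
out1[key1] = True
out2[key2] = True
```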

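If you want to extend this to an arbitrary axis, you can do that too. The following inserts a new dimension into the squeezed array at axis. Here axis is the position of the new axis in the final array, which is consistent with np.stack but not with list.insert: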

def onehot(a, axis=-1, dtype=bool):
    pos = axis if axis >= 0 else a.ndim + axis + 1
    shape = list(a.shape)
    shape.insert(pos, a.max() + 1)
    out = np.zeros(shape, dtype)
    ind = list(np.indices(a.shape, sparse=True))
    ind.insert(pos, a)
    out[tuple(ind)] = True
    return out
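For comparison, a common alternative (a different technique from `onehot` above) indexes an identity matrix with the labels and then moves the new axis into place with `np.moveaxis`. This is a sketch assuming non-negative integer labels; `onehot_eye` is a hypothetical name:

```python
import numpy as np

def onehot_eye(a, axis=-1, dtype=bool):
    """Hypothetical alternative to onehot(): identity-matrix lookup + moveaxis."""
    m = a.max() + 1  # number of classes, assuming labels in 0..max
    # np.eye(m)[a] has shape a.shape + (m,): one-hot rows along the last axis
    out = np.eye(m, dtype=dtype)[a]
    # Move the class axis to `axis` in the final array, using the same
    # np.stack-style meaning of `axis` as onehot()
    return np.moveaxis(out, -1, axis)
```

It is shorter to write, but it still allocates an output of size a.size * m, so it offers no memory advantage over filling a zeros array.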

If you have a singleton dimension that you want to expand, a generalized solution can find the first available singleton dimension:

def onehot2(a, axis=None, dtype=bool):
    shape = np.array(a.shape)
    if axis is None:
        axis = (shape == 1).argmax()
    if shape[axis] != 1:
        raise ValueError(f'Dimension at {axis} is non-singleton')
    shape[axis] = a.max() + 1
    out = np.zeros(shape, dtype)
    ind = list(np.indices(a.shape, sparse=True))
    ind[axis] = a
    out[tuple(ind)] = True
    return out

To use the last available singleton instead, replace axis = (shape == 1).argmax() with

axis = a.ndim - 1 - (shape[::-1] == 1).argmax()
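A quick check of what the two argmax expressions select, using the shape of y from the examples that follow. On a boolean array, argmax returns the index of the first True, so reversing the array finds the last one:

```python
import numpy as np

shape = np.array((3, 1, 2, 1))  # shape of y in the examples below
ndim = len(shape)

first = (shape == 1).argmax()                  # first singleton axis
last = ndim - 1 - (shape[::-1] == 1).argmax()  # last singleton axis
print(first, last)  # → 1 3
```

If there is no singleton at all, argmax returns 0 and both expressions land on a dimension whose size is not 1, which is why onehot2 re-checks shape[axis] before using it.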

Here is some example usage:

>>> np.random.seed(0x111)
>>> x = np.random.randint(5, size=(3, 2))
>>> x
array([[2, 3],
       [3, 1],
       [4, 0]])
>>> a = onehot(x, axis=-1, dtype=int)
>>> a.shape
(3, 2, 5)
>>> a
array([[[0, 0, 1, 0, 0],    # 2
        [0, 0, 0, 1, 0]],   # 3
       [[0, 0, 0, 1, 0],    # 3
        [0, 1, 0, 0, 0]],   # 1
       [[0, 0, 0, 0, 1],    # 4
        [1, 0, 0, 0, 0]]])  # 0
>>> b = onehot(x, axis=-2, dtype=int)
>>> b.shape
(3, 5, 2)
>>> b
array([[[0, 0],
        [0, 0],
        [1, 0],
        [0, 1],
        [0, 0]],
       [[0, 0],
        [0, 1],
        [0, 0],
        [1, 0],
        [0, 0]],
       [[0, 1],
        [0, 0],
        [0, 0],
        [0, 0],
        [1, 0]]])

onehot2 requires you to mark the dimension you want to expand as a singleton:

>>> np.random.seed(0x111)
>>> y = np.random.randint(5, size=(3, 1, 2, 1))
>>> y
array([[[[2],
         [3]]],
       [[[3],
         [1]]],
       [[[4],
         [0]]]])
>>> c = onehot2(y, axis=-1, dtype=int)
>>> c.shape
(3, 1, 2, 5)
>>> c
array([[[[0, 0, 1, 0, 0],
         [0, 0, 0, 1, 0]]],
       [[[0, 0, 0, 1, 0],
         [0, 1, 0, 0, 0]]],
       [[[0, 0, 0, 0, 1],
         [1, 0, 0, 0, 0]]]])
>>> d = onehot2(y, axis=-2, dtype=int)
ValueError: Dimension at -2 is non-singleton
>>> e = onehot2(y, dtype=int)
>>> e.shape
(3, 5, 2, 1)
>>> e.squeeze()
array([[[0, 0],
        [0, 0],
        [1, 0],
        [0, 1],
        [0, 0]],
       [[0, 0],
        [0, 1],
        [0, 0],
        [1, 0],
        [0, 0]],
       [[0, 1],
        [0, 0],
        [0, 0],
        [0, 0],
        [1, 0]]])

You can create a new array of zeros and fill it using advanced indexing.

import numpy as np

# sample array with 12 classes
np.random.seed(123)
a = np.random.randint(0, 12, (192, 224, 192, 1))
n_classes = a.max() + 1
# one row per element of a, one column per class
b = np.zeros((a.size, n_classes))
# use advanced indexing: row i gets a 1 in column a.ravel()[i]
b[np.arange(a.size), a.ravel()] = 1
# reshape back to the original leading dimensions
b = b.reshape(a.shape[:-1] + (n_classes,))
print(b.shape)
print(a[0, 0, 0])
print(b[0, 0, 0])

Output:

(192, 224, 192, 12)
[2]
[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

This is similar to this answer, but reshapes the array instead.
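For the full-size array in the question, the dtype matters: 192 × 224 × 192 × 12 = 99,090,432 entries, which is about 793 MB as float64 (the default for np.zeros) but about 99 MB as bool or uint8. The sketch below swaps in np.put_along_axis (a different technique from the answer above) so the (..., 1) label array can be used directly without flattening. The small shape and the n_classes value are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(123)
# Small stand-in for the (192, 224, 192, 1) label array with 12 classes
a = rng.integers(0, 12, size=(4, 5, 6, 1))
n_classes = 12  # assumed known up front; a.max() + 1 works if every class occurs

# uint8 uses 1 byte per entry instead of 8 for the float64 default
b = np.zeros(a.shape[:-1] + (n_classes,), dtype=np.uint8)
# a has the same ndim as b and length 1 on the last axis, so put_along_axis
# writes a single 1 per position, at the index given by the label
np.put_along_axis(b, a, 1, axis=-1)
```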

scikit-learn has an encoder for this:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Data
values = np.array([1, 3, 2, 4, 1, 2, 1, 3, 5])
# OneHotEncoder expects a 2D (n_samples, n_features) array
val_reshape = values.reshape(len(values), 1)
# One-hot encoding (the keyword is sparse_output in scikit-learn >= 1.2;
# older versions use sparse=False instead)
oh = OneHotEncoder(sparse_output=False)
oh_arr = oh.fit_transform(val_reshape)
print(oh_arr)

Output:

[[1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0.]
 [1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1.]]
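Note that OneHotEncoder creates one column per distinct value it sees (sorted, exposed as oh.categories_), not one per integer from 0 to the maximum; that is why the example above has 5 columns for the values 1 to 5. A NumPy-only sketch of that behaviour for arrays of any shape, built on np.unique (an assumption for illustration, not how scikit-learn implements it internally); onehot_unique is a hypothetical name:

```python
import numpy as np

def onehot_unique(a, dtype=bool):
    """Hypothetical helper: one column per distinct value, like OneHotEncoder."""
    # cats holds the sorted distinct values; inv maps each element to its column
    cats, inv = np.unique(a, return_inverse=True)
    # the shape of inv for n-d input may differ between NumPy versions,
    # so reshape it to match a explicitly
    inv = inv.reshape(a.shape)
    return np.eye(len(cats), dtype=dtype)[inv], cats

values = np.array([1, 3, 2, 4, 1, 2, 1, 3, 5])
encoded, cats = onehot_unique(values, dtype=float)
```

For the question's (192, 224, 192, 1) array, passing a[..., 0] gives a result of shape (192, 224, 192, k), where k is the number of distinct labels.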
