Applying cumcount to each group of identical integers

Suppose I have the following array of integers in ascending order (some of them may be negative):

a = np.array([ 1,  1,  1,  1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])

I want to turn it into this:

a = np.array([ 1,  2,  3,  4, 10, 11, 20, 21, 22, 30, 40, 41, 42, 43])

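That is, each integer is incremented by its 0-based position within its group of identical integers. For the first group, the 1s: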

  1 1 1 1  <--- these are the numbers from the array
+ 0 1 2 3  <--- these are counts of the number for its group
  -------
  1 2 3 4
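
For reference, the intended transformation can be written as a plain Python loop. This is only a slow, minimal sketch to pin down the expected output; the function name `add_group_counts_reference` is made up here and is not part of the question.

```python
from itertools import groupby

import numpy as np


def add_group_counts_reference(a):
    """Add to each element its 0-based position within its run of equal values."""
    out = []
    for value, run in groupby(a.tolist()):
        # enumerate yields 0, 1, 2, ... for the elements of one run
        out.extend(value + k for k, _ in enumerate(run))
    return np.array(out, dtype=a.dtype)


a = np.array([1, 1, 1, 1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])
print(add_group_counts_reference(a))
# [ 1  2  3  4 10 11 20 21 22 30 40 41 42 43]
```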

Is there a more efficient way to do this than the following?

a = np.array([ 1,  1,  1,  1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])
ones = (a == np.pad(a, (1,0))[:-1]).astype(int)
ones[ones == 0] = -np.diff(np.concatenate(([0.], np.cumsum(ones != 0)[ones == 0])))
new_a = a + ones.cumsum()

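Note that the array will always be sorted in ascending order (low to high), the numbers will always be integers, and some of them may be negative.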

Explanation, in case the code above is unclear:

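I actually got this working with the help of this post. What I am doing now is building an array in which 0 marks the first element of each group of identical numbers and 1 marks the rest: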

1  1  1  1 10 10 20 20 20 30 40 40 40 40
0  1  1  1  0  1  0  1  1  0  0  1  1  1
^ first 1   ^ first 10     ^ first 30
                  ^ first 20  ^ first 40

Then I use the technique from the post linked above to take a cumulative sum over that array that resets at every 0:

# Shift `a` by one and compare it with the original array
>>> ones = (a == np.pad(a, (1,0))[:-1]).astype(int)
>>> ones
array([0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1])
# This line is from the linked post (modified, of course)
>>> ones[ones == 0] = -np.diff(np.concatenate(([0.], np.cumsum(ones != 0)[ones == 0])))
>>> ones
array([ 0,  1,  1,  1, -3,  1, -1,  1,  1, -2,  0,  1,  1,  1])
>>> ones.cumsum()
array([0, 1, 2, 3, 0, 1, 0, 1, 2, 0, 0, 1, 2, 3])

Now we can add the resulting array to the original array:

>>> a
array([ 1,  1,  1,  1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])
>>> a + ones.cumsum()
array([ 1,  2,  3,  4, 10, 11, 20, 21, 22, 30, 40, 41, 42, 43])
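
One edge case is worth checking: `np.pad` fills with 0 by default, so if the array starts with 0, the first element compares equal to the padding and is not flagged as the start of a group. Below is a minimal sketch of a flag array that avoids the padding altogether, assuming a non-empty 1-D array. The names `padded` and `same_as_previous` are introduced here only for illustration.

```python
import numpy as np

a = np.array([0, 0, 5, 5, 5])

# Version from the question: the padded 0 matches a[0], so the first flag is 1
padded = (a == np.pad(a, (1, 0))[:-1]).astype(int)
print(padded)            # [1 1 0 1 1]

# Compare neighbours directly and force the first flag to 0
same_as_previous = np.r_[0, (a[1:] == a[:-1]).astype(int)]
print(same_as_previous)  # [0 1 0 1 1]
```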

Using np.unique may be more elegant:

u, i = np.unique(a, return_index=True)   # Indices where the sums restart
b = np.ones_like(a)
b[i] = u
b[i[1:]] -= np.add.reduceat(b, i)[:-1]   # Subtract the sum of the prior region from the next
result = b.cumsum()
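
To see why the subtraction works, it may help to print the intermediate arrays for the example input. This is the same `np.unique` approach, only split into steps; the variable `sums` is introduced here for readability.

```python
import numpy as np

a = np.array([1, 1, 1, 1, 10, 10, 20, 20, 20, 30, 40, 40, 40, 40])

u, i = np.unique(a, return_index=True)
print(i)                      # [ 0  4  6  9 10]

b = np.ones_like(a)
b[i] = u                      # each run now starts with its own value
print(b)                      # [ 1  1  1  1 10  1 20  1  1 30 40  1  1  1]

sums = np.add.reduceat(b, i)  # sum of b over each run
print(sums)                   # [ 4 11 22 30 43]

b[i[1:]] -= sums[:-1]         # cancel the previous run's total at each restart
print(b)                      # [ 1  1  1  1  6  1  9  1  1  8 10  1  1  1]
print(b.cumsum())             # [ 1  2  3  4 10 11 20 21 22 30 40 41 42 43]
```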

Since the array is already sorted, you can shortcut that part of np.unique:

i = np.r_[0, np.flatnonzero(np.diff(a)) + 1]  # Get the indices directly from the diff
b = np.ones_like(a)
b[i] = a[i]
b[i[1:]] -= np.add.reduceat(b, i)[:-1]
result = b.cumsum()

But wait: the sum of each region is just its length plus its starting value minus 1. That removes the need to sum twice:

i = np.r_[0, np.flatnonzero(np.diff(a)) + 1]
b = np.ones_like(a)
b[i] = a[i]
b[i[1:]] -= np.diff(i) + a[i[:-1]] - 1  # Simpler way to sum the prior region
result = b.cumsum()
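
The identity behind that shortcut can be checked numerically. The sketch below uses a made-up input with negative values and compares the explicit per-run sums with the length-plus-start-minus-1 formula; `lengths`, `by_reduceat` and `by_formula` are illustrative names.

```python
import numpy as np

a = np.array([-5, -5, -5, 0, 0, 7, 12, 12])

i = np.r_[0, np.flatnonzero(np.diff(a)) + 1]
b = np.ones_like(a)
b[i] = a[i]

lengths = np.diff(np.r_[i, a.size])       # run lengths, including the last run
by_reduceat = np.add.reduceat(b, i)       # explicit sum of each run of b
by_formula = lengths + a[i] - 1           # start value + (length - 1) ones

print(by_reduceat)  # [-3  1  7 13]
print(by_formula)   # [-3  1  7 13]
```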

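You can simplify even further. If a[i[k]] is the start of a run, then a[i[k] - 1] is the same as a[i[k - 1]]. In other words, the last element of the previous run equals the start of that run, so d[i[k] - 1] is already the current run's start minus the previous run's start: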

d = np.diff(a)
i = np.r_[0, np.flatnonzero(d) + 1]
b = np.ones_like(a)
b[0] = a[0]
b[i[1:]] = d[i[1:] - 1] - np.diff(i) + 1 # Current region minus prior, reusing diff
result = b.cumsum()
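
Wrapped in a function, this version can be cross-checked against a straightforward loop on random sorted inputs. This is a sketch under a few assumptions: the names `add_group_counts` and `slow_check` are made up here, and the guard for empty input is an addition, since `b[0] = a[0]` would otherwise raise on an empty array.

```python
import numpy as np


def add_group_counts(a):
    """Vectorized version: a[k] plus its 0-based position within its run."""
    a = np.asarray(a)
    if a.size == 0:          # b[0] = a[0] would fail on an empty array
        return a.copy()
    d = np.diff(a)
    i = np.r_[0, np.flatnonzero(d) + 1]
    b = np.ones_like(a)
    b[0] = a[0]
    b[i[1:]] = d[i[1:] - 1] - np.diff(i) + 1
    return b.cumsum()


def slow_check(a):
    """Straightforward loop used only to cross-check the vectorized result."""
    out, count = [], 0
    for k, value in enumerate(a.tolist()):
        count = count + 1 if k > 0 and value == a[k - 1] else 0
        out.append(value + count)
    return out


rng = np.random.default_rng(0)
for _ in range(100):
    sample = np.sort(rng.integers(-20, 20, size=rng.integers(1, 50)))
    assert add_group_counts(sample).tolist() == slow_check(sample)

print(add_group_counts(np.array([-3, -3, -3, 0, 0, 4])))
# [-3 -2 -1  0  1  4]
```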

Either of the last two versions should be better than what you are currently doing.

The code above is written to be simple and fast. If you want it shorter and less legible, and you are on Python 3.8+, you can start using the walrus operator:

i = np.r_[0, np.flatnonzero(d := np.diff(a)) + 1]
(b := np.ones_like(a))[0] = a[0]
b[i[1:]] = d[i[1:] - 1] - np.diff(i) + 1
result = b.cumsum()

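Because Python evaluates the right-hand side of an assignment before its target, and each expression from left to right, you can bind d and i with walrus operators on the right-hand side and create one last travesty:

(b := np.ones_like(a))[0] = a[0]
b[i[1:]] = (d := np.diff(a))[(i := np.r_[0, np.flatnonzero(d) + 1])[1:] - 1] - np.diff(i) + 1
result = b.cumsum()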
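
One-liners like this depend on the order in which Python evaluates the pieces of an assignment statement: the right-hand side is evaluated first, then the target's container and subscript. Here is a small sketch that makes the order visible; `tag` and `log` are hypothetical helpers introduced only for this demonstration.

```python
log = []


def tag(name, value):
    """Record when an operand is evaluated, then return it unchanged."""
    log.append(name)
    return value


data = [0, 0, 0]
tag("container", data)[tag("index", 1)] = tag("value", 42)

print(log)   # ['value', 'container', 'index']
print(data)  # [0, 42, 0]
```

A name bound with the walrus operator inside the target's subscript is therefore not yet available while the right-hand side is being evaluated.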

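The same works for the other approach. The walrus again sits on the right-hand side, which is evaluated first, and it is parenthesized inside the subscript, which Python 3.8 and 3.9 require:

(b := np.ones_like(a))[i] = a[(i := np.r_[0, np.flatnonzero(np.diff(a)) + 1])]
b[i[1:]] -= np.diff(i) + a[i[:-1]] - 1
result = b.cumsum()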

I am not sure how efficient this is, but it is a one-liner:

np.hstack([x + np.r_[:x.size] for x in np.split(a, np.flatnonzero(np.diff(a))+1)])
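
Whether the one-liner is competitive can be measured rather than guessed. Below is a minimal timing sketch using `timeit`, assuming a made-up input of 100,000 sorted integers; the function names `with_split` and `with_cumsum` are introduced here, and no timings are claimed since they depend on the machine and on the number of groups.

```python
import timeit

import numpy as np


def with_split(a):
    # One-liner: split into runs, add 0..n-1 to each run, then join them again
    return np.hstack([x + np.r_[:x.size] for x in np.split(a, np.flatnonzero(np.diff(a)) + 1)])


def with_cumsum(a):
    # Fully vectorized version that reuses the diff
    d = np.diff(a)
    i = np.r_[0, np.flatnonzero(d) + 1]
    b = np.ones_like(a)
    b[0] = a[0]
    b[i[1:]] = d[i[1:] - 1] - np.diff(i) + 1
    return b.cumsum()


rng = np.random.default_rng(0)
# Hypothetical test data: 100,000 sorted integers with many repeats
big = np.sort(rng.integers(-5_000, 5_000, size=100_000))

# Both approaches must agree before their speed is compared
assert np.array_equal(with_split(big), with_cumsum(big))

for func in (with_split, with_cumsum):
    seconds = timeit.timeit(lambda: func(big), number=5) / 5
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```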
