Comparing multiple columns in a numpy array

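I have a 2D numpy array with about 12 columns and more than 1000 rows, where each cell contains a number from 1 to 5. I am searching for the best sextuple of columns according to my point system, in which 1 and 2 yield -1 point and 4 and 5 yield +1.

For example, if a row of some sextuple contains [1, 4, 5, 3, 4, 3], that row should score +2, because 3*1 + 1*(-1) = 2. The next row could be [1, 2, 2, 3, 3, 3] and should score -3.

At first I tried a straightforward loop solution, but I realized there are 665280 possible column combinations to compare, and since I also need to search for the best quintuple, quadruple and so on, the loop would take forever.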

Is there a smarter, more numpy-like way to solve my problem?

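import itertools
import numpy as np

N_rows = 10
# Random example data: integers 1..5 (randint's upper bound is exclusive).
# random_integers is deprecated, so randint is used instead.
arr = np.random.randint(1, 6, size=(N_rows, 12))
# Lookup table: index = cell value, entry = points (index 0 is unused).
x = np.array([0, -1, -1, 0, 1, 1])
y = x[arr]
print(y)
# Score every 6-column combination and keep the best (score, columns) pair.
score, best_sextuple = max((y[:, cols].sum(), cols)
                           for cols in itertools.combinations(range(12), 6))
print('''
score: {s}
sextuple: {c}
'''.format(s=score, c=best_sextuple))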

For example, this prints:

score: 6
sextuple: (0, 1, 5, 8, 10, 11)

Explanation:

First, let's generate a random example with 12 columns and 10 rows:

N_rows = 10
arr = np.random.randint(1, 6, size=(N_rows, 12))

Now we can use numpy indexing to convert the numbers in arr (1, 2, ..., 5) to the values -1, 0, 1 according to your scoring system:

x = np.array([0,-1,-1,0,1,1])
y = x[arr]
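To make the lookup step concrete, here is a minimal sketch that applies the same table to the two example rows from the question. The names `rows`, `points` and `row_scores` are introduced only for this illustration.

```python
import numpy as np

# Lookup table: index = cell value (0 is unused), entry = points for that value.
x = np.array([0, -1, -1, 0, 1, 1])

# The two example rows from the question.
rows = np.array([[1, 4, 5, 3, 4, 3],
                 [1, 2, 2, 3, 3, 3]])

points = x[rows]                 # same shape as rows, values in {-1, 0, 1}
row_scores = points.sum(axis=1)  # one total per row
print(points)
print(row_scores)                # [ 2 -3]
```

The totals match the +2 and -3 worked out by hand in the question.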

Next, let's use itertools.combinations to generate all possible combinations of 6 columns:

for cols in itertools.combinations(range(12),6)

and

y[:,cols].sum()

then gives the score for the chosen columns (the sextuple) cols.
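It may help to see how small this search space is. The sketch below counts the candidates with the standard library (Python 3.8+). The figure 665280 from the question equals the number of ordered arrangements of 6 out of 12 columns; because a sum does not depend on column order, only the unordered combinations need to be scored. The variable names are illustrative.

```python
import math

n_cols = 12

# Unordered choices of 6 columns: what itertools.combinations iterates over.
n_combinations = math.comb(n_cols, 6)
# Ordered arrangements of 6 columns: the figure quoted in the question.
n_arrangements = math.perm(n_cols, 6)
# All subsets of size 1 through 6 (sextuples, quintuples, quadruples, ...).
n_up_to_six = sum(math.comb(n_cols, k) for k in range(1, 7))

print(n_combinations)   # 924
print(n_arrangements)   # 665280
print(n_up_to_six)      # 2509
```

With only a few thousand candidates in total, the brute-force search stays cheap even when every tuple size is checked.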

Finally, use max to pick out the best-scoring sextuple:

score, best_sextuple = max((y[:,cols].sum(), cols)
                           for cols in itertools.combinations(range(12),6))

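import numpy

A = numpy.random.randint(1, 6, size=(1000, 12))
# Points per cell: -1 for 1 and 2, +1 for 4 and 5, 0 for 3.
points = -1*(A == 1) + -1*(A == 2) + 1*(A == 4) + 1*(A == 5)
# Total points per column, summed over all rows.
columnsums = numpy.sum(points, 0)

def best6(row):
    # Indices of the six largest entries, ordered by ascending value.
    return numpy.argsort(row)[-6:]

bestcolumns = best6(columnsums)
# list() is needed on Python 3, where map returns a lazy iterator.
allbestcolumns = list(map(best6, points))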

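bestcolumns will now contain the indices of the best 6 columns, in ascending order of score. By the same logic, allbestcolumns will contain the best six columns for each row.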
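The reason sorting the column sums is enough is that the score of a set of columns is just the sum of its per-column totals, so the best k columns are the k with the largest totals. The sketch below checks this against a brute-force search for k = 1 to 6 on random data. The helper `best_columns`, the seed and the variable names are assumptions made for this illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)             # arbitrary seed, for repeatability
A = rng.integers(1, 6, size=(1000, 12))    # values 1..5
x = np.array([0, -1, -1, 0, 1, 1])         # points per cell value
points = x[A]
column_sums = points.sum(axis=0)

def best_columns(column_sums, k):
    """Indices of the k highest-scoring columns, sorted by index."""
    return np.sort(np.argsort(column_sums)[-k:])

agree = []
for k in range(1, 7):
    top = best_columns(column_sums, k)
    shortcut_score = column_sums[top].sum()
    # Brute force: score every k-column combination directly.
    brute_score = max(points[:, list(cols)].sum()
                      for cols in itertools.combinations(range(12), k))
    agree.append(bool(shortcut_score == brute_score))

print(agree)
```

Scores rather than column sets are compared, because ties between columns can make the best set non-unique.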

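To expand on the longer answer by Undepu above, the array of scores can be generated automatically with masks. Since the score of a given value is the same on every pass through the loop, the score for each value only needs to be computed once. Below is a slightly inelegant way of doing this on a sample 6x10 array, shown before and after the scores are applied. Note that the sample is drawn from 0 to 5, so a 0 is also scored as -1 here.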

>>> import numpy
>>> values = numpy.random.randint(6, size=(6,10))
>>> values
array([[4, 5, 1, 2, 1, 4, 0, 1, 0, 4],
       [2, 5, 2, 2, 3, 1, 3, 5, 3, 1],
       [3, 3, 5, 4, 2, 1, 4, 0, 0, 1],
       [2, 4, 0, 0, 4, 1, 4, 0, 1, 0],
       [0, 4, 1, 2, 0, 3, 3, 5, 0, 1],
       [2, 3, 3, 4, 0, 1, 1, 1, 3, 2]])
>>> b = values.copy()
>>> b[ b<3 ] = -1
>>> b[ b==3 ] = 0
>>> b[ b>3 ] = 1
>>> b
array([[ 1,  1, -1, -1, -1,  1, -1, -1, -1,  1],
       [-1,  1, -1, -1,  0, -1,  0,  1,  0, -1],
       [ 0,  0,  1,  1, -1, -1,  1, -1, -1, -1],
       [-1,  1, -1, -1,  1, -1,  1, -1, -1, -1],
       [-1,  1, -1, -1, -1,  0,  0,  1, -1, -1],
       [-1,  0,  0,  1, -1, -1, -1, -1,  0, -1]])
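As a possible shortcut, the three masking steps can be collapsed into one expression: the sign of each value's distance from the neutral value 3. The sketch below compares the two approaches on the first two rows of the sample array. The name `s` is introduced here for illustration.

```python
import numpy as np

# First two rows of the sample array shown above.
values = np.array([[4, 5, 1, 2, 1, 4, 0, 1, 0, 4],
                   [2, 5, 2, 2, 3, 1, 3, 5, 3, 1]])

# Three-step masking, as in the session above.
b = values.copy()
b[b < 3] = -1
b[b == 3] = 0
b[b > 3] = 1

# One-step alternative: -1 below 3, 0 at 3, +1 above 3.
s = np.sign(values - 3)

print(np.array_equal(b, s))   # True
```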

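By the way, this thread claims that creating the combinations directly in numpy gives roughly 5 times better performance than itertools, though possibly at some cost in readability.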
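The following is not the method from that thread, only a rough sketch of one way to move the scoring itself into numpy: the combinations are still produced by itertools, but they are stored as an index array so that all 924 scores are computed in a single vectorized step. The seed and the names `combos`, `scores` and `best` are assumptions for this illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed
arr = rng.integers(1, 6, size=(1000, 12))      # values 1..5
x = np.array([0, -1, -1, 0, 1, 1])             # points per cell value
column_sums = x[arr].sum(axis=0)               # shape (12,)

# Every 6-column combination as one row of an index array, shape (924, 6).
combos = np.array(list(itertools.combinations(range(12), 6)))

# Fancy indexing looks up the six column totals of each combination;
# summing along axis 1 gives one score per combination.
scores = column_sums[combos].sum(axis=1)       # shape (924,)

best = combos[scores.argmax()]
print(scores.max(), best)
```

Because the per-column totals are computed once, each combination costs only six lookups instead of a pass over all rows.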
