Permute a matrix while keeping some entries fixed

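I have a numpy array (actually a pandas DataFrame, but an array will do) whose values I want to permute. The catch is that it contains a number of NaNs at non-random positions, and those must stay where they are. So far I have an iterative solution: build a list of the indices of the non-NaN cells, make a permuted copy of that list, and then assign each value in the original matrix from its original index to the permuted index. Any suggestions on how to do this faster? The matrix has millions of values, and ideally I'd like to run many permutations, but the iterative solution is far too slow.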

Here is the iterative solution:

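import numpy, pandas
df = pandas.DataFrame(numpy.random.randn(3,3), index=list("ABC"), columns=list("abc"))
# Put NaNs in rows A and C of column "a" (label-based, since the index is A/B/C)
df.loc[["A","C"], "a"] = numpy.nan
# Collect the (row, col) labels of every non-NaN cell
indices = []
for row in df.index:
    for col in df.columns:
        if not numpy.isnan(df.loc[row, col]):
            indices.append((row, col))
# Shuffle the list of cell labels
permutedIndices = numpy.random.permutation(indices)
# Copy each value from its original cell to its permuted cell; NaN cells stay NaN
permuteddf = pandas.DataFrame(index=df.index, columns=df.columns)
for i in range(len(indices)):
    permuteddf.loc[permutedIndices[i][0], permutedIndices[i][1]] = df.loc[indices[i][0], indices[i][1]]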

Result:

In [19]: df
Out[19]: 
         a         b         c
A      NaN  0.816350 -1.187731
B -0.58708 -1.054487 -1.570801
C      NaN -0.290624 -0.453697
In [20]: permuteddf
Out[20]: 
          a          b          c
A       NaN  -0.290624  0.8163501
B -1.570801 -0.4536974  -1.054487
C       NaN -0.5870797  -1.187731

How about:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(5,5))
>>> df[df < 0.1] = np.nan
>>> df
          0         1         2         3         4
0       NaN  1.721657  0.446694       NaN  0.747747
1  1.178905  0.931979       NaN       NaN       NaN
2  1.547098       NaN       NaN       NaN  0.225014
3       NaN       NaN       NaN  0.886416  0.922250
4  0.453913  0.653732       NaN  1.013655       NaN
[5 rows x 5 columns]
>>> movers = ~np.isnan(df.values)
>>> df.values[movers] = np.random.permutation(df.values[movers])
>>> df
          0         1         2         3         4
0       NaN  1.013655  1.547098       NaN  1.721657
1  0.886416  0.446694       NaN       NaN       NaN
2  1.178905       NaN       NaN       NaN  0.453913
3       NaN       NaN       NaN  0.747747  0.653732
4  0.922250  0.225014       NaN  0.931979       NaN
[5 rows x 5 columns]
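
The same masking idea can be wrapped in a function that returns a new frame instead of modifying the original. This is a minimal sketch, not the answerer's code: the name `permute_keep_nan` and its `rng` parameter are made up for illustration, and it assumes every column is numeric. Working on an explicit copy avoids relying on `df.values` being a writable view of the frame's data, which may not hold for mixed dtypes or in pandas versions that use copy-on-write.

```python
import numpy as np
import pandas as pd


def permute_keep_nan(df, rng=None):
    """Return a copy of df with its non-NaN values shuffled; NaNs stay in place.

    Hypothetical helper for illustration; assumes all columns are numeric.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = df.to_numpy(dtype=float, copy=True)  # private, writable copy
    movers = ~np.isnan(values)                    # True where a value may move
    # Boolean indexing yields a 1-D array of the movable values; shuffle it
    # and write it back into the same (non-NaN) positions.
    values[movers] = rng.permutation(values[movers])
    return pd.DataFrame(values, index=df.index, columns=df.columns)


# Example: same setup as above, with fixed seeds for repeatability
df = pd.DataFrame(np.random.default_rng(0).standard_normal((5, 5)))
df[df < 0.1] = np.nan
shuffled = permute_keep_nan(df, rng=np.random.default_rng(1))
```

Because the function leaves `df` untouched, it can be called in a loop to draw many independent permutations from the same original frame, which is what the question asks for.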
