2D NumPy array of dtype=int: maximize the distance between identical integers



Apologies for the double post. I haven't found a solution yet.

Previous post: Conditional NumPy shuffle

Problem: I have a randomly shuffled 2D array of dtype=int and want to maximize the distance between identical integers. This is an optimization problem.

For example, given:

np.array([[1, 2, 2, 2],
          [3, 2, 1, 1],
          [1, 3, 3, 1],
          [1, 5, 5, 1]])

Some identical integers are too close to each other along both axis 0 and axis 1, for example the two 5s. Is there a heuristic or a scipy optimization method that can solve this, at least approximately?
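To make "too close" concrete, one way to score a layout is the minimum pairwise Euclidean distance between equal entries (`min_replicate_distance` is a hypothetical helper, not part of the question):

```python
import numpy as np
from itertools import combinations

def min_replicate_distance(arr: np.ndarray) -> float:
    """Smallest Euclidean distance between any two equal entries."""
    best = np.inf
    for v in np.unique(arr):
        coords = np.argwhere(arr == v)  # positions of all replicates of v
        for a, b in combinations(coords, 2):
            best = min(best, float(np.linalg.norm(a - b)))
    return best

arr = np.array([[1, 2, 2, 2],
                [3, 2, 1, 1],
                [1, 3, 3, 1],
                [1, 5, 5, 1]])
print(min_replicate_distance(arr))  # 1.0 - the two 5s are adjacent
```

A shuffling that maximizes this score would be one way to formalize the goal.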

Thanks in advance, and best regards.

I came up with a little method of my own using scipy.ndimage.distance_transform_edt and random sampling. It is far from optimal, but depending on your goal it does a good job:

import numpy as np
from scipy.ndimage import distance_transform_edt

def maximize_arr_replicate_distances(input_arr: np.ndarray) -> np.ndarray:
    # Validate the input array
    assert isinstance(input_arr, np.ndarray), "'input_arr' is not a NumPy array."
    assert input_arr.ndim == 2, "'input_arr' is not an 'm x n' matrix."
    assert np.issubdtype(input_arr.dtype, np.integer), "'input_arr' is not of dtype class 'integer'."
    # Bin: bin all unique elements into subarrays
    unique_elements = np.unique(input_arr)
    binned_uniques = [input_arr[input_arr == i].tolist() for i in unique_elements]
    # Determine a sampling order: randomly sample from the subarrays
    # without replacement and append to a new list
    sampling_order = []
    while any(binned_uniques):
        random_index = np.random.randint(unique_elements.size)
        if binned_uniques[random_index]:
            sampling_order.append(binned_uniques[random_index].pop())
    # Distribute elements: append elements to an empty array based on the sampling
    # order and the maximum possible distance between replicates
    output_arr = np.full(input_arr.shape, np.nan)
    for i in sampling_order:
        if np.any(output_arr == i):
            distance_transform = distance_transform_edt(output_arr != i)
            mask = np.isin(output_arr, unique_elements[unique_elements != i])
            distance_transform_masked = np.ma.masked_array(distance_transform, mask=mask)
            distance_transform_max = np.argwhere(distance_transform_masked == np.amax(distance_transform_masked))
            rand_max_y, rand_max_x = distance_transform_max[np.random.randint(len(distance_transform_max))]
            output_arr[rand_max_y, rand_max_x] = i
        else:
            nan_indices = np.argwhere(np.isnan(output_arr))
            rand_nan_y, rand_nan_x = nan_indices[np.random.randint(len(nan_indices))]
            output_arr[rand_nan_y, rand_nan_x] = i
    return output_arr.astype(int)
input_arr = np.array([[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
print(maximize_arr_replicate_distances(input_arr))
# [[7 0 0 0 1 0 0 0 1 0 0 7]
#  [0 0 0 0 0 0 0 0 0 0 0 1]
#  [1 0 0 0 0 0 7 0 0 0 0 0]
#  [0 0 0 0 0 0 0 1 0 0 0 0]
#  [0 0 0 7 0 0 0 0 0 0 1 0]
#  [7 0 0 0 0 0 0 0 7 0 0 0]
#  [0 0 0 0 0 0 0 0 0 0 0 0]
#  [0 0 1 0 7 0 0 0 0 1 0 7]]

It works for a wide range of inputs.
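The key primitive here is scipy.ndimage.distance_transform_edt, which computes, for every nonzero element, the Euclidean distance to the nearest zero element. A minimal illustration on a small grid (my own example, not from the answer above):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# A 4x4 boolean grid with a single False (background) cell at (0, 0).
grid = np.ones((4, 4), dtype=bool)
grid[0, 0] = False

# Each cell gets its Euclidean distance to the nearest False cell.
dist = distance_transform_edt(grid)
print(dist)  # dist[0, 0] == 0.0; the farthest cell is (3, 3) at sqrt(18)
```

In the algorithm, `output_arr != i` marks the already-placed replicates of `i` as background, so the maximum of the transform is exactly the free position farthest from all of them.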

Note that the algorithm is biased toward the borders of the array. For example, over 10,000 repetitions the 1s have the following frequency distribution [%]:

from typing import Callable

def evaluate(input_arr: np.ndarray, function: Callable[[np.ndarray], np.ndarray],
             sample_index: int, iterations: int = 10_000) -> np.ndarray:
    # Count, per cell, how often 'sample_index' lands there across many runs
    sum_arr = np.zeros(input_arr.shape)
    for _ in range(iterations):
        sum_arr += np.where(function(input_arr) == sample_index, 1, 0)
    hit_frequencies = sum_arr / sum_arr.sum() * 100  # [%]
    return np.around(hit_frequencies, 2)

maximize_replicate_distances = evaluate(input_arr, maximize_arr_replicate_distances, 1)
print(maximize_replicate_distances)
# Frequency distribution of sample replicate '1' after distance-maximizing shuffling (n = 10,000) [%]:
# [[4.06 2.1  0.93 1.25 1.34 2.02 2.1  1.38 1.21 0.93 2.15 4.04]
#  [2.48 0.26 0.15 0.3  0.41 0.85 0.82 0.42 0.28 0.13 0.27 2.42]
#  [1.07 0.14 0.13 0.46 0.5  0.58 0.63 0.55 0.44 0.12 0.14 1.08]
#  [1.55 0.4  0.9  1.49 0.82 0.7  0.77 0.8  1.48 0.84 0.44 1.61]
#  [1.67 0.44 0.92 1.48 0.81 0.74 0.76 0.81 1.42 0.88 0.43 1.72]
#  [1.04 0.14 0.12 0.4  0.55 0.59 0.6  0.52 0.47 0.14 0.16 1.09]
#  [2.44 0.24 0.15 0.26 0.42 0.88 0.81 0.41 0.3  0.15 0.26 2.33]
#  [3.99 2.11 0.97 1.33 1.39 1.98 2.01 1.42 1.3  0.97 2.12 3.96]]
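The bias arises because border and corner cells have fewer neighbors, so they tend to maximize the distance transform. If that matters for your application, one possible mitigation (a sketch I have not validated inside the full pipeline) is to make the transform periodic by tiling the mask 3x3, running the EDT once, and keeping the central tile, which treats the array as a torus with no borders:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def periodic_distance_transform(mask: np.ndarray) -> np.ndarray:
    """EDT that treats the 2D mask as a torus (hypothetical tweak)."""
    m, n = mask.shape
    tiled = np.tile(mask, (3, 3))          # surround the mask with copies of itself
    dist = distance_transform_edt(tiled)   # one EDT over the tiled mask
    return dist[m:2 * m, n:2 * n]          # keep the central tile

mask = np.ones((4, 4), dtype=bool)
mask[0, 0] = False
print(periodic_distance_transform(mask))  # max sqrt(8) at (2, 2), the torus antipode
```

Dropping this in for `distance_transform_edt` would remove the border advantage, at the cost of measuring wrap-around distances that may not match your notion of "far apart".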
