我正试图编写一个函数来搅乱一个数组,该数组包含重复元素,但确保重复元素彼此之间不会太近。
这个代码有效,但在我看来效率低下:
function shuffledArr = distShuffle(myArr, myDist)
% this function takes an array myArr and shuffles it, while ensuring that repeating
% elements are at least myDist elements away from on another
% flag to indicate whether there are repetitions within myDist
reps = 1;
while reps
% set to 0 to break while-loop, will be set to 1 if it doesn't meet condition
reps = 0;
% randomly shuffle array
shuffledArr = Shuffle(myArr);
% loop through each unique value, find its position, and calculate the distance to the next occurence
for x = 1:length(unique(myArr))
% check if there are any repetitions that are separated by myDist or less
if any(diff(find(shuffledArr == x)) <= myDist)
reps = 1;
break;
end
end
end
这对我来说似乎不太理想,原因有三:
1( 在找到解决方案之前,可能没有必要重复洗牌。
2( 如果没有可能的解决方案(即,将myDist设置得太高,无法找到合适的配置(,则while循环将永远持续下去。关于如何提前捕捉到这个,有什么想法吗?
3( 必须有一种比我通过循环每个唯一值来确定数组中重复元素之间的距离更简单的方法。
我将感谢对第2点和第3点的回答,即使第1点是正确的,并且可以在一次洗牌中做到这一点。
我认为检查以下条件就足以防止无限循环:
[~,num, C] = mode(myArr);
N = numel(C);
assert( (myDist<=N) || (myDist-N+1) * (num-1) +N*num <= numel(myArr),...
'Shuffling impossible!');
假设myDist
是2
,我们有以下数据:
[4 6 5 1 6 7 4 6]
我们可以找到模式6
和它的出现3
。我们排列6
,用2 = myDist
空白将它们隔开:
6 _ _ 6 _ _6
必须有(3-1) * myDist = 4
数字才能填空。现在我们又有五个数字,这样数组就可以被打乱了。
如果我们有多种模式,问题就会变得更加复杂。例如,对于这个阵列[4 6 5 1 6 7 4 6 4]
,我们有N=2
模式:6
和4
。它们可以排列为:
6 4 _ 6 4 _ 6 4
我们有两个空格和另外三个数字[ 5 1 7]
,可以用来填空。例如,如果我们只有一个数字[ 5]
,就不可能填补空白,也不能打乱数组。
对于第三点,你可以使用稀疏矩阵来加速计算(我在Octave中的初步测试表明它更有效(:
function shuffledArr = distShuffleSparse(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
shuffledBin = sparse ( 1:numel(idx), S, true, numel(idx) + myDist, numel(U) );
reps = any (diff(find(shuffledBin)) <= myDist);
end
shuffledArr = U(S);
end
或者,您可以使用sub2in并排序,而不是稀疏矩阵:
function shuffledArr = distShuffleSparse(myArr, myDist)
[U,~,idx] = unique(myArr);
reps = true;
while reps
S = Shuffle(idx);
f = sub2ind ( [numel(idx) + myDist, numel(U)] , 1:numel(idx), S );
reps = any (diff(sort(f)) <= myDist);
end
shuffledArr = U(S);
end
如果你只想找到一个可能的解决方案,你可以使用类似的东西:
x = [1 1 1 2 2 2 3 3 3 3 3 4 5 5 6 7 8 9];
n = numel(x);
dist = 3; %minimal distance
uni = unique(x); %get the unique value
his = histc(x,uni); %count the occurence of each element
s = [sortrows([uni;his].',2,'descend'), zeros(length(uni),1)];
xr = []; %the vector that will contains the solution
%the for loop that will maximize the distance of each element
for ii = 1:n
s(s(:,3)<0,3) = s(s(:,3)<0,3)+1;
s(1,3) = s(1,3)-dist;
s(1,2) = s(1,2)-1;
xr = [xr s(1,1)];
s = sortrows(s,[3,2],{'descend','descend'})
end
if any(s(:,2)~=0)
fprintf('failed, dist is too big')
end
结果:
xr = [3 1 2 5 3 1 2 4 3 6 7 8 3 9 5 1 2 3]
解释:
我创建了一个向量s
,在开始时s
等于:
s =
3 5 0
1 3 0
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
%col1 = unique element; col2 = occurence of each element, col3 = penalities
在for循环的每次迭代中,我们都会选择出现次数最多的元素,因为这个元素将更难放置在我们的数组中。
然后在第一次迭代后s等于:
s =
1 3 0 %1 is the next element that will be placed in our array.
2 3 0
5 2 0
4 1 0
6 1 0
7 1 0
8 1 0
9 1 0
3 4 -3 %3 has now 5-1 = 4 occurence and a penalities of -3 so it won't show up the next 3 iterations.
最后,第二列的每个数字都应该等于0,如果不是的话,最小距离太大了。