K-means for colour quantization: code is not vectorized



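I am working through this exercise by Andrew Ng on using k-means to reduce the number of colours in an image. It works correctly, but I'm afraid it is a bit slow because of all the for loops in the code, so I would like to vectorize them. There are some loops, however, that I can't seem to vectorize effectively. Please help me out, thank you very much!

If possible, please also give me some feedback on my coding style :)

Here is the link to the exercise, and here is the dataset. The correct result is given at the exercise link.

Here is my code: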

function [] = KMeans()
    Image = double(imread('bird_small.tiff'));
    [rows, cols, RGB] = size(Image);
    Points = reshape(Image, rows * cols, RGB);
    K = 16;
    Centroids = zeros(K, RGB);
    s = RandStream('mt19937ar', 'Seed', 0);
    % Initialization :
    % Pick out K random colours and make sure they are all different
    % from each other! This prevents the situation where two of the means
    % are assigned to the exact same colour, therefore we don't have to
    % worry about division by zero in the M-step
    % However, if K = 16 for example, and there are only 15 colours in the
    % image, then this while loop will never exit!!! This needs to be
    % addressed in the future :(
    % TODO : Vectorize this part!
    done = false;
    while ~done
        RowIndex = randperm(s, rows);
        ColIndex = randperm(s, cols);
        RowIndex = RowIndex(1:K);
        ColIndex = ColIndex(1:K);
        for i = 1 : K
            for j = 1 : RGB
                Centroids(i,j) = Image(RowIndex(i), ColIndex(i), j);
            end
        end
        % unique(...,'rows') is enough to drop duplicate colours. The rows
        % are not sorted along dimension 2 first, because that would
        % permute the R, G and B values within each centroid.
        Centroids = unique(Centroids, 'rows');
        done = (size(Centroids, 1) == K);
    end
    %     imshow(imread('bird_small.tiff'))
    %
    %     for i = 1 : K
    %         hold on;
    %         % plot takes x (column) first, then y (row)
    %         plot(ColIndex(i),RowIndex(i),'r+','MarkerSize',50)
    %     end

    Tol = 0.01; % Convergence tolerance (named Tol so the built-in eps is not shadowed)
    IterNum = 0;
    while true
        % E-step: Estimate membership given parameters
        % Membership: The centroid that each colour is assigned to
        % Parameters: Location of centroids
        Dist = pdist2(Points, Centroids, 'euclidean');
        [~, WhichCentroid] = min(Dist, [], 2);
        % M-step: Estimate parameters given membership
        % Membership: The centroid that each colour is assigned to
        % Parameters: Location of centroids
        % TODO: Vectorize this part!
        OldCentroids = Centroids;
        for i = 1 : K
            % Logical indexing selects the points assigned to centroid i
            PointsInCentroid = Points(WhichCentroid == i, :);
            NumOfPoints = size(PointsInCentroid, 1);
            % Note that NumOfPoints is never equal to 0, as a result of
            % the initialization. Or .... ???????
            if NumOfPoints ~= 0
                Centroids(i,:) = sum(PointsInCentroid, 1) / NumOfPoints;
            end
        end
        % Check for convergence: Here we use the L2 distance
        IterNum = IterNum + 1;
        Margins = sqrt(sum((Centroids - OldCentroids).^2, 2));
        if all(Margins <= Tol)
            break;
        end
    end
    fprintf('Converged after %d iterations.\n', IterNum);

    % Load the larger image
    [LargerImage, ColorMap] = imread('bird_large.tiff');
    LargerImage = double(LargerImage);
    [largeRows, largeCols, NewRGB] = size(LargerImage);  % NewRGB is always 3
    % Replace each pixel with the nearest centroid
    NewPoints = reshape(LargerImage, largeRows * largeCols, NewRGB);
    Dist = pdist2(NewPoints, Centroids, 'euclidean');
    [~, WhichCentroid] = min(Dist, [], 2);
    NewPoints = Centroids(WhichCentroid, :);
    LargerImage = reshape(NewPoints, largeRows, largeCols, NewRGB);
    % Earlier, unvectorized version of the replacement step:
    %     for i = 1 : largeRows
    %         for j = 1 : largeCols
    %             Dist = pdist2(Centroids,reshape(LargerImage(i,j,:),1,RGB),'euclidean');
    %             [~,WhichCentroid] = min(Dist);
    %             LargerImage(i,j,:) = Centroids(WhichCentroid,:);
    %         end
    %     end
    % Display new image
    imshow(uint8(round(LargerImage)), ColorMap)
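
The initialization comments in the code worry that the retry loop never exits when the image has fewer than K distinct colours. One possible way around that, shown here only as a sketch, is to collect the distinct colours first and draw K of them. `pickDistinctColours` is a hypothetical helper name that is not part of the exercise. Note also that this samples uniformly over distinct colours rather than over pixels, so the starting centroids will generally differ from the ones the original loop picks.

```matlab
% Sketch: choose K distinct starting colours without a retry loop.
% pickDistinctColours is a hypothetical helper, not part of the exercise.
function Centroids = pickDistinctColours(Points, K, s)
    % Points : N-by-3 matrix of pixel colours, one row per pixel
    % K      : number of centroids wanted
    % s      : RandStream, so that the draw is reproducible
    UniqueColours = unique(Points, 'rows');   % each distinct colour once
    NumColours = size(UniqueColours, 1);
    if NumColours < K
        % Fail loudly instead of looping forever
        error('KMeans:tooFewColours', ...
              'Image has only %d distinct colours, fewer than K = %d.', ...
              NumColours, K);
    end
    Pick = randperm(s, NumColours, K);        % K distinct row indices
    Centroids = UniqueColours(Pick, :);
end
```

Inside `KMeans` this would be called as `Centroids = pickDistinctColours(Points, K, s);` in place of the `while` loop. It also removes the nested `for` loops over `i` and `j`.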

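Update: I replaced

for i = 1 : K
    for j = 1 : RGB
        Centroids(i,j) = Image(RowIndex(i),ColIndex(i),j);
    end
end

with

for i = 1 : K
    Centroids(i,:) = Image(RowIndex(i),ColIndex(i),:);
end

I think this could be vectorized further by using linear indexing, but for now I should focus on the while loop, since it takes most of the time. When I tried @Dev iL's suggestion and replaced

for i = 1 : K
    PointsInCentroid = Points((find(WhichCentroid == i))',:);
    NumOfPoints = size(PointsInCentroid,1);
    % Note that NumOfPoints is never equal to 0, as a result of
    % the initialization. Or .... ???????
    if NumOfPoints ~= 0
        Centroids(i,:) = sum(PointsInCentroid , 1) / NumOfPoints ;
    end
end

with

Num = size(Points, 1);  % number of pixels
% E(n,k) = 1 if pixel n is assigned to centroid k.
% Unlike the loop above, this does not keep the old centroid for an empty cluster.
E = sparse(1:Num, WhichCentroid', 1, Num, K, Num);
Centroids = (E * spdiags(1 ./ sum(E,1)', 0, K, K))' * Points;

the result was always worse: with K = 16, the first (loop) version takes 2414 s and the second (sparse) version takes 2455 s; with K = 32, the first takes 4529 s and the second takes 5022 s. Vectorization doesn't seem to help in this case, but maybe there is something wrong with my code :(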
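
For comparison, the same M-step can also be written with `accumarray`, which sums the points of each cluster without building a sparse matrix. This is only a sketch under the assumption that `WhichCentroid` is an N-by-1 column of indices in 1..K, as returned by `min(Dist,[],2)`. `updateCentroids` is a hypothetical helper name, and no claim is made here that it is faster than the loop; that would have to be timed.

```matlab
% Sketch: M-step with accumarray instead of a loop over the K clusters.
% updateCentroids is a hypothetical helper, not part of the exercise.
function Centroids = updateCentroids(Points, WhichCentroid, OldCentroids)
    K = size(OldCentroids, 1);
    NumChannels = size(Points, 2);
    % Number of points assigned to each centroid (K-by-1)
    Counts = accumarray(WhichCentroid, 1, [K 1]);
    % Per-cluster sum of each colour channel; the loop runs 3 times, not K times
    Sums = zeros(K, NumChannels);
    for c = 1 : NumChannels
        Sums(:, c) = accumarray(WhichCentroid, Points(:, c), [K 1]);
    end
    % Empty clusters keep their old position, as in the original loop
    Centroids = OldCentroids;
    NonEmpty = Counts > 0;
    Centroids(NonEmpty, :) = bsxfun(@rdivide, Sums(NonEmpty, :), Counts(NonEmpty));
end
```

In the while loop this would be used as `Centroids = updateCentroids(Points, WhichCentroid, OldCentroids);` right after `OldCentroids = Centroids;`.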

However, when I replace

Dist = pdist2(Points,Centroids,'euclidean');
[~, WhichCentroid] = min(Dist,[],2);

(inside the while loop) with

Dist = bsxfun(@minus,dot(Centroids',Centroids',1)' / 2 , Centroids * Points'  );
[~, WhichCentroid] = min(Dist,[],1);
WhichCentroid = WhichCentroid';

the code runs faster, especially when K is large (K = 32).
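
The reason the replacement yields the same assignments can be seen by expanding the squared distance. Writing $p_n$ for the $n$-th row of `Points` and $c_k$ for the $k$-th row of `Centroids` (notation introduced here for illustration), the term $\lVert p_n \rVert^2$ does not depend on $k$, so it can be dropped from the minimization:

```latex
\begin{aligned}
\lVert p_n - c_k \rVert^2
  &= \lVert p_n \rVert^2 - 2\, c_k^{\top} p_n + \lVert c_k \rVert^2, \\
\arg\min_k \lVert p_n - c_k \rVert^2
  &= \arg\min_k \left( \tfrac{1}{2} \lVert c_k \rVert^2 - c_k^{\top} p_n \right).
\end{aligned}
```

In the code, `dot(Centroids',Centroids',1)' / 2` is the K-by-1 vector of $\tfrac{1}{2}\lVert c_k \rVert^2$ and `Centroids * Points'` is the K-by-N matrix of $c_k^{\top} p_n$, which is why the minimum is taken along dimension 1. No square root and no $\lVert p_n \rVert^2$ term are computed.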

Thanks everyone!
