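I have two cell arrays of strings, one with 40,000 rows and one with 400 rows. I need to find the rows of the first array that match a row of the second. Note that there may be many duplicates.
The 40,000-row array looks like this: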
Anna Frank
Anna George
Jane Peter
Anna George
Jane Peter
etc.
Here I need to find the rows that match:
Anna George
Jane Peter
So far, the only way I have found is two nested for loops with an if in between, but it is quite slow:
Total_R = [];  %// row indices of bigTable that match a row of smallTable
for i = 2:size(bigTable,1)
    for j = 1:size(smallTable,1)
        if sum(ismember(bigTable(i,1:2),smallTable(j,1:2))) == 2
            Total_R(size(Total_R,1)+1,1) = i;
        end
    end
end
I assume your inputs are set up like this:
bigTable =
'Anna' 'Frank'
'Anna' 'George'
'Jane' 'Peter'
'Anna' 'George'
'Jane' 'Peter'
smallTable =
'Anna' 'George'
'Jane' 'Peter'
Two approaches can be suggested to solve your problem.
Approach #1
An ismember-based approach:
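%// The 'rows' flag is omitted because ismember does not support it for cell arrays of strings.
%// Each cell is tested on its own against all of smallTable.
Total_R = find(sum(ismember(bigTable,smallTable),2)==2)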
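One caveat with the test above: it checks each cell separately against all of smallTable, so a row that mixes names from two different smallTable rows is also reported. The sketch below illustrates this with a hypothetical extra row 'Anna' 'Peter' (not part of the question's data) and shows a different technique, joining both columns into one key per row, that avoids the over-match. It assumes the '|' separator never occurs in the names.

```matlab
%// Sample data; the last row of bigTable is a hypothetical mixed row
bigTable   = {'Anna','Frank'; 'Anna','George'; 'Jane','Peter'; 'Anna','Peter'};
smallTable = {'Anna','George'; 'Jane','Peter'};

%// Per-cell membership: row 4 is flagged although it is not a row of smallTable
cellwise = find(sum(ismember(bigTable, smallTable), 2) == 2);

%// Row-wise keys: join both columns with a separator assumed absent from the data
bigKeys   = strcat(bigTable(:,1),   '|', bigTable(:,2));
smallKeys = strcat(smallTable(:,1), '|', smallTable(:,2));
rowwise   = find(ismember(bigKeys, smallKeys));
```

Here cellwise contains rows 2, 3 and 4, while rowwise contains only rows 2 and 3. If such mixed rows cannot occur in your data, the two tests agree.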
Approach #2
%// Assign unique labels to each cell for both small and big cell arrays, so that
%// later on you would be dealing with numeric arrays only and
%// do not have to mess with cell arrays that were slowing you down
[unqbig,matches1,idx] = unique([bigTable(:) ; smallTable(:)])
big_labels = reshape(idx(1:numel(bigTable)),size(bigTable))
small_labels = reshape(idx(numel(bigTable)+1:end),size(smallTable))
%// Detect which rows from small_labels exactly match with those from big_labels
Total_R = find(ismember(big_labels,small_labels,'rows'))
Alternatively, replace the ismember call in the last line of Approach #2 with a bsxfun-based implementation:
Total_R = find(any(all(bsxfun(@eq,big_labels,permute(small_labels,[3 2 1])),2),3))
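To make the labelling step concrete, here is a minimal self-contained sketch that runs both variants of Approach #2 on the assumed sample inputs and checks that they agree. The intermediate names eqs, R1 and R2 are introduced only for illustration.

```matlab
bigTable   = {'Anna','Frank'; 'Anna','George'; 'Jane','Peter'; 'Anna','George'; 'Jane','Peter'};
smallTable = {'Anna','George'; 'Jane','Peter'};

%// Map every distinct string to an integer label (labels follow sorted order)
[~, ~, idx]  = unique([bigTable(:); smallTable(:)]);
big_labels   = reshape(idx(1:numel(bigTable)), size(bigTable));        %// [1 2; 1 3; 4 5; 1 3; 4 5]
small_labels = reshape(idx(numel(bigTable)+1:end), size(smallTable));  %// [1 3; 4 5]

%// Variant 1: row matching on the numeric labels with ismember
R1 = find(ismember(big_labels, small_labels, 'rows'));

%// Variant 2: compare N-by-2 against 1-by-2-by-M, giving an N-by-2-by-M logical array
eqs = bsxfun(@eq, big_labels, permute(small_labels, [3 2 1]));
R2  = find(any(all(eqs, 2), 3));   %// both columns equal for at least one small row

assert(isequal(R1, R2));
```

Note that the bsxfun variant builds the full N-by-2-by-M comparison array. For the sizes in the question that is 40,000 x 2 x 400 logical elements, roughly 32 MB, so memory use grows with the product of the two row counts.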
Output of these approaches for the assumed inputs:
Total_R =
2
3
4
5