Calculating the Most Similar Pair of Vectors using Cosine distance in a matrix

Hello,
I have a 943x1682 matrix, and I want to find the two most similar vectors in it. To do that, I want to compute the cosine distance from each vector in the matrix to every other vector in the matrix, excluding each vector compared with itself. If that cannot be excluded, I can just ignore those entries.
for i=1:n
for j=1:n
cosSimalll(i,j)=dot(A(:,i),A(:,j))/(norm(A(:,i)*norm(A(:,j))));
end
end
I wrote this loop to try to calculate this, so that I get a 1682x1682 matrix in which each cell corresponds to the similarity between columns i and j. However, it takes forever to run, and when I try to open the resulting matrix in my workspace, it says "Cannot display summaries of variables with more than 524288 elements."
Is there an easier way to do this, or am I doing something wrong? Please get back to me asap. Thank you!

Answers (1)

James Tursa on 14 Feb 2021
Edited: James Tursa on 14 Feb 2021
Use a standard matrix multiply to get the dot products. MATLAB is very fast at standard matrix multiplies. And then normalize the result. E.g.,
AA = A' * A; % the column dot products via a standard matrix multiply
Anorm = sqrt(diag(AA)); % the norms of the columns
Adist = AA ./ (Anorm .* Anorm.'); % normalize the column dot products into cosine similarities (cosine distance would be 1 minus this)
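As a quick sanity check, the matrix-multiply result can be compared against the explicit double loop on a small matrix. This is only a minimal sketch: `B` is a hypothetical random 5x4 stand-in for A (so the loop finishes instantly), and the names `S_loop`, `S_fast`, and `maxDiff` are illustrative. It assumes a MATLAB release with implicit expansion (R2016b or later), which the `Anorm .* Anorm.'` expression above relies on as well.

```matlab
% Sanity check on a small random matrix (B is a hypothetical stand-in for A)
rng(0);                          % fixed seed so the check is repeatable
B = rand(5,4);                   % 5 rows, 4 column vectors
m = size(B,2);                   % number of columns

% Explicit double loop, preallocated so the array does not grow inside the loop
S_loop = zeros(m);
for i = 1:m
    for j = 1:m
        S_loop(i,j) = dot(B(:,i),B(:,j)) / (norm(B(:,i)) * norm(B(:,j)));
    end
end

% Matrix-multiply version, following the same three steps as above
BB = B' * B;                     % column dot products
Bnorm = sqrt(diag(BB));          % column norms
S_fast = BB ./ (Bnorm .* Bnorm.');

% Largest elementwise difference; expected to be at round-off level
maxDiff = max(abs(S_loop(:) - S_fast(:)));
```

The preallocation with `zeros(m)` is worth copying into the original loop too, since growing the result one element at a time is a likely contributor to the slow run time.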
Then pick off the maximum value for your answer, disregarding the diagonal. E.g.,
n = size(A,2); % the number of columns
Adist(1:n+1:end) = -inf; % disregard the diagonal (column compared to itself)
[~,x] = max(Adist(:)); % find the linear index of the largest cosine similarity
[col1,col2] = ind2sub(size(Adist),x); % convert linear index into the original columns
Then col1 and col2 are the column numbers of the most similar columns, using cosine similarity as the measure.
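To see the whole procedure on data small enough to check by hand, here is a minimal sketch using a hypothetical 3x4 matrix `T` (not the poster's data). Columns 1 and 3 point in nearly the same direction, so they should come out as the most similar pair. The names `T`, `S`, `bestSim`, and `pair` are illustrative only.

```matlab
% Tiny worked example (T is a hypothetical 3x4 matrix, one vector per column)
T = [1 0 2   0;
     0 1 0.1 1;
     0 0 0   1];

TT = T' * T;                          % all column dot products
Tnorm = sqrt(diag(TT));               % column norms
S = TT ./ (Tnorm .* Tnorm.');         % cosine similarity between columns

k = size(T,2);                        % number of columns
S(1:k+1:end) = -inf;                  % ignore each column compared with itself
[bestSim, idx] = max(S(:));           % largest off-diagonal similarity
[c1, c2] = ind2sub(size(S), idx);     % columns of the most similar pair
pair = sort([c1 c2]);                 % order-independent, since S is symmetric
```

Because the similarity matrix is symmetric, the same pair appears twice (as (i,j) and (j,i)), and which ordering `max` reports is not meaningful. Sorting the two indices avoids depending on it. Note also that an all-zero column has norm 0 and produces NaN entries. `max` skips NaN by default, so such columns are never selected. For the full 1682x1682 result, the message about summaries seems to be only a display limit, and individual values can still be read at the command line, e.g. `Adist(col1,col2)`.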