Alternative Matrix Multiplication which does euclidean distance instead of dot product
6 views (last 30 days)
Of course, a great plus of matlab is to be able to do a matrix multiplication without writing 3 loops, and with lots of syntax sugar (for example, imagine having to write C = A.mult(B) each time). Moreover, it will be fast because the interpreter just reads 3 symbols (namely: A*B) and turn to assembly or whatever low level code matlab uses there.
However, what I often have is that I have a set of data points of multiple dimensions, for example 3, and another set of data points, again with for example 3 dimensions. Then I want to compare each data point of one set to another using a distance measure. If my distance measure is a dot product, then of course a matrix product would suffice:
A : m x 3 matrix
B : 3 x n matrix
C=A*B, then C(i,j) is the distance between data point i in set A and data point j in set B.
But what if my distance measure is euclidean? Then I would like to be able to do the same. However, whenever the * operation does a multiplication, it should subtract and square, and whenever the * operation does a sum, I also want it to do a sum. C=A*B would create a simular matrix as above, yet using euclidean distance.
Of course, I can simulate this writing a function, maybe one forloop and some matrix magic. However, I want (1) syntax sugar and (2) fast c/assembly implementation. Moreover, I think such a use case is actually quite realistic. I often want to find for each x out of M data points to which of some other reference database of N data points, x lies closest (using euclidean distance). For example in k-means when you try to decide to wich centroid each data point belongs.
So, does this exist? Would this be a nice feature? Or am I the only one who sees this as a 'nice to have'? Should I write a c-file for this and mex it?
Matt J on 7 Jun 2013
Edited: Matt J on 7 Jun 2013
You can get something like what you're after using this
and using my INTERDISTS utiltity below. For example
>> A=rand(2,3), B=rand(3,2),
0.9448 0.4893 0.9001
0.4909 0.3377 0.3692
>> A=DataObj(A,'Ops',Ops); B=DataObj(B,'Ops',Ops);
>> Euclidean = A*B
My overall experience though is that syntactic sugar is rarely ever worth it. Better just to use the interdists function directly.
%Finds the graph of distances between point coordinates
% (1) Graph=interdists(A,B)
% A: matrix whose columns are coordinates of points, for example
% [[x1;y1;z1], [x2;y2;z2] ,..., [xM;yM;zM]]
% but the columns may be points in a space of any dimension, not just 3D.
% B: A second matrix whose columns are coordinates of points in the same
% Euclidean space. Default B=A.
% Graph: The MxN matrix of separation distances in l2 norm between the coordinates.
% Namely, Graph(i,j) will be the distance between A(:,i) and B(:,j).
% (2) interdists(A,'noself') is the same as interdists(A), except the output
% diagonals will be NaN instead of zero. Hence, for example, operations
% like min(interdists(A,'noself')) will ignore self-distances.
% See also getgraph
Graph=l2norm(bsxfun(@minus, A, B),1);