10 views (last 30 days)

Show older comments

Of course, a great plus of matlab is to be able to do a matrix multiplication without writing 3 loops, and with lots of syntax sugar (for example, imagine having to write C = A.mult(B) each time). Moreover, it will be fast because the interpreter just reads 3 symbols (namely: A*B) and turn to assembly or whatever low level code matlab uses there.

However, what I often have is that I have a set of data points of multiple dimensions, for example 3, and another set of data points, again with for example 3 dimensions. Then I want to compare each data point of one set to another using a distance measure. If my distance measure is a dot product, then of course a matrix product would suffice:

A : m x 3 matrix

B : 3 x n matrix

C=A*B, then C(i,j) is the distance between data point i in set A and data point j in set B.

But what if my distance measure is euclidean? Then I would like to be able to do the same. However, whenever the * operation does a multiplication, it should subtract and square, and whenever the * operation does a sum, I also want it to do a sum. C=A*B would create a simular matrix as above, yet using euclidean distance.

Of course, I can simulate this writing a function, maybe one forloop and some matrix magic. However, I want (1) syntax sugar and (2) fast c/assembly implementation. Moreover, I think such a use case is actually quite realistic. I often want to find for each x out of M data points to which of some other reference database of N data points, x lies closest (using euclidean distance). For example in k-means when you try to decide to wich centroid each data point belongs.

So, does this exist? Would this be a nice feature? Or am I the only one who sees this as a 'nice to have'? Should I write a c-file for this and mex it?

Matt J
on 7 Jun 2013

Edited: Matt J
on 7 Jun 2013

You can get something like what you're after using this

and using my INTERDISTS utiltity below. For example

>> A=rand(2,3), B=rand(3,2),

A =

0.9448 0.4893 0.9001

0.4909 0.3377 0.3692

B =

0.1112 0.2417

0.7803 0.4039

0.3897 0.0965

>> Ops.mtimes=@(a,b)interdists(a.',b);

>> A=DataObj(A,'Ops',Ops); B=DataObj(B,'Ops',Ops);

>> Euclidean = A*B

Euclidean =

1.0198 1.0712

0.5834 0.3753

My overall experience though is that syntactic sugar is rarely ever worth it. Better just to use the interdists function directly.

function Graph=interdists(A,B)

%Finds the graph of distances between point coordinates

%

% (1) Graph=interdists(A,B)

%

% in:

%

% A: matrix whose columns are coordinates of points, for example

% [[x1;y1;z1], [x2;y2;z2] ,..., [xM;yM;zM]]

% but the columns may be points in a space of any dimension, not just 3D.

%

% B: A second matrix whose columns are coordinates of points in the same

% Euclidean space. Default B=A.

%

%

% out:

%

% Graph: The MxN matrix of separation distances in l2 norm between the coordinates.

% Namely, Graph(i,j) will be the distance between A(:,i) and B(:,j).

%

%

% (2) interdists(A,'noself') is the same as interdists(A), except the output

% diagonals will be NaN instead of zero. Hence, for example, operations

% like min(interdists(A,'noself')) will ignore self-distances.

%

% See also getgraph

noself=false;

if nargin<2

B=A;

elseif ischar(B)&&strcmpi(B,'noself')

noself=true;

B=A;

end

N=size(A,1);

B=reshape(B,N,1,[]);

Graph=l2norm(bsxfun(@minus, A, B),1);

Graph=squeeze(Graph);

if noself

n=length(Graph);

Graph(linspace(1,n^2,n))=nan;

end

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!