# Will taking the mean of pdist on a matrix give me the average of all distances between the row vectors, or just between individual elements of the vectors?

Alexander Eaton on 25 Mar 2018
Commented: Walter Roberson on 26 Mar 2018
I'm trying to compute the Euclidean distance between each pair of rows in a 208x4096 matrix of vector values. That is, for every row of values, I want to calculate the distance to every other row in the matrix (approximately 43,000 Euclidean calculations), and then get the mean of all those Euclidean distances. The code below seems to be doing this, but I'm unsure whether it is giving me exactly what I am looking for. With such a high number of calculations, I can't think of how to check it, apart from asking a knowledgeable MATLAB user whether the code looks right. Thank you in advance for any insight you can share.
for i=1:length(allData)
dist(i,:)=pdist(allData(:,i),'euclidean')';
end
dist
mean(pdist(allData))

Walter Roberson on 26 Mar 2018
There is no point in you calculating dist(i,:) and then ignoring it. The mean(pdist(allData)) should be sufficient.
Alexander Eaton on 26 Mar 2018
Sorry for my lack of knowledge here. The way I looked at it was that dist(i,:) was returning every row-versus-row Euclidean distance, and the mean(pdist(allData)) part returned the average of those Euclidean values. I am hoping that at least one of them uses all the values in each row, and is not returning the distance for just one element of each row, or the distances between every element across all the rows. This is where my confusion lies. It would be ideal if all I needed was mean(pdist(allData)). Thank you very much, Walter, for your help.
Walter Roberson on 26 Mar 2018
The first version will not work. You are extracting one column at a time and finding the distances between the entries in that column; for scalars, each distance is just abs() of the difference between the two values. Each column would be handled in isolation, which is not what you want: you want the Euclidean distance between pairs of rows.
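As a small illustration of the point above, here is a sketch with a made-up three-element column (the values are chosen only for the example and are not from allData). It assumes the Statistics and Machine Learning Toolbox is available for pdist.

```matlab
% Hypothetical single column, standing in for allData(:,i).
col = [1; 4; 9];

% Each "row" is a scalar, so every Euclidean distance reduces to an
% absolute difference: |4-1|, |9-1|, |9-4|.
dCol = pdist(col, 'euclidean');
disp(dCol)
```

Nothing in this result depends on any other column, which is why looping over columns cannot produce row-to-row distances.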
You cannot just change the index around to select one row at a time, either. pdist compares pairs of rows, and a single row has no other row to be paired with, so you would not get any useful distances out of it.
pdist calculates every row against every other row. It returns a vector, which is a compact representation of the lower triangle (distance is symmetrical so you do not need to calculate A to B and B to A both). You can mean() the results to get a single overall mean.
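One way to convince yourself of this is to compare pdist against a brute-force loop on a matrix small enough to inspect. The sketch below uses a random 5x3 matrix X as a stand-in for allData; the variable names are made up for the example, and pdist again assumes the Statistics and Machine Learning Toolbox.

```matlab
% Small stand-in for allData: 5 rows (observations) by 3 columns (features).
rng(0);                        % fixed seed so the check is repeatable
X = rand(5, 3);

d = pdist(X);                  % row vector with one entry per unique row pair
m = size(X, 1);

% Brute force: Euclidean distance between every pair of rows with i < j.
manual = zeros(1, m*(m-1)/2);
k = 0;
for i = 1:m-1
    for j = i+1:m
        k = k + 1;
        manual(k) = norm(X(i,:) - X(j,:));
    end
end

% Compare sorted values so the check does not rely on the pair ordering.
maxDiff = max(abs(sort(d) - sort(manual)));
overallMean = mean(d);         % single mean over all row-to-row distances
fprintf('pairs: %d, max difference: %g, mean: %g\n', numel(d), maxDiff, overallMean);
```

For a 208-row matrix, numel(pdist(allData)) should be 208*207/2 = 21528, since each pair of rows is counted once rather than in both directions.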
Or instead of taking a single overall mean you can use squareform() to convert the compact triangle of distances into a full symmetric square of distances. That would permit you to calculate the average distance from individual rows to the other rows. I showed the code in the earlier posting including the bias correction that should be used.
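The earlier posting is not reproduced in this thread, so the following is only a sketch of the squareform() approach under one assumption: that the bias correction refers to leaving each row's zero self-distance out of its average, i.e. dividing by m-1 rather than m. The matrix X is made up for the example.

```matlab
% Three hypothetical rows; the pairwise distances are 5, 10 and 5.
X = [0 0; 3 4; 6 8];

D = squareform(pdist(X));      % 3-by-3 symmetric matrix, zeros on the diagonal
m = size(D, 1);

% Average distance from each row to the OTHER rows. Dividing by m-1
% (not m) keeps the zero self-distance from dragging the average down.
rowMean = sum(D, 2) / (m - 1);
disp(rowMean)
```

Using mean(D, 2) directly would divide by m and so understate each row's average distance to the other rows.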
