# Similarity matrix is very large: how can I process it without segmenting it?

Hi,

I have a similarity matrix of size 17770*17770. When I try to process it, I run out of memory.

I originally built this similarity matrix by segmenting the original matrix into 7 parts, each of size 2500*17770, and then combining the parts to get the final matrix. The next step, however, cannot be done in parts, because I want to cluster using the whole similarity matrix.

Is there a way to process this similarity matrix?

Thanks in advance

##### 23 Comments

huda nawaf
on 27 Apr 2013

To Matt J,

The 17770*17770 matrix is a co-occurrence matrix. I built it in 7 segments.

I have 17770 files; each file is a movie that has been seen by a number of users. I wanted to compute the co-occurrence of all movies, so I computed this matrix in parts. Now that I have the full matrix, I would like to cluster the movies based on it.

Thanks

Cedric Wannaz
on 28 Apr 2013

Walter Roberson
on 28 Apr 2013

huda nawaf
on 28 Apr 2013

To Anand,

I tried to cluster the co-occurrence matrix using the Ward method (code below). I also tried another method, Newman's spectral clustering, and ran out of memory there too.

d=dlmread('d:\matlab\r2011a\bin\co_occurance_mov.txt');

p=pdist(d,'euclidean');

No_cluster=4;

L=linkage(p,'ward');

cluster1=struct('user',0,'length',0);

T{1} = cluster(L,'maxclust',No_cluster);

for k=1:No_cluster

z=find(T{1}==k);

for j=1:length(z)

cluster1(k).user(j)=z(j);

end

cluster1(k).length=length(z);

end

f0=fopen('d:\matlab\r2011a\bin\new_movielens\100k_mov\divid_seq\cluster2_ward.txt','w');

for i=1:No_cluster

for j=1:cluster1(i,1).length

fprintf(f0,'%d ',cluster1(i,1).user(j));

end

fprintf(f0,'\n');

end

fclose all;

thanks

huda nawaf
on 28 Apr 2013

To Cedric: when I used NNZ on the 17770*17770 matrix, I got

Undefined function or method 'NNZ' for input arguments of type 'double'.

Walter, the max count is 232940. This is a diagonal element, i.e. the co-occurrence of an item with itself.

Thanks

Cedric Wannaz
on 28 Apr 2013

huda nawaf
on 28 Apr 2013

Walter, I have to normalize the matrix before clustering, so the max count will be 1.

Cedric Wannaz
on 28 Apr 2013

huda nawaf
on 28 Apr 2013

My system is 64-bit with 6 GB of RAM.

In this case, must I use blocks, or do what Walter suggested? If so, please give me an idea of how to use blocks or how to work with a lower precision.

Thanks in advance

Walter Roberson
on 28 Apr 2013

When you are initializing the integer co-occurrence matrix, instead of initializing it as zeros(17770,17770), initialize it as zeros(17770,17770,'int32').

Then when you want to normalize it, use

co_occurance_mat = single(co_occurance_mat) ./ single(max(co_occurance_mat(:)));

That might still cause you to run out of memory because of the temporary space needed to do the conversion and division. If it does, then probably the formation of the distance matrix during clustering would also run out of memory.
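To put rough numbers on this, here is a sketch of the storage involved, followed by a block-wise version of the normalization that converts only a few columns at a time. The toy matrix `counts`, the block width `blockCols`, and the other variable names are assumptions for illustration and are not taken from the thread's data.

```matlab
% Storage estimate for the full-size problem (pure arithmetic, allocates nothing).
n = 17770;
gib = 2^30;
bytesDouble = n^2 * 8;            % dense double matrix
bytesSingle = n^2 * 4;            % dense single or int32 matrix
bytesPdist  = n*(n-1)/2 * 8;      % pdist output: one double per pair of rows
fprintf('double: %.2f GiB, single/int32: %.2f GiB, pdist vector: %.2f GiB\n', ...
        bytesDouble/gib, bytesSingle/gib, bytesPdist/gib);

% Block-wise normalization on a toy matrix (hypothetical sizes and names).
m = 10;
blockCols = 4;                    % assumed block width; tune to available memory
counts = int32(magic(m));         % stand-in for the int32 co-occurrence matrix
maxCount = single(max(counts(:)));
normed = zeros(m, m, 'single');   % preallocated single-precision result
for c0 = 1:blockCols:m
    cols = c0:min(c0 + blockCols - 1, m);
    % Only one block of columns is converted to single at a time.
    normed(:, cols) = single(counts(:, cols)) ./ maxCount;
end
```

At n = 17770 a dense double matrix comes to about 2.35 GiB and a single or int32 matrix to about 1.18 GiB, so on a 6 GB machine there is little room for additional full-size temporaries. The block-wise loop still holds both the integer matrix and the single result, but avoids a third full-size temporary during the conversion.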

huda nawaf
on 28 Apr 2013

Hi Cedric, why do I have to use nnz on the 2500*17770 parts? We need to know the number of nonzeros in the total matrix, right?

Anyway, I would like someone to tell me how to work with blocks of the matrix so that I can cluster the total matrix.

Walter Roberson
on 28 Apr 2013

Matt J
on 28 Apr 2013

> Anyway, I would like someone to tell me how to work with blocks of the matrix so that I can cluster the total matrix.

That question becomes unnecessary if it turns out that the majority of your matrix elements are zeros. In that case, you don't have to break the matrix into blocks. You would use the SPARSE command to make the entire matrix fit into memory. Since you seem unaware of SPARSE and what it does, the others want to make sure you consider it before proceeding.
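A minimal sketch of that check, on a small made-up matrix rather than the real data: count the nonzeros with `nnz` (lowercase, since MATLAB function names are case-sensitive) and convert with `sparse` only if the matrix is mostly zeros. The 10% threshold and the variable names are arbitrary assumptions.

```matlab
% Toy matrix with very few nonzero entries (hypothetical data).
A = diag(1:1000);                  % 1000 nonzeros on the diagonal
A(1, 2) = 3;
A(2, 1) = 3;                       % two symmetric off-diagonal entries

density = nnz(A) / numel(A);       % fraction of nonzero elements
fprintf('nonzeros: %d, density: %.4f\n', nnz(A), density);

if density < 0.1                   % assumed cutoff, not a MATLAB rule
    S = sparse(A);                 % stores only the nonzero entries
else
    S = A;                         % mostly dense: sparse storage would not help
end

infoA = whos('A');
infoS = whos('S');
fprintf('full: %d bytes, sparse: %d bytes\n', infoA.bytes, infoS.bytes);
```

If the density turns out to be high, sparse storage can take more memory than the full matrix, so the check is worth doing on one of the 2500*17770 parts before converting everything.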

Walter Roberson
on 28 Apr 2013

It appears to me that you could save memory during the clustering by not calling pdist yourself and instead using

L = linkage(d, 'ward', 'euclidean', 'savememory', 'on');
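As a small illustration of that call, here is a sketch on random toy data (it assumes the Statistics Toolbox is installed, and the data and variable names are made up). The linkage is built directly from the observations, so no pairwise distance vector is created first, and the tree is then cut with `cluster` as in the earlier code.

```matlab
% Toy observations: two well-separated groups of 30 points each (hypothetical data).
rng(0);                                  % fixed seed so the run is repeatable
X = [randn(30, 2); randn(30, 2) + 10];

% Pass the observations themselves, not pdist(X).
L = linkage(X, 'ward', 'euclidean', 'savememory', 'on');

% Cut the tree into a fixed number of clusters.
No_cluster = 2;
T = cluster(L, 'maxclust', No_cluster);

for k = 1:No_cluster
    fprintf('cluster %d has %d members\n', k, sum(T == k));
end
```

With `'savememory'` set to `'on'`, `linkage` needs the `'euclidean'` metric and one of the `'centroid'`, `'median'`, or `'ward'` methods, which matches the Ward setup used in this thread.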

huda nawaf
on 29 Apr 2013

Walter,

Ward clustering worked when I used: L = linkage(d, 'ward', 'euclidean', 'savememory', 'on');

I cannot predict the running time, maybe 4-5 hours. Anyway, the running time is not important because I only run it once.

You resolved a big problem, many thanks.

Walter, I would also like to use spectral clustering instead of Ward, to show how the two methods differ in their clustering. Earlier I faced the same problem (out of memory) with spectral clustering. What do I have to change in the following code? This function calls another function, but the out-of-memory error happens before that call.

sim=dlmread('d:\matlab\r2011a\bin\netflix\combain_arrays_sim\sim2_norm.txt');

[p o]=size(sim)

for i=1:p

x=sim(i,:);

x=x(x~=0);

deg(i)=length(x);

end

total_edg=sum(deg)/2

%%%%% compute the modularity matrix

B=sim-((deg'*deg)/(2*total_edg));

'%%%compute eigenvalue and eigenvector'

[U Beta]=eig(B);

Beta1=diag(Beta);

[Beta1 ind]=sort(Beta1,'descend');

if Beta1(1)>0

bb=find(U(:,ind(1))>0);

for i=1:length(bb)

s(bb(i))=1;

end

bb1=find(U(:,ind(1))<=0);

for j=1:length(bb1)

s(bb1(j))=-1;

end

v=s*B*s'

% if v>0

' %%%divide the eigenvector into two groups'

if sum(s)~=length(s)&& sum(s)~=-length(s)

k=1;k1=1;

for j=1:length(s)

if s(j)>0

for j1=1:o

Grp_1(k,j1)=B(j,j1);

trac(k)=j;

end

k=k+1;

else

for j2=1:o

Grp_2(k1,j2)=B(j,j2);

trac1(k1)=j;

end

k1=k1+1;

end

end

tt=[trac1(1:length(trac1))];

Grp_1(:,tt)=[];

tt1=[trac(1:length(trac))];

Grp_2(:,tt1)=[];

hh=sum(Grp_1');

[p o]=size(Grp_1');

for i=1:p

for j=1:o

if i==j

B_updat(i,j)=Grp_1(i,j)-hh(i);

else

B_updat(i,j)=Grp_1(i,j);

end

end

end

hh1=sum(Grp_2');

[p o]=size(Grp_2);

for i=1:p

for j=1:o

if i==j

B1_updat(i,j)=Grp_2(i,j)-hh1(i);

else

B1_updat(i,j)=Grp_2(i,j);

end

end

end

itr=1

nn=(s*B*s')/2;

z=0;

Divide1_2(B_updat,z,trac,itr,nn);

Divide1_2(B1_updat,z,trac1,itr,nn);

else

'the network is indivisible because s is indivisible'

return;

end

else

'the network is indivisible because Beta1<0'

end

fclose all
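One possible way to reduce memory in the first split, offered as a sketch and not as the method used in the thread: the code above only uses the leading eigenvector of B at this stage, so the dense matrix B and the full `eig` decomposition might be avoided by giving `eigs` a function handle that applies B to a vector. The toy `sim` matrix, `Bfun`, and `twoM` are assumptions for illustration; `deg` is defined as in the code above (number of nonzeros per row), stored here as a column.

```matlab
% Toy symmetric similarity: two dense groups of 20 items joined by one link.
n = 40;
sim = zeros(n);
sim(1:20, 1:20) = 1;
sim(21:40, 21:40) = 1;
sim(20, 21) = 1;
sim(21, 20) = 1;
sim(1:n+1:end) = 0;                 % remove self-similarity on the diagonal

deg = sum(sim ~= 0, 2);             % n-by-1: nonzeros per row, replaces the loop
twoM = sum(deg);                    % same value as 2*total_edg above

% Applies B = sim - deg*deg'/(2m) to a vector without storing the n-by-n B.
Bfun = @(x) sim*x - deg*(deg'*x)/twoM;

opts.issym = true;                  % B is symmetric because sim is symmetric
opts.isreal = true;
[u, beta] = eigs(Bfun, n, 1, 'la', opts);   % largest algebraic eigenpair only

s = ones(n, 1);
s(u <= 0) = -1;                     % same sign rule as the code above
fprintf('leading eigenvalue: %.4f, group sizes: %d and %d\n', ...
        beta, sum(s == 1), sum(s == -1));
```

The later steps that build Grp_1 and Grp_2 from rows of B would still need the corresponding rows, which could be formed on demand as `sim(j,:) - deg(j)*deg'/twoM` instead of keeping all of B in memory.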

Aishwarya Iyengar
on 9 Jul 2020

@huda nawaf

I have a question:

How can I create a similarity matrix for 300x300 images?

Please help. Thanks in advance.

### Answers (0)
