# How do I create a term-frequency matrix that runs fast?

Ryan on 5 Sep 2013
Hello everyone! I am trying to create a term-frequency matrix for a TF-IDF program. I have the code written, but it runs extremely slowly. My code works by finding the unique words across all of the documents, for example
A = {'dog','cat','mouse'}
and for example two documents,
D = {'dog','cat','cat'; 'cat','mouse','mouse'}.
The code I have is:
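n = size(D,1); % number of documents (rows of D)
m = numel(A); % number of unique terms
seq_sum = zeros(n,m); % preallocate the result
for k = 1:n
    for j = 1:m
        % D(k,:) keeps the row as a cell array; D{k,:} would expand into separate arguments
        seq_sum(k,j) = sum(ismember(D(k,:),A{j}));
    end
end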
The output of the above example would be a matrix that looks like
seq_sum = [1 2 0 ; 0 1 2];
where n is the number of rows of the cell array D and m is the number of elements of A. I also have this written in parallel, but I don't want to have to rely on parallel computing. Any help would be greatly appreciated! I guess my question is: how can I improve this so it runs faster?
##### 2 Comments
Ryan on 5 Sep 2013
Thank you, Muthu. I am, however, working with 12b :(. I'm sure I can get a copy of 13b though. Thank you for your comment.

Azzi Abdelmalek on 5 Sep 2013
Edited: Azzi Abdelmalek on 5 Sep 2013
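A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};
% One row per document, one column per term in A
out = zeros(size(D,1), numel(A));
for k = 1:numel(A)
    idx = ismember(D, A(k)); % logical mask of the cells equal to the k-th term
    out(:,k) = sum(idx, 2); % count the matches in each row (document)
end
disp(out)
% or, as a one-liner:
out = cell2mat(arrayfun(@(x) sum(ismember(D,A(x)),2), 1:numel(A), 'UniformOutput', false))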
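A loop-free variant may also be worth trying. The sketch below is only one possible approach, not something benchmarked in this thread: it uses the second output of ismember to map every word in D to its position in A, then counts the (document, term) pairs with accumarray. The variable names rowIdx and subs are made up for illustration.

```matlab
A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};

n = size(D,1);                 % number of documents
m = numel(A);                  % number of terms

% loc(i,j) is the index in A of the word D{i,j}, or 0 if the word is not in A
[tf, loc] = ismember(D, A);

% Row (document) number of every cell of D
rowIdx = repmat((1:n)', 1, size(D,2));

% Keep only the (document, term) pairs for words that were found in A
subs = [rowIdx(:), loc(:)];
subs = subs(tf(:), :);

% Count how often each pair occurs
out = accumarray(subs, 1, [n, m]);
disp(out)
```

For the example data this gives the same matrix as the loop, [1 2 0; 0 1 2]. Whether it is actually faster depends on the size of D and A, so it is worth timing both versions on real data.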
Ryan on 5 Sep 2013
Azzi, I have one more question. I forgot that in my code the rows of D may not all have the same number of columns, so D could look like:
D = {1x3 ; 1x5}
Is there still a way to use arrayfun in place of the double for loop? Or is it possible to make every row of D the same length so I can apply what you suggested?
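One possible way to handle documents of different lengths is sketched below. It assumes each document is stored as its own 1-by-N cell array of words inside a column cell array D; that nested layout is an assumption, since the thread only shows D = {1x3 ; 1x5}. A single loop over the documents replaces the double loop, and no padding is needed.

```matlab
A = {'dog','cat','mouse'};

% Assumed layout: one nested cell array of words per document
D = {{'dog','cat','cat'}; ...
     {'cat','mouse','mouse','dog','cat'}};

out = zeros(numel(D), numel(A));   % one row per document, one column per term
for k = 1:numel(D)
    % loc holds the position in A of each word of document k (0 if absent)
    [tf, loc] = ismember(D{k}, A);
    idx = loc(tf);
    % Count the occurrences of each term; idx(:) forces a column of subscripts
    out(k,:) = accumarray(idx(:), 1, [numel(A), 1]).';
end
disp(out)
```

For these two hypothetical documents the result is [1 2 0; 1 2 2]. The loop body could also be wrapped in cellfun with 'UniformOutput' set to false and combined with cell2mat, but a plain loop with a preallocated out is usually at least as readable.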