How do I create a term-frequency matrix that runs fast?

Hello everyone! I am trying to create a term-frequency matrix for a TF-IDF program. I have the code written, but it runs extremely slowly. My code works by finding the unique words across all of the documents, for example
A = {'dog','cat','mouse'}
and, for example, two documents (one per row),
D = {'dog','cat','cat'; 'cat','mouse','mouse'};
The code I have is:
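n = size(D,1);   % number of documents (rows of D)
m = numel(A);    % number of unique words
seq_sum = zeros(n,m);
for k = 1:n
    for j = 1:m
        % D(k,:) keeps row k as a cell array; D{k,:} would expand into separate arguments
        seq_sum(k,j) = sum(ismember(D(k,:),A{j}));
    end
end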
The output of the above example would be a matrix that looks like
seq_sum = [1 2 0 ; 0 1 2];
where n is the number of rows of the cell array D (one row per document) and m is the number of words in A. I also have this written in parallel, but I don't want to have to rely on parallel computing. Any help would be greatly appreciated! My question is: how can I improve this to run faster?
  2 Comments
Ryan on 5 Sep 2013
Thank you, Muthu. I am, however, working with 12b :(. I'm sure I can get a copy of 13b, though. Thank you for your comment.


Accepted Answer

Azzi Abdelmalek on 5 Sep 2013
Edited: Azzi Abdelmalek on 5 Sep 2013
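A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};
% One row per document, one column per term
out = zeros(size(D,1), numel(A));
for k = 1:numel(A)
    idx = ismember(D, A(k));   % logical mask of where term k occurs in D
    out(:,k) = sum(idx, 2);    % count the occurrences in each document (row)
end
disp(out)
% or, as a one-liner:
out = cell2mat(arrayfun(@(x) sum(ismember(D,A(x)),2), 1:numel(A), 'UniformOutput', false))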
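If the loop over terms is still too slow for a large vocabulary, one possible alternative is to let a single ismember call look up every word at once and then count with accumarray. This is only a sketch under the same assumptions as the example above (D is a rectangular cell array of strings with one document per row, and A holds the unique words); the names found, termIdx and docIdx are illustrative, and it has not been timed against the loop.

```matlab
A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};

% termIdx(i,j) is the position in A of the word D{i,j} (0 if the word is not in A)
[found, termIdx] = ismember(D, A);

% docIdx(i,j) = i, i.e. the document (row) each word belongs to
docIdx = repmat((1:size(D,1))', 1, size(D,2));

% Add 1 to out(document, term) for every word that was found in A
out = accumarray([docIdx(found), termIdx(found)], 1, [size(D,1), numel(A)]);
disp(out)
```

For the example data this should give the same matrix as the loop, [1 2 0; 0 1 2]. Words in D that do not appear in A are simply ignored.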
  3 Comments
Ryan on 5 Sep 2013
Azzi, I have one more question. I forgot that in my code the rows of D may not all have the same number of columns, so it could look like:
D = {1x3 ; 1x5}
Is there still a way to use arrayfun in place of the double for loop? Or is it possible to pad D so that every row has the same number of columns, so I can apply what you suggested?
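One way this might be handled without padding, assuming D is an n-by-1 cell array in which each cell holds a 1-by-(number of words) cell array of strings: flatten all documents into one long word list, remember which document each word came from, and count with accumarray. The second document below is made up for illustration, and lens, words, docIdx and subs are hypothetical names, so treat this as a sketch rather than a tested solution for the real data.

```matlab
A = {'dog','cat','mouse'};
% Hypothetical ragged input: one row cell array of words per document
D = {{'dog','cat','cat'}; {'cat','mouse','mouse','dog','cat'}};

lens  = cellfun(@numel, D);    % number of words in each document
words = [D{:}];                % all words in a single 1-by-N cell array

% Document number for every entry of words, e.g. [1 1 1 2 2 2 2 2]
c = arrayfun(@(k) repmat(k, 1, lens(k)), 1:numel(D), 'UniformOutput', false);
docIdx = [c{:}];

% Position of each word in A (0 if absent), then count (document, term) pairs
[found, termIdx] = ismember(words, A);
subs = [docIdx(found); termIdx(found)].';
out  = accumarray(subs, 1, [numel(D), numel(A)]);
disp(out)
```

For this made-up input the result should be [1 2 0; 1 2 2]. The arrayfun call only builds the document index; the counting itself is done by one ismember call and one accumarray call, so no per-word loop is needed.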
