How do I create a term-frequency matrix that runs fast?
Hello everyone! I am trying to create a term-frequency matrix for a TF-IDF program. I have the code written, but it runs extremely slowly. My code works by finding the unique words across all of the documents, for example
A = {'dog','cat','mouse'}
and for example two documents,
D = {'dog','cat','cat'; 'cat','mouse','mouse'};
The code I have is:
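n = size(D,1);          % number of documents (rows of D)
m = numel(A);           % number of unique terms
seq_sum = zeros(n,m);   % preallocate the result
for k = 1:n
    for j = 1:m
        % D(k,:) keeps row k as a cell array; D{k,:} would expand into separate arguments
        seq_sum(k,j) = sum(ismember(D(k,:),A(j)));
    end
end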
The output of the above example would be a matrix that looks like
seq_sum = [1 2 0 ; 0 1 2];
where n is the number of rows of the cell array D and m is the number of elements of A. I also have a parallel version of this, but I don't want to have to rely on parallel computing. Any help would be greatly appreciated! My question is: how can I make this run faster?
2 Comments
Muthu Annamalai
on 5 Sep 2013
In MATLAB R2013b, the new data type 'categorical' is designed to solve this problem.
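For anyone wanting to try the categorical route, here is a minimal sketch. It assumes R2013b or later and reuses the A and D from the question; the variable names C and counts are just illustrative. I have not timed it against the loop, so treat it as something to benchmark rather than a guaranteed speedup.

```matlab
A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};

% Convert D to a categorical array whose categories are the terms in A,
% in the same order as A. Words not listed in A become <undefined>.
C = categorical(D, A);

% Count how often each category occurs in every row (document).
% The result has one row per document and one column per term.
counts = countcats(C, 2);
disp(counts)
```

For the example data this should reproduce the seq_sum matrix from the question.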
Accepted Answer
Azzi Abdelmalek
on 5 Sep 2013
Edited: Azzi Abdelmalek
on 5 Sep 2013
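A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};
% One row per document, one column per term in A
out = zeros(size(D,1), numel(A));
for k = 1:numel(A)
    % Logical mask of the entries of D that equal the k-th term
    idx = ismember(D, A(k));
    % Count the matches in each row (document)
    out(:,k) = sum(idx, 2);
end
disp(out)
% or, as a one-liner:
out = cell2mat(arrayfun(@(x) sum(ismember(D,A(x)),2), 1:numel(A), 'UniformOutput', false))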
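If the loop over terms is still too slow for a large vocabulary, one possible alternative is to look up every word once with the second output of ismember and then tally the (document, term) pairs with accumarray. This is only a sketch under the same assumptions as above (D is a rectangular cell array of strings, A holds the unique terms); the helper variables tf, col and row are names I made up for illustration.

```matlab
A = {'dog','cat','mouse'};
D = {'dog','cat','cat'; 'cat','mouse','mouse'};

% For every entry of D, find its position in A.
% tf is false (and col is 0) where a word does not appear in A.
[tf, col] = ismember(D, A);

% Row (document) index of every entry of D, same shape as D.
row = repmat((1:size(D,1))', 1, size(D,2));

% Add 1 for each (document, term) pair, skipping words not in A.
out = accumarray([row(tf), col(tf)], 1, [size(D,1), numel(A)]);
disp(out)
```

This calls ismember a single time instead of once per term, which is where most of the time goes in the looped versions.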