# Creating dummy variables from categorical variable

36 views (last 30 days)
Snoopy on 23 Sep 2017
Answered: Snoopy on 24 Sep 2017
Suppose there is a column vector array n containing unique but repeating values of the form
1
1
5
7
7
The aim is to create a matrix D which contains in its columns dummy variables for each unique value in n of the form
1 0 0
1 0 0
0 1 0
0 0 1
0 0 1
I use the following code:
uniq = unique(n);
N_obs = size(n,1);
N_ind = size(uniq,1);
D = NaN(N_obs,N_ind);
D(:,1) = n == uniq(1,1);
D(:,2) = n == uniq(2,1);
D(:,3) = n == uniq(3,1);
This produces the desired D matrix. However, it is tedious to write the last three lines so I wanted to use a for loop of the form
for i = N_ind
D(:,i) = n == uniq(i,1);
end
But this gives
NaN NaN 0
NaN NaN 0
NaN NaN 0
NaN NaN 1
NaN NaN 1
Where is my mistake in the loop?

Walter Roberson on 23 Sep 2017
for i = 1 : N_ind

Stephen23 on 24 Sep 2017
Vectorized:
>> vec = [1,1,5,7,7];
>> [~,~,idc] = unique(vec);
>> siz = [numel(idc),max(idc)];
>> out = zeros(siz);
>> out(sub2ind(siz,1:siz(1),idc)) = 1
out =
1 0 0
1 0 0
0 1 0
0 0 1
0 0 1
>>
Snoopy on 24 Sep 2017
Thanks!

Guillaume on 24 Sep 2017
Another shorter vectorised option (possibly no faster than Stephen's, I haven't tested):
[~, ~, col] = unique(vec);
out = accumarray([find(col), col], 1) %only purpose of the find is to generate (1:numel(col))'
Snoopy on 24 Sep 2017
Thanks.

Snoopy on 24 Sep 2017
I would like to ask a question here. One should always vectorise the code where possible, as I read in the literature. But in this simple example, it seems the for loop is a little less complicated when compared to the vectorised code examples. Would you agree? Or would you think that this depends on how well one knows about MATLAB built-in functions such as accumarray, find, etc.?