How to transform categorical features of a data set into numbers?

I am preprocessing a data set, and my task is to transform its categorical features into numeric values.
I am using the NSL_KDD data set.
In it, 3 columns are categorical.
For example, my variables look like this:
protocol_type   service    flag
tcp             ftp_data   SF
udp             other      SF
tcp             private    S0
I need to change them into:
protocol_type   service    flag
1               20         9
2               44         9
1               49         5
In the dataset,
Feature 'protocol_type' has 3 categories
Feature 'service' has 64 categories
Feature 'flag' has 11 categories

Accepted Answer

Star Strider on 2 Aug 2020
Here is one way to create a numeric representation for each category:
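% Example data: one row per record, one column per categorical feature
C = {'tcp' 'ftp_data' 'SF'
     'udp' 'other'    'SF'
     'tcp' 'private'  'S0'};
% Unique categories of each column, in order of first appearance
[Uprotocol_type,pt_first,pt_idx] = unique(C(:,1),'stable');
[Uservice,s_first,s_idx] = unique(C(:,2),'stable');
[Uflag,f_first,f_idx] = unique(C(:,3),'stable');
% Numeric code for every row of each column
Npt = cell2mat(accumarray((1:numel(pt_idx)).', pt_idx, [], @(x){x}));
Ns = cell2mat(accumarray((1:numel(s_idx)).', s_idx, [], @(x){x}));
Nf = cell2mat(accumarray((1:numel(f_idx)).', f_idx, [], @(x){x}));
% Combine the three code columns into one numeric matrix
Result = [Npt, Ns, Nf]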
producing:
Result =
1 1 1
2 2 1
1 3 2
The first output of each unique call is the list of unique elements in that column. The second output is the index of the first appearance of each category in the array. (I use a cell array here for convenience.) The third output gives, for each element of the input to unique, the position of its category in the first output. The accumarray call collects those positions into a numeric column vector with one entry per row of the original array, so here it returns the same values as the third output of unique.
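Because the third output of unique already holds one code per row, the same result can be sketched with a loop over the columns. This is a minimal illustration on the toy data from the answer, not a replacement for it. The names Codes and Cats are hypothetical and introduced only for this example.

```matlab
% Same toy data as in the answer above.
C = {'tcp' 'ftp_data' 'SF'
     'udp' 'other'    'SF'
     'tcp' 'private'  'S0'};

Codes = zeros(size(C));        % numeric codes, same shape as C
Cats  = cell(1, size(C,2));    % unique category names, one cell per column

for k = 1:size(C,2)
    % Third output of unique: for each row, the position of its value
    % in the list of unique categories of column k.
    [Cats{k}, ~, idx] = unique(C(:,k), 'stable');
    Codes(:,k) = idx;
end

disp(Codes)
```

Keeping Cats around makes the encoding reversible: Cats{2}(Codes(:,2)) returns the original service names.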
I do not know how the numeric categories in your question map to these values, since we do not have the entire list of categories. That mapping may not be necessary, however, and this code may produce the appropriate references without modification when the full list is provided.
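If a fixed list of categories is available, one possible way to get codes that follow that list is ismember, whose second output is the position of each element in the list. The sketch below is an assumption-laden illustration: protocolList and its ordering are hypothetical and chosen only so that tcp maps to 1 and udp to 2 as in the question. The real list and order would have to come from the data set documentation.

```matlab
% Same toy data as in the answer above.
C = {'tcp' 'ftp_data' 'SF'
     'udp' 'other'    'SF'
     'tcp' 'private'  'S0'};

% ASSUMPTION: hypothetical, predefined category list. Its order fixes
% the numeric code of each category.
protocolList = {'tcp', 'udp', 'icmp'};

% found(i) is true when C{i,1} appears in protocolList;
% ptCode(i) is its position in protocolList (0 when not found).
[found, ptCode] = ismember(C(:,1), protocolList);

% Stop early if a value is missing from the list.
assert(all(found), 'Some protocol_type values are not in protocolList.');

disp(ptCode)
```

With this approach the codes do not depend on which categories happen to appear first in a particular file, which matters when training and test files are encoded separately.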
Using 'stable' in the unique calls keeps the categories in their order of first appearance, and the index outputs refer to that same order. To have them in sorted order (here lexical), remove the 'stable' argument.
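The effect of 'stable' can be seen on the flag column of the toy data. This is a small sketch, and the variable names are chosen only for the illustration.

```matlab
% flag column from the toy data in the answer
flag = {'SF'; 'SF'; 'S0'};

% Order of first appearance: 'SF' comes first, so it gets code 1.
[catsStable, ~, idxStable] = unique(flag, 'stable');

% Default (sorted) order: 'S0' sorts before 'SF', so 'S0' gets code 1.
[catsSorted, ~, idxSorted] = unique(flag);

disp([idxStable, idxSorted])   % rows: [1 2], [1 2], [2 1]
```

Sorted order gives codes that do not depend on the row order of the input, while 'stable' gives codes that follow the data as read.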