Create Dummy Variables for Categorical Predictors and Generate C/C++ Code

This example shows how to generate code for classifying data using a support vector machine (SVM) model. Train the model using numeric and encoded categorical predictors. Use dummyvar to convert categorical predictors to numeric dummy variables before fitting an SVM classifier. When you pass new data to the trained model, you must preprocess that data in the same way, using the same categories in the same order.

Alternatively, if a trained model identifies categorical predictors in the CategoricalPredictors property, then you do not need to create dummy variables manually to generate code. The software handles categorical predictors automatically. For an example, see Generate Code to Classify Data in Table.

Preprocess Data and Train SVM Classifier

Load the patients data set. Create a table using the Diastolic and Systolic numeric variables. Each row of the table corresponds to a different patient.

load patients
tbl = table(Diastolic,Systolic);
head(tbl)
    Diastolic    Systolic
    _________    ________

       93          124   
       77          109   
       83          125   
       75          117   
       80          122   
       70          121   
       88          130   
       82          115   

Convert the Gender variable to a categorical variable. The order of the categories in categoricalGender is important because it determines the order of the columns in the predictor data. Use dummyvar to convert the categorical variable to a matrix of zeros and ones, where a 1 value in the (i,j)th entry indicates that the ith patient belongs to the jth category.

categoricalGender = categorical(Gender);
orderGender = categories(categoricalGender)
orderGender = 2x1 cell
    {'Female'}
    {'Male'  }

dummyGender = dummyvar(categoricalGender);
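The effect of category order on the dummy columns can be seen in a small sketch. The three-element cell array g and the variables c1, c2, D1, and D2 below are made up for illustration and are not part of the patients data set. By default, categorical sorts the category names, so Female comes before Male. Passing an explicit list of categories changes the column order that dummyvar produces.

```matlab
% Illustrative data only (not from the patients data set).
g = {'Male';'Female';'Male'};

c1 = categorical(g);                     % default: categories sorted (Female, Male)
c2 = categorical(g,{'Male','Female'});   % explicit order: Male, Female

D1 = dummyvar(c1);   % column 1 = Female, column 2 = Male
D2 = dummyvar(c2);   % column 1 = Male,   column 2 = Female

disp(categories(c1)')
disp(D1)
disp(categories(c2)')
disp(D2)
```

The same observations produce different matrices, which is why the category order recorded at training time has to be reused for new data.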

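Note: Each row of dummyGender sums to 1, so the predictor data is rank deficient when you combine the full set of dummy columns with an intercept term or with another full set of dummy variables. Depending on the type of model you train, this rank deficiency can be problematic. For example, when training linear models, remove the first column of the dummy variables.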
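As a minimal sketch of the rank deficiency described in the note, the following code uses a made-up five-element grouping variable rather than the patients data. The names g, D, A, and Areduced are illustrative. It compares the rank of a design matrix that contains an intercept column plus all dummy columns with the rank of one in which the first dummy column is dropped.

```matlab
% Illustrative data only (not from the patients data set).
g = categorical({'Male';'Male';'Female';'Female';'Female'});
D = dummyvar(g);                      % columns: Female, Male; each row sums to 1

A = [ones(5,1) D];                    % intercept plus both dummy columns
rankA = rank(A);                      % rank is 2, although A has 3 columns

Areduced = [ones(5,1) D(:,2:end)];    % drop the first dummy column
rankReduced = rank(Areduced);         % rank is 2, equal to the number of columns

fprintf('full: rank %d of %d columns\n',rankA,size(A,2));
fprintf('reduced: rank %d of %d columns\n',rankReduced,size(Areduced,2));
```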

Create a table that contains the dummy variable dummyGender with the corresponding variable headings. Combine this new table with tbl.

tblGender = array2table(dummyGender,'VariableNames',orderGender);
tbl = [tbl tblGender];
head(tbl)
    Diastolic    Systolic    Female    Male
    _________    ________    ______    ____

       93          124         0        1  
       77          109         0        1  
       83          125         1        0  
       75          117         1        0  
       80          122         1        0  
       70          121         1        0  
       88          130         1        0  
       82          115         0        1  

Convert the SelfAssessedHealthStatus variable to a categorical variable. Note the order of the categories in categoricalHealth, and convert the variable to a numeric matrix using dummyvar.

categoricalHealth = categorical(SelfAssessedHealthStatus);
orderHealth = categories(categoricalHealth)
orderHealth = 4x1 cell
    {'Excellent'}
    {'Fair'     }
    {'Good'     }
    {'Poor'     }

dummyHealth = dummyvar(categoricalHealth);

Create a table that contains dummyHealth with the corresponding variable headings. Combine this new table with tbl.

tblHealth = array2table(dummyHealth,'VariableNames',orderHealth);
tbl = [tbl tblHealth];
head(tbl)
    Diastolic    Systolic    Female    Male    Excellent    Fair    Good    Poor
    _________    ________    ______    ____    _________    ____    ____    ____

       93          124         0        1          1         0       0       0  
       77          109         0        1          0         1       0       0  
       83          125         1        0          0         0       1       0  
       75          117         1        0          0         1       0       0  
       80          122         1        0          0         0       1       0  
       70          121         1        0          0         0       1       0  
       88          130         1        0          0         0       1       0  
       82          115         0        1          0         0       1       0  

The third row of tbl, for example, corresponds to a patient with these characteristics: diastolic blood pressure of 83, systolic blood pressure of 125, female, and good self-assessed health status.

Because all the values in tbl are numeric, you can convert the table to a matrix X.

X = table2array(tbl);

Train an SVM classifier using X and a Gaussian kernel function with an automatic kernel scale. Specify the Smoker variable as the response.

Y = Smoker;
Mdl = fitcsvm(X,Y, ...
    'KernelFunction','gaussian','KernelScale','auto');

Generate C/C++ Code

Generate code that loads the SVM classifier, takes new predictor data as an input argument, and then classifies the new data.

Save the SVM classifier to a file using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'SVMClassifier')

saveLearnerForCoder saves the classifier as a structure array in the MATLAB® binary file SVMClassifier.mat in the current folder.

Define the entry-point function mySVMPredict, which takes new predictor data as an input argument. Within the function, load the SVM classifier by using loadLearnerForCoder, and then pass the loaded classifier to predict.

function label = mySVMPredict(X) %#codegen
Mdl = loadLearnerForCoder('SVMClassifier');
label = predict(Mdl,X);
end

Generate code for mySVMPredict by using codegen. Specify the data type and dimensions of the new predictor data by using coder.typeof so that the generated code accepts a variable-size array.

codegen mySVMPredict -args {coder.typeof(X,[Inf 8],[1 0])}
Code generation successful.
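To make the coder.typeof arguments more concrete, the sketch below builds an equivalent type object from a scalar double example value instead of from X (this assumes X is of class double, as it is in this example). The variable name t is illustrative. The size vector [Inf 8] together with the variable-dimension flags [1 0] describes an array with an unbounded number of rows and exactly 8 columns. This code requires MATLAB Coder.

```matlab
% Describe a double array with a variable number of rows and 8 fixed columns.
t = coder.typeof(0,[Inf 8],[1 0]);

disp(t.ClassName)      % class of the example value
disp(t.SizeVector)     % upper bounds of each dimension
disp(t.VariableDims)   % true where the dimension is variable-size

% The type object can be passed to codegen in place of the inline call:
% codegen mySVMPredict -args {t}
```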

Verify that mySVMPredict and the MEX file return the same results for the training data.

label = predict(Mdl,X);
mylabel = mySVMPredict(X);
mylabel_mex = mySVMPredict_mex(X);
verifyMEX = isequal(label,mylabel,mylabel_mex)
verifyMEX = logical
   1

Predict Labels for New Data

To predict labels for new data, you must first preprocess the new data. If you run the generated code in the MATLAB environment, you can follow the preprocessing steps described in this section. If you deploy the generated code outside the MATLAB environment, the preprocessing steps can differ. In either case, you must ensure that the new data has the same columns, in the same order, as the training data X.

In this example, take the third, fourth, and fifth patients in the patients data set. Preprocess the data for these patients so that the resulting numeric matrix matches the form of the training data.

Convert the categorical variables to dummy variables. Because the new observations might not include values from all categories, you need to specify the same categories as the ones used during training and maintain the same category order. In MATLAB, pass the ordered cell array of category names associated with the corresponding training data variable (in this example, orderGender for gender values and orderHealth for self-assessed health status values).

newcategoricalGender = categorical(Gender(3:5),orderGender);
newdummyGender = dummyvar(newcategoricalGender);

newcategoricalHealth = categorical(SelfAssessedHealthStatus(3:5),orderHealth);
newdummyHealth = dummyvar(newcategoricalHealth);
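The following sketch illustrates why the training categories must be passed explicitly. It uses literal values that mirror the gender entries of the three selected patients, and the names trainCats, newGender, Dnaive, and Dfixed are illustrative. Without the category list, categorical finds only the one category present in the new data, so dummyvar returns too few columns.

```matlab
% Literal values standing in for orderGender and Gender(3:5).
trainCats = {'Female';'Male'};
newGender = {'Female';'Female';'Female'};

% Categories inferred from the new data alone: only Female is present.
Dnaive = dummyvar(categorical(newGender));

% Categories fixed to the training set: both columns are produced.
Dfixed = dummyvar(categorical(newGender,trainCats));

disp(size(Dnaive))
disp(size(Dfixed))
```

Only the second matrix has the column layout that the trained model expects.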

Combine all the new data into a numeric matrix.

newX = [Diastolic(3:5) Systolic(3:5) newdummyGender newdummyHealth]
newX = 3×8

    83   125     1     0     0     0     1     0
    75   117     1     0     0     1     0     0
    80   122     1     0     0     0     1     0

Note that newX corresponds exactly to the third, fourth, and fifth rows of the matrix X.
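If you preprocess new data in several places, one option is to collect the steps in a helper function. The function encodePatients below is a hypothetical helper, not part of the original example or of any toolbox. It assumes the inputs are column-compatible vectors and cell arrays of character vectors, as in the patients data set, and that the category lists are the ones recorded during training. It also raises an error when the new data contains a category that was not seen during training, because categorical marks such values as undefined.

```matlab
function X = encodePatients(diastolic,systolic,gender,health,orderGender,orderHealth)
% encodePatients  Hypothetical helper that rebuilds the predictor matrix
% layout used in training: [Diastolic Systolic gender dummies health dummies].

% Reuse the training categories so the dummy columns keep their order.
cGender = categorical(gender,orderGender);
cHealth = categorical(health,orderHealth);

% Values outside the training categories become <undefined>.
if any(isundefined(cGender)) || any(isundefined(cHealth))
    error('encodePatients:unknownCategory', ...
        'New data contains a category that was not present during training.');
end

X = [diastolic(:) systolic(:) dummyvar(cGender) dummyvar(cHealth)];
end
```

With this helper saved as encodePatients.m, calling encodePatients(Diastolic(3:5),Systolic(3:5),Gender(3:5),SelfAssessedHealthStatus(3:5),orderGender,orderHealth) is intended to reproduce the matrix newX shown above.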

Verify that mySVMPredict and the MEX file return the same results for the new data.

newlabel = predict(Mdl,newX);
newmylabel = mySVMPredict(newX);
newmylabel_mex = mySVMPredict_mex(newX);
newverifyMEX = isequal(newlabel,newmylabel,newmylabel_mex)
newverifyMEX = logical
   1
