Explained variance for a dataset containing quantitative and qualitative data
Hi everybody,
I'm working on datasets containing both quantitative and qualitative data. Given a subset of the data, I'm trying to determine the explained variance with regard to the original mixed dataset. I understand that for purely numerical data I could use:
[~,~,~,~,explained] = pca(X(:,3:15));  % 5th output: percentage of total variance per component
explained
However, I have to work with mixed data, and the subset of the original dataset is given to me.
Is there any obvious solution I'm missing here? I might just be lacking expertise.
Thanks in advance!
Answers (1)
Vijeta
on 2 May 2023
Hi Banjamin,
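When dealing with mixed data, you can use techniques from the correspondence-analysis family instead of plain PCA. For qualitative (categorical) variables, the counterpart of PCA is Multiple Correspondence Analysis (MCA): correspondence analysis of the indicator (one-hot) matrix of the categories, which can equivalently be computed as a PCA of that matrix after each indicator column is centered and divided by the square root of its category's proportion. MCA by itself only covers qualitative variables; for a dataset that mixes quantitative and qualitative variables, a standard approach is Factor Analysis of Mixed Data (FAMD), which combines PCA of the standardized quantitative variables with MCA-style coding of the qualitative ones in a single analysis.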
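A minimal MATLAB sketch of the MCA step for purely categorical columns, computed as a PCA of the weighted indicator matrix. It assumes the Statistics and Machine Learning Toolbox (for pca), R2016b or later (implicit expansion), and categories stored as integer codes; the data matrix Q is made up for illustration.

```matlab
% Hypothetical example: 8 observations of 2 nominal variables, stored as
% integer category codes (made-up data for illustration only).
Q = [1 2; 1 1; 2 2; 3 1; 2 2; 3 2; 1 1; 2 1];

% Build the indicator (one-hot) matrix: one column per category.
Z = [];
for j = 1:size(Q, 2)
    [~, ~, g] = unique(Q(:, j));        % category index of each observation
    Z = [Z, double(g == 1:max(g))];     %#ok<AGROW>
end

% Centre each indicator column and divide it by sqrt(category proportion).
p  = mean(Z, 1);
Zw = (Z - p) ./ sqrt(p);

% PCA of the weighted indicator matrix: its explained percentages equal
% the MCA principal-inertia percentages.
[~, ~, ~, ~, explained] = pca(Zw);
disp(explained')
```

Only (total number of categories) - (number of variables) components carry non-zero variance, here 5 - 2 = 3. Raw MCA inertia percentages are known to understate how much structure the leading axes capture; Benzécri's and Greenacre's adjusted inertia rates are the usual remedies.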
In MATLAB, this can be done as follows: standardize the quantitative columns (for example with zscore), code the qualitative columns MCA-style (one indicator column per category, centered and divided by the square root of that category's proportion), combine the two blocks into one matrix X_mca_quant, and run pca on the combined matrix. The fifth output of pca, explained, then gives the percentage of the total variance accounted for by each principal component.
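A minimal FAMD-style sketch of that procedure, assuming the Statistics and Machine Learning Toolbox (zscore, pca) and R2016b or later. The data values and the split into Xnum and Xcat are hypothetical; only the name X_mca_quant comes from the description.

```matlab
% Hypothetical mixed dataset: 2 quantitative columns and 1 nominal column.
Xnum = [5.1 3.5; 4.9 3.0; 6.2 2.9; 5.9 3.0; 6.7 3.1; 5.5 2.4];
Xcat = [1; 1; 2; 2; 3; 3];              % category codes (assumed nominal)

% 1) Standardize the quantitative block (flag 1 = population std, 1/n).
Xq = zscore(Xnum, 1);

% 2) MCA-style coding of the qualitative block.
Xm = [];
for j = 1:size(Xcat, 2)
    [~, ~, g] = unique(Xcat(:, j));     % category index of each observation
    Xm = [Xm, double(g == 1:max(g))];   %#ok<AGROW> one-hot columns
end
p  = mean(Xm, 1);                       % category proportions
Xm = (Xm - p) ./ sqrt(p);               % centre and weight by 1/sqrt(p)

% 3) Combine both blocks and run ordinary PCA.
X_mca_quant = [Xq, Xm];
[~, ~, latent, ~, explained] = pca(X_mca_quant);
explained
```

Using zscore(..., 1) matches the 1/n convention of the category proportions, so each quantitative variable contributes one unit of variance and each qualitative variable with K categories contributes K - 1 units (all scaled by the common factor n/(n-1) that pca applies, which cancels in explained).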
This assumes the qualitative variables are nominal, that is, their categories have no natural ordering. If some of them are ordinal, you may prefer to convert them to numerical scores and include them with the quantitative variables rather than in the MCA-coded block.
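A sketch of that conversion, assuming the order of the levels is known. The level names and data are hypothetical, and treating the ranks 1, 2, 3 as equally spaced is itself a modelling assumption.

```matlab
% Hypothetical ordinal variable and its known level order (lowest first).
levels = {'low', 'medium', 'high'};
x      = {'medium'; 'low'; 'high'; 'high'; 'low'};

% ismember returns, for each element of x, its position in levels (its rank).
[found, r] = ismember(x, levels);
assert(all(found), 'x contains a level that is not listed in levels');

% Treat the ranks as a quantitative variable: standardize them and append
% the result to the quantitative block instead of the MCA-coded block.
xOrd = zscore(r, 1);
disp([r, xOrd])
```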