Reuse dimensionality reduction after designing model
Hi. I'm doing binary classification with SVM and MLP models on financial data. My input data has 21 features, so I applied dimensionality reduction. Some methods, such as stepwise regression, report the best features directly, and I use those features in my classification model. Other methods, such as PCA, transform the data into a new space, and I keep, for instance, the top 60% of the reported columns (components).

My problem is in the phase of using the final model. For example, I trained on the financial data from one and two years ago to predict today's financial position. Now I want to use past and present data to predict next year. My question: should I apply PCA to the new input data before feeding it to my trained classification model? If so, how? Do I simply call it again as before (pca(newdata…)), or is there some result from the original PCA that I must reuse in this phase?
Thank you so much for your kind help.
Accepted Answer
Greg Heath
on 28 Mar 2014
PCA does not take into account the output variance. Therefore, it is suboptimal for classification.
Use PLS instead.
Whatever transformations are applied to the training data must also be applied to the validation, test, and operational data, using the EXACT SAME transformation, i.e., the one parameterized by the training-data characteristics.
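As a minimal sketch of that rule with PLS (assuming the Statistics and Machine Learning Toolbox; the variable names Xtrain, Ytrain, Xnew and the component count are placeholders, not from the original post):

```matlab
% Fit PLS on the TRAINING data only (Xtrain: n-by-21, Ytrain: n-by-1 labels)
ncomp = 5;                                   % number of latent components (example value)
[XL, YL, XS, YS, BETA, PCTVAR, MSE, stats] = plsregress(Xtrain, Ytrain, ncomp);

% plsregress centers X internally, so reuse the TRAINING mean and the
% X-weights stats.W when projecting any new data into the same space.
muX   = mean(Xtrain, 1);
XSnew = (Xnew - muX) * stats.W;              % new-data scores, same transformation

% XS (training scores) trains the classifier; XSnew feeds the trained model.
```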
Hope this helps.
Thank you for formally accepting my answer
Greg
More Answers (2)
Tom Lane
on 29 Mar 2014
Greg seems to have some good ideas. However, going back to your original question, this is how it looks to me. If you apply PCA to your original data and train a model using the components you compute, then you do not want to do a new PCA on your new data. You want to get the coefficients from the PCA of your old data, and use them to compute components (scores) for the new data.
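A minimal MATLAB sketch of that (assuming the Statistics and Machine Learning Toolbox; Xtrain, Xnew, and the 60% cutoff are illustrative, taken from the question):

```matlab
% Fit PCA on the TRAINING data only.
[coeff, score, ~, ~, explained, mu] = pca(Xtrain);   % mu = training column means

% Keep, e.g., the leading components explaining 60% of the variance
% (or a fixed fraction of the columns, as in the question).
k = find(cumsum(explained) >= 60, 1);

% Project NEW data with the stored training mean and coefficients --
% do NOT call pca(Xnew) again.
newScore = (Xnew - mu) * coeff(:, 1:k);

% score(:,1:k) trains the model; newScore is what the trained model sees.
```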
Greg Heath
on 30 Mar 2014
"I read somewhere" doesn't mean much unless you are positive it is with respect to PLS vs PCA.
Greg Heath
on 28 Mar 2014
I am unfamiliar with Neighborhood components analysis.
PCA maximizes variance in the input space without regard to the outputs. It is widely used; in general, however, it is suboptimal for reduced-feature classification and regression.
On the other hand, PLS takes the linear input/output relationship into account. Like PCA, it can be applied to polynomial expansions of the inputs; however, I do not use powers higher than squares and cross-products.
For nonlinear classification I use MATLAB's MLP PATTERNNET. I sometimes use the RBF NEWRB; however, it is not very flexible (identically shaped spherical Gaussian basis functions centered at algorithm-selected training points).
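For reference, a minimal PATTERNNET sketch (assuming the Neural Network Toolbox; the hidden-layer size and variable names are illustrative, not from the original post):

```matlab
% x: features-by-samples inputs, t: 2-by-samples one-hot targets (binary classes)
net = patternnet(10);       % one hidden layer with 10 neurons (example size)
net = train(net, x, t);     % trains with scaled conjugate gradient by default
y   = net(x);               % class posterior estimates
classes = vec2ind(y);       % predicted class indices
```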
When data sets are large, use crossvalidation.
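One way to set that up in MATLAB (cvpartition is in the Statistics and Machine Learning Toolbox; X and y are hypothetical data and label variables):

```matlab
c = cvpartition(y, 'KFold', 5);      % stratified 5-fold split on the labels
for i = 1:c.NumTestSets
    trIdx = training(c, i);          % logical index of training rows
    teIdx = test(c, i);              % logical index of held-out rows
    % ...fit the dimensionality reduction AND the classifier on X(trIdx,:)
    % only, then apply both, unchanged, to X(teIdx,:)...
end
```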
Hope this helps.
Thank you for formally accepting my answer
Greg