classificationLearner / machine learning question

Hi all,
Sorry for the potentially basic nature of this question. I am looking to use machine learning (e.g. an SVM) to determine whether certain features in neural data can indicate performance in a task. This is purely a binary classification problem. I have started with the classificationLearner app, just to get familiarised, and then exported the code to work with my dataset within my own script.
My question is: when inputting all of the data into classificationLearner, can you take the model accuracy output following k-fold cross-validation as a proxy for performance on the entire dataset? That is, to determine whether all my features are suitable predictors of the performance or stimuli presented, is it valid to input all my data into classificationLearner (or the code generated by it) and use the validationAccuracy output (following k-fold cross-validation) as my model performance for the entire dataset?
Furthermore, if this is an okay thing to do, is there a way of stratifying the data when doing training/cross-validation so that I have a (roughly) even number of each class going into each fold?
I guess my thinking is that if I do k-fold cross-validation on the entire dataset, I'm essentially retraining and testing the model each time (either using a leave-one-out strategy or holding out a certain percentage of the data for testing), and I can therefore use the average accuracy as my model performance. Is this correct, or wildly off the mark?
I very much appreciate any help and input!

Accepted Answer

Puru Kathuria on 15 Jul 2020
Hi,
I understand you are trying to find a metric to measure your model performance.
K-fold: Usually, we split the dataset into training and testing sets, train the model on the training set, and test it on the testing set. We then evaluate the model's performance using an error metric to determine its accuracy. This method, however, is not very reliable, as the accuracy obtained for one test set can be very different from the accuracy obtained for a different test set. K-fold cross-validation (CV) addresses this problem by dividing the data into k folds and ensuring that each fold is used as the testing set at some point, with the remaining folds used for training. It can further help you determine the fit of the model/learner.
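For example, here is a minimal sketch of stratified k-fold cross-validation for a binary SVM, using cvpartition, fitcsvm, and kfoldLoss. The names X (feature matrix, observations by features) and Y (class label vector) are assumptions; substitute your own variables.
rng('default')                             % make the random fold assignment reproducible
% Passing the label vector Y to cvpartition stratifies by default, so each
% fold contains a roughly even proportion of each class.
c = cvpartition(Y, 'KFold', 5);
cvMdl = fitcsvm(X, Y, 'CVPartition', c);   % trains one SVM per fold
% kfoldLoss averages the misclassification rate over the k held-out folds.
validationAccuracy = 1 - kfoldLoss(cvMdl)
Because cvpartition stratifies on the grouping variable by default, this also gives you the (roughly) even class counts per fold asked about above.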
Leave-one-out: Leave-one-out cross-validation is k-fold cross-validation taken to its logical extreme, with k equal to N, the number of data points in the set.
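Under the same assumptions about X and Y, a sketch of leave-one-out with cvpartition (note that leave-one-out cannot be stratified, since each test fold holds exactly one observation):
c = cvpartition(numel(Y), 'LeaveOut');     % N folds, one observation held out each time
cvMdl = fitcsvm(X, Y, 'CVPartition', c);
looAccuracy = 1 - kfoldLoss(cvMdl)         % average accuracy over the N single-trial tests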
So, yes, what you are doing is correct for your requirements.
For more information on the implementation of k-fold/stratified k-fold/leave-one-out, please visit the following link.
  3 Comments
neuroDuck on 15 Jul 2020
Edited: neuroDuck on 15 Jul 2020
Hi Hirra,
I think you meant to ask this as a separate question; you seem to have accidentally posted it on an unrelated thread.
That said, you're overwriting data every time. You need to index it so that all of the images survive the loop.
Kind regards.
Walter Roberson on 15 Jul 2020
data = cat(1, image_patches,labels);
That code is overwriting all of data each iteration.
It looks to me as if data will not be a vector, but I cannot locate any hellopatches() function, so I cannot tell what shape it will be. As you are not doing imresize(), I also cannot be sure that all of the images are the same size, so I cannot be sure that data will be the same size on each iteration. Under the circumstances, you should consider saving into a cell array.
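Something along these lines, as a sketch only: the loop bounds and the hellopatches() call signature are guesses, since the original code is not visible here.
data = cell(1, numImages);                     % preallocate one cell per image
for k = 1:numImages
    % hellopatches() and its outputs are assumed from the original question
    [image_patches, labels] = hellopatches(k);
    data{k} = cat(1, image_patches, labels);   % each iteration keeps its own cell
end
That way data{k} survives the loop even when the images (and so the patch arrays) differ in size.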
Note: please do not post the same query multiple times. I found at least 7 copies of your query :(


More Answers (0)
