classificationLearner / machine learning question

Hi all,
Sorry for the potentially basic nature of this question. I am looking to use machine learning (e.g. an SVM) to determine whether certain features in neural data can indicate performance in a task. This is purely a binary classification problem. I have started with the classificationLearner app, just to get familiarised, and then exported the code to work with my dataset within my own script.
My question is: when inputting all of the data into classificationLearner, can you take the model accuracy output following k-fold cross-validation as a proxy for performance on the entire dataset? That is, to determine whether all my features are suitable predictors of the performance or stimuli presented, is it valid to input all my data into classificationLearner (or the code generated by it) and use the validationAccuracy output (following k-fold cross-validation) as my model performance for the entire dataset?
Furthermore, if this is an okay thing to do, is there a way of stratifying the data when doing training/cross-validation so that I have a (roughly) even number of each class going into each fold?
I guess my thinking is that if I do k-fold cross-validation on the entire dataset, I'm essentially retraining and testing the model each time (either using a leave-one-out strategy or holding out a certain percentage of the data for testing), and I can therefore use the average accuracy as my model performance. Is this correct, or wildly off the mark?
I very much appreciate any help and input!

Accepted Answer

Puru Kathuria on 15 Jul 2020
Hi,
I understand you are trying to find a metric to measure your model performance.
K-fold: Usually, we split the dataset into training and testing sets, train the model on the training set, and test it on the testing set. We then evaluate the model's performance using an error metric to determine its accuracy. This method, however, is not very reliable, as the accuracy obtained for one test set can be very different from the accuracy obtained for a different test set. K-fold cross-validation (CV) addresses this problem by dividing the data into k folds and ensuring that each fold is used as the testing set at some point, with the remaining folds used for training. It can further help you determine the fit of the model/learner.
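For example, here is a minimal sketch of stratified k-fold cross-validation for a binary SVM, using cvpartition, fitcsvm, and kfoldLoss. The names X (feature matrix, observations by features) and Y (class label vector) are assumptions; substitute your own variables.
rng('default')                             % make the random fold assignment reproducible
% Passing the label vector Y to cvpartition stratifies by default, so each
% fold contains a roughly even proportion of each class.
c = cvpartition(Y, 'KFold', 5);
cvMdl = fitcsvm(X, Y, 'CVPartition', c);   % trains one SVM per fold
% kfoldLoss averages the misclassification rate over the k held-out folds.
validationAccuracy = 1 - kfoldLoss(cvMdl)
Because cvpartition stratifies on the grouping variable by default, this also gives you the (roughly) even class counts per fold asked about above.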
Leave-one-out: Leave-one-out cross-validation is k-fold cross-validation taken to its logical extreme, with k equal to N, the number of data points in the set.
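Under the same assumptions about X and Y, a sketch of leave-one-out with cvpartition (note that leave-one-out cannot be stratified, since each test fold holds exactly one observation):
c = cvpartition(numel(Y), 'LeaveOut');     % N folds, one observation held out each time
cvMdl = fitcsvm(X, Y, 'CVPartition', c);
looAccuracy = 1 - kfoldLoss(cvMdl)         % average accuracy over the N single-trial tests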
So, yes, what you are doing is correct for your requirements.
For more information on the implementation of k-fold/stratified k-fold/leave-one-out, please visit the following link.
  3 Comments
neuroDuck on 15 Jul 2020
Edited: neuroDuck on 15 Jul 2020
Hi Hirra,
I think you meant to ask this as a separate question; you seem to have accidentally posted it on an unrelated thread.
That said, you're overwriting data every time. You need to index it so that all of the images survive the loop.
Kind regards.
Walter Roberson on 15 Jul 2020
data = cat(1, image_patches,labels);
That code is overwriting all of data each iteration.
It looks to me as if data will not be a vector, but I cannot locate any hellopatches() function, so I cannot tell what shape it will be. As you are not doing imresize(), I also cannot be sure that all of the images are the same size, so I cannot be sure that data will be the same size on each iteration. Under the circumstances, you should consider saving into a cell array.
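Something along these lines, as a sketch only: the loop bounds and the hellopatches() call signature are guesses, since the original code is not visible here.
data = cell(1, numImages);                     % preallocate one cell per image
for k = 1:numImages
    % hellopatches() and its outputs are assumed from the original question
    [image_patches, labels] = hellopatches(k);
    data{k} = cat(1, image_patches, labels);   % each iteration keeps its own cell
end
That way data{k} survives the loop even when the images (and so the patch arrays) differ in size.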
Note: please do not post the same query multiple times. I found at least 7 copies of your query :(


More Answers (0)
