Monte Carlo repetitions with customized partitions
1 view (last 30 days)
Show older comments
k = 5; %number of partitions
c = cvpartition(Labels{2},"KFold", k ,"Stratify",true);
test_idx = test(c,"all");
for ii = 1:5
%%%% Divide into train and test set via logical indexing (5columns = 5
%%%% partitions. Label 1 & 3 are always used for testing
testIndices(:,ii) = logical([ones(numel(Labels{1}),1); test_idx(:,ii); ones(numel(Labels{3}),1)]);
end
c = cvpartition("CustomPartition",testIndices);
I want to customize partitions for cross-validation, but with some of the samples to be tested for in each partition. Is there a way to do it?
I tried using cvpartition, but I can either customize the partitions and get the Error: "Each observation must be present in one test set."
Or I use monte carlo repetitions which allows for samples to be used more than once as testing set, but then I cant customize the sets anymore.
I'm thankful for any hint.
9 Comments
Harald
on 3 Apr 2024
Duh... that makes sense. I suppose it will take some fiddling to address this.
I would try this strategy:
- To be able to determine which columns were chosen, add fake data 1:numColumns to on top of the x-values and some nonsense value that does not appear in your y-values on top of the y-values that you supply to sequentialfs.
- Identify which of the y-values passed to the function (either yTrain or yTest) contains the nonsense value. Extract the corresponding row of x-values from xTrain or xTest. This will tell you which columns were sent into the function.
- Extract the corresponding columns from SFS_xAlwaysIn and add it to the test data. Be sure to remove the fake data of the first step.
I expect this to be somewhat tricky and would be happy to try to help, but would really need some sample data for SFS_xtrain and SFS_ytrain to play with. Perhaps I should be able to infer this, but I am not even sure of the data type of SFS_ytrain.
Best wishes,
Harald
Accepted Answer
Harald
on 4 Apr 2024
I have now tried the approach discussed in the comments with sample data based on fisheriris.mat.
%% Sample data
load fisheriris.mat
species = categorical(species);
% Shuffle data
order = randperm(length(species));
meas = meas(order,:);
species = species(order,:);
SFS_xtrain = meas(1:130,:);
SFS_ytrain = species(1:130);
SFS_xAlwaysIn = meas(131:end,:);
SFS_yAlwaysIn = species(131:end);
%% Add fake data
SFS_xtrain = [1:size(SFS_xtrain, 2); SFS_xtrain];
SFS_ytrain = ["nonsense"; SFS_ytrain];
%% Your code (for now without setting "nfeatures" and "options")
k = 5;
c = cvpartition(SFS_ytrain,"KFold", k ,"Stratify",true);
% opts = statset("UseParallel",true);
fun = @(XTrain,yTrain,XTest,yTest) callErrorFun(XTrain,yTrain, XTest, yTest, SFS_xAlwaysIn, SFS_yAlwaysIn);
[toKeep, ranking] = sequentialfs(fun,SFS_xtrain,SFS_ytrain,"cv",c);
%% A helper function
function err = callErrorFun(XTrain,yTrain, XTest, yTest, SFS_xAlwaysIn, SFS_yAlwaysIn)
if sum(yTrain == "nonsense") == 1
idx = yTrain == "nonsense";
columns = XTrain(idx, :);
XTrain(idx,:) = [];
yTrain(idx) = [];
elseif sum(yTest == "nonsense") == 1
idx = yTest == "nonsense";
columns = XTest(idx, :);
XTest(idx,:) = [];
yTest(idx) = [];
else
error("Something unexpected happened. Revisit the approach...")
end
XTrain = [XTrain; SFS_xAlwaysIn(:, columns)];
yTrain = [yTrain; SFS_yAlwaysIn];
err = errorFun(XTrain,yTrain,XTest,yTest);
end
%% Your function
function error = errorFun(XTrain,yTrain,XTest,yTest)
% Create the model with the learning method of your choice
classifier = fitcdiscr(XTrain,yTrain);
% Calculate the number of test observations misclassified
ypred = predict(classifier,XTest);
error = nnz(ypred ~= yTest);
end
I hope you'll find this to be helpful.
Best wishes,
Harald
2 Comments
Harald
on 4 Apr 2024
Glad it's working for you! If you found the answer to be helpful, please consider "accept"-ing it.
Best wishes,
Harald
More Answers (0)
See Also
Categories
Find more on Classification Trees in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!