How to resolve widely different validation and test accuracy?

Dear experts,
I wrote a MATLAB script to run a machine learning analysis (a classification problem). I see a consistent but odd issue in my results: my training and validation accuracy are always high and reproducible, but my test accuracy is always much lower. I checked all five tips mentioned here: https://stackoverflow.com/questions/48718663/validation-and-testing-accuracy-widely-different, but I still cannot resolve the problem.
I would really appreciate it if someone could help me figure out the solution.
Thanks,
Sahil

Answers (1)

Prince Kumar on 19 Nov 2021
Edited: Prince Kumar on 19 Nov 2021
Hi Sahil Bajaj,
This generally happens when your model memorizes the training data instead of learning the underlying pattern. This scenario is called 'overfitting'.
You can try the following:
  • Use a regularization technique.
  • Make sure each set (training, validation and test) has enough samples, for example a 60%/20%/20% or 70%/15%/15% split for the training, validation and test sets respectively.
  • Perform k-fold cross-validation.
  • Randomly shuffle the data before doing the split, so that the data distribution is nearly the same in each set. If your data is in a datastore, you can use the 'shuffle' function; otherwise you can use the 'randperm' function.
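
The shuffle, split and k-fold steps above could look roughly like the sketch below. This is only an illustration under assumptions: the data is synthetic, the variable names (X, Y, devIdx, foldAcc and so on) are made up for the example, and fitctree with a larger 'MinLeafSize' stands in for whatever classifier and regularization setting you actually use. It needs the Statistics and Machine Learning Toolbox for cvpartition and fitctree.

```matlab
% Hypothetical data: X is N-by-D features, Y is N-by-1 class labels.
rng(0);                                   % fix the seed so the split is reproducible
N = 200;
X = randn(N, 4);
Y = double(X(:,1) + 0.5*randn(N,1) > 0);  % synthetic binary labels

% Shuffle once with randperm, then carve out 70% / 15% / 15%.
idx      = randperm(N);
nTrain   = round(0.70 * N);
nVal     = round(0.15 * N);
trainIdx = idx(1:nTrain);
valIdx   = idx(nTrain+1 : nTrain+nVal);
testIdx  = idx(nTrain+nVal+1 : end);

% 5-fold cross-validation on the training + validation portion only.
% The test set is not touched during model selection.
devIdx  = [trainIdx, valIdx];
cvp     = cvpartition(Y(devIdx), 'KFold', 5);   % stratified by class label
foldAcc = zeros(cvp.NumTestSets, 1);
for k = 1:cvp.NumTestSets
    tr  = devIdx(training(cvp, k));
    te  = devIdx(test(cvp, k));
    % Larger leaves give a simpler tree (a crude form of regularization).
    mdl = fitctree(X(tr,:), Y(tr), 'MinLeafSize', 10);
    foldAcc(k) = mean(predict(mdl, X(te,:)) == Y(te));
end
fprintf('CV accuracy:   %.3f +/- %.3f\n', mean(foldAcc), std(foldAcc));

% Single final check on the held-out test set.
finalMdl = fitctree(X(devIdx,:), Y(devIdx), 'MinLeafSize', 10);
testAcc  = mean(predict(finalMdl, X(testIdx,:)) == Y(testIdx));
fprintf('Test accuracy: %.3f\n', testAcc);
```

If the cross-validated accuracy and the test accuracy still differ widely after a random split like this, it is worth checking whether any preprocessing or feature selection was fitted on data that includes the validation samples.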

Release

R2021a