How to decide on good/acceptable difference in prediction quality between training and test dataset

I've come across the sentence: "quality of prediction was estimated to be good if the difference between the training and test dataset was <5% and acceptable if it was <10%". My question is: how did the author choose these thresholds for "good" and "acceptable"? Is that the difference one always takes, or is there a rule? Is there a reference to relate to? Advice would be much appreciated. Isabel

Answers (1)

Image Analyst
Image Analyst on 15 Feb 2017
It totally depends on your case. For many applications where higher precision is needed, 5% and 10% might not be good enough. Imagine that the bolt holes that attach your water pump to your car engine were within +/- 10% of some nominal spacing. That's not good enough, because the bolt holes might not align with the holes on your engine block. So it really depends on what you're doing. Basically, you have to decide what's close enough for your situation.
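For illustration only, here is a minimal MATLAB sketch of what such a check might look like, assuming the quoted criterion means the absolute difference between training and test accuracy in percentage points. The variable names and the example accuracies are made up, and the 5%/10% cutoffs are just the ones from your quote, not a general rule:

% Hypothetical training/test accuracies (percent correct); replace these
% with the values from your own model evaluation.
trainAccuracy = 92.0;   % accuracy on the training set, in percent
testAccuracy  = 85.5;   % accuracy on the held-out test set, in percent

% Absolute gap between the two, in percentage points.
gap = abs(trainAccuracy - testAccuracy);

% Apply the thresholds from the quoted sentence (the author's choices,
% not an established standard).
if gap < 5
    fprintf('Gap = %.1f%%: prediction quality is "good".\n', gap);
elseif gap < 10
    fprintf('Gap = %.1f%%: prediction quality is "acceptable".\n', gap);
else
    fprintf('Gap = %.1f%%: prediction quality is below acceptable.\n', gap);
end

Whether those particular cutoffs make sense for your application is exactly the judgment call described above.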
  2 Comments
Isabel Hostettler
Isabel Hostettler on 15 Feb 2017
Thank you very much for this useful answer. Is it really an individual thing or is there a rule one can follow? Has anyone published anything about it? Thanks for your help!
Image Analyst
Image Analyst on 15 Feb 2017
Not that I'm aware of, though there may be. It would probably be specific to certain situations or industries. For example, the packaged goods industry may say that a color difference within 10% is good enough, but the publishing/poster/magazine industry may want 2 or 5%. But those are just industry standards/guidelines. Even within packaged goods, the orange on a Tide detergent box may have one tolerance while the red on a can of Coca-Cola has a different one.
