How to decide on good/acceptable difference in prediction quality between training and test dataset

I've come across the sentence: "quality of prediction was estimated to be good if the difference between the training and test dataset was <5% and acceptable if it was <10%". My question is: how did the author choose these thresholds for "good" and "acceptable"? Is that the difference one always takes, or is there a rule? Is there a reference to relate to? Advice would be much appreciated. Isabel

Answers (1)

Image Analyst
Image Analyst on 15 Feb 2017
It totally depends on your case. For many applications where higher precision is needed, 5% and 10% might not be good enough. Imagine that the bolt holes that attach your water pump to your car engine were within +/- 10% of some nominal spacing. That's not good enough, because the bolt holes might not align with the holes on your engine block. So it really depends on what you're doing. Basically, you have to decide what's close enough for your situation.
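For illustration only, here is a minimal MATLAB sketch of what such a check might look like, assuming the quoted criterion means the absolute difference between training and test accuracy in percentage points. The variable names and the example accuracies are made up, and the 5%/10% cutoffs are just the ones from your quote, not a general rule:

% Hypothetical training/test accuracies (percent correct); replace these
% with the values from your own model evaluation.
trainAccuracy = 92.0;   % accuracy on the training set, in percent
testAccuracy  = 85.5;   % accuracy on the held-out test set, in percent

% Absolute gap between the two, in percentage points.
gap = abs(trainAccuracy - testAccuracy);

% Apply the thresholds from the quoted sentence (the author's choices,
% not an established standard).
if gap < 5
    fprintf('Gap = %.1f%%: prediction quality is "good".\n', gap);
elseif gap < 10
    fprintf('Gap = %.1f%%: prediction quality is "acceptable".\n', gap);
else
    fprintf('Gap = %.1f%%: prediction quality is below acceptable.\n', gap);
end

Whether those particular cutoffs make sense for your application is exactly the judgment call described above.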
  2 Comments
Isabel Hostettler
Isabel Hostettler on 15 Feb 2017
Thank you very much for this useful answer. Is it really an individual thing or is there a rule one can follow? Has anyone published anything about it? Thanks for your help!
Image Analyst
Image Analyst on 15 Feb 2017
Not that I'm aware of, though there may be. It would probably be specific to certain situations or industries. For example, the packaged goods industry may say that a color difference within 10% is good enough, but the publishing/poster/magazine industry may want 2 or 5%. But those are just industry standards/guidelines. Even within packaged goods, the orange on a Tide detergent box may have one tolerance while the red on a can of Coca-Cola has a different one.
