Overfitting indicators in GPR model
As I understand it, if the training loss is much lower than the test loss, the GPR model is overfitted. What are the other indicators which tell us that the model is overfitted, and how can we prevent it or take corrective steps? Please help.
Kshittiz Bhardwaj on 10 Jul 2022
Hello Josh, I understand you want to know about other indicators which tell us if the model is overfitted, and the measures which can be taken to prevent it.
Some other indicators apart from loss are:
1) Error: the error is low on the training set but high on the test set.
2) Accuracy: the accuracy is high on the training set but low on the test set.
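To make that train/test gap concrete, here is a minimal NumPy sketch (not MATLAB, and not your actual model — the kernel, data, and noise level are all made up for illustration) of a GP regressor whose noise variance is set far too small, so it interpolates the noise in the training targets:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential kernel matrix between 1-D point sets a and b
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale**2)

def gpr_predict(x_train, y_train, x_query, noise_var=1e-4):
    # GP posterior mean: K_* (K + noise_var * I)^{-1} y
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(x_query, x_train) @ alpha

rng = np.random.default_rng(0)
x_train = np.linspace(0, 2 * np.pi, 20)
y_train = np.sin(x_train) + 0.3 * rng.standard_normal(20)
x_test = rng.uniform(0, 2 * np.pi, 50)
y_test = np.sin(x_test) + 0.3 * rng.standard_normal(50)

# Tiny noise variance -> the GP chases the noisy training targets
pred_train = gpr_predict(x_train, y_train, x_train)
pred_test = gpr_predict(x_train, y_train, x_test)
rmse_train = np.sqrt(np.mean((pred_train - y_train) ** 2))
rmse_test = np.sqrt(np.mean((pred_test - y_test) ** 2))
# rmse_train should come out much smaller than rmse_test: a large gap
# like this is exactly the overfitting signature described above.
```

Comparing these two RMSE values on every fit is a cheap, routine overfitting check, whatever toolbox you use.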
Measures which can be taken are:
Cross-Validation: We can split our dataset into k groups (k-fold cross-validation). We let one of the groups be the testing set and the others be the training set, and repeat this process until each individual group has been used as the testing set (i.e., k repeats).
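The k-fold procedure above can be sketched in a few lines of NumPy (an illustration, not your toolbox's API — the helper names are invented, and `fit_predict` stands in for whatever model you actually train, e.g. a GPR fit):

```python
import numpy as np

def kfold_scores(x, y, k, fit_predict, seed=0):
    # Shuffle the indices once, then split them into k roughly equal folds
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        # All remaining folds together form the training set for this round
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = fit_predict(x[train_idx], y[train_idx], x[test_idx])
        scores.append(np.sqrt(np.mean((pred - y[test_idx]) ** 2)))  # RMSE per fold
    return scores  # k scores, one per held-out fold

# Toy usage: a "model" that always predicts the training-set mean
mean_model = lambda xtr, ytr, xte: np.full(len(xte), ytr.mean())
x = np.arange(100, dtype=float)
y = np.ones(100)
scores = kfold_scores(x, y, 5, mean_model)
# On constant targets every fold's RMSE is 0, and we get one score per fold.
```

A large spread between folds, or a mean CV score far worse than the training score, is the same overfitting signal as the train/test gap above.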
Data Augmentation: A larger dataset would reduce overfitting. If we cannot gather more data and are constrained to the data we have in our current dataset, we can apply data augmentation to artificially increase the size of our dataset.
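For tabular regression data, one simple augmentation (an assumed, illustrative scheme — the function name and noise level are made up) is to append jittered copies of the training inputs while reusing the labels unchanged:

```python
import numpy as np

def augment_with_jitter(X, y, n_copies=3, noise_std=0.05, seed=0):
    # Append n_copies noise-jittered versions of X; labels are reused as-is.
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        X_parts.append(X + noise_std * rng.standard_normal(X.shape))
        y_parts.append(y)
    return np.concatenate(X_parts), np.concatenate(y_parts)

X = np.arange(12, dtype=float).reshape(6, 2)
y = np.arange(6, dtype=float)
X_aug, y_aug = augment_with_jitter(X, y)
# The dataset grows from 6 samples to 6 * (1 + 3) = 24 samples.
```

The jitter scale has to be small relative to the spread of each feature, otherwise the augmented points stop resembling real data.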
Feature Selection: If we have only a limited number of training samples, each with a large number of features, we should select only the most important features for training, so that our model doesn't need to learn from so many features and is less likely to overfit.
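A minimal filter-style selection sketch (one assumed approach among many, not part of the original answer): rank features by their absolute correlation with the target and keep the top k before fitting the model.

```python
import numpy as np

def top_k_by_correlation(X, y, k):
    # Pearson correlation of each column of X with the target y
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    # Indices of the k features with the largest |correlation|
    return np.argsort(-np.abs(corr))[:k]

rng = np.random.default_rng(1)
y = rng.standard_normal(200)
X = rng.standard_normal((200, 5))
X[:, 2] = y + 0.1 * rng.standard_normal(200)  # plant one informative feature
selected = top_k_by_correlation(X, y, 2)
# Feature 2 should be ranked first, since it is almost a copy of the target.
```

Correlation filtering only catches linear relationships; wrapper methods or the learned kernel length scales (automatic relevance determination in GPR) are stronger but costlier alternatives.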
There are many other methods too (regularization, early stopping, simpler kernels); if you spend a little time searching online, I'm confident you can find a lot of relevant information.