Selection of Neural Network Training Data
Data can be divided into training, validation, and test sets and used to train a neural network model (regression in my case). My question is: what if some data points in the training set impair the model's performance? Are there any good ways to find such data points and remove them from the training set?
I was thinking of using something similar to leave-one-out cross-validation:
1. Leave one data point out of the training set.
2. Train the model on the rest of the training set.
3. If the error on the validation (or test) set improves, discard the point.
4. Repeat this for all data points until no further improvement is observed.
There are two problems with this method:
1. It will take a long time for large data sets.
2. Random initial weights complicate the decision to discard a point, because a change in error may come from the initialization rather than from the removed point. Fixing the initial weights with a seed may not give the best starting point either.
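For illustration, the procedure above can be sketched in Python as follows. This is only a minimal sketch under several assumptions: a one-dimensional least-squares line (`fit_line`) stands in for the neural network so that training is deterministic and fast, the data are synthetic, and the names `loo_screen` and `min_gain` are hypothetical. It also drops only the single most harmful point per pass, which is one possible reading of step 4. With a real network, each fit would probably need to be repeated over several seeds and averaged before deciding.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares line; a deterministic stand-in for the neural network."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def mse(model, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    slope, intercept = model
    return mean([(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)])

def loo_screen(x_train, y_train, x_val, y_val, min_gain=1e-9):
    """Repeatedly drop the training point whose removal most reduces
    the validation MSE; stop when the gain is no larger than min_gain."""
    keep = list(range(len(x_train)))

    def val_error(idx):
        model = fit_line([x_train[i] for i in idx], [y_train[i] for i in idx])
        return mse(model, x_val, y_val)

    best = val_error(keep)
    removed = []
    while len(keep) > 2:
        # Retrain once per candidate removal and keep the lowest validation error.
        err, worst = min((val_error([j for j in keep if j != i]), i) for i in keep)
        if best - err <= min_gain:
            break
        keep.remove(worst)
        removed.append(worst)
        best = err
    return removed, best

# Synthetic example (assumed data): y = 2x + 1 with one corrupted training point.
x_train = [float(i) for i in range(20)]
y_train = [2.0 * x + 1.0 for x in x_train]
y_train[7] += 50.0
x_val = [0.5, 3.5, 8.5, 12.5, 17.5]
y_val = [2.0 * x + 1.0 for x in x_val]

removed, final_error = loo_screen(x_train, y_train, x_val, y_val)
print(removed, final_error)
```

Each pass retrains once per remaining point, so the cost grows roughly quadratically with the size of the training set, which is the first problem listed above. The `min_gain` threshold is there so that tiny, noise-level improvements do not trigger further removals.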
Greg Heath on 5 May 2017
Before learning, obtain the means and standard deviations of the input and target variables. Overlay the plots of the variables on lines of mean +/- m*std for m = 1:4.
Remove or modify outliers.
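As a rough illustration of this screening step, here is a minimal Python sketch that lists, for each m, the points lying outside mean +/- m*std instead of plotting them. The function name `sigma_flags` and the sample values are hypothetical; the same check would be applied to each input and target variable in turn.

```python
from statistics import mean, stdev

def sigma_flags(values, m_levels=(1, 2, 3, 4)):
    """For each m, return the indices of values outside mean +/- m*std."""
    mu, sd = mean(values), stdev(values)  # sample standard deviation
    return {
        m: [i for i, v in enumerate(values) if abs(v - mu) > m * sd]
        for m in m_levels
    }

# Assumed example: one target variable with a single suspicious value.
targets = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 11.5, 8.5, 10.0, 60.0]
flags = sigma_flags(targets)
for m, idx in flags.items():
    print(m, idx)
```

In this small example the last value is flagged at m = 1 and m = 2 but not at m = 3 or 4, because the outlier itself inflates the standard deviation. That is one reason to look at several values of m rather than a single cutoff.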
Hope this helps
Thank you for formally accepting my answer