Function approximation: Neural network is great 'on paper', but when simulated the results are very bad?
I need some help with NN because I don't understand what happened. One hidden layer, I=4, H=1:20, O=1. I run each net architecture 10 times with different initial weights (leaving the default initnw). I have in total 34 datasets, which were divided 60/20/20 when using the Levenberg-Marquardt algorithm. Mse_goal = 0.01*mean(var(t',1)); I calculate NMSE and R^2, choose the best R^2, and for that net I check the performance of each subsample, the regression plots, and the RMSE. R^2 is usually around 0.95, and R for each subset around 0.98... But when I simulate the network with a completely new set of data, the estimates deviate quite a lot. It is not because of extrapolation. The data are normalized with mapminmax; the transfer functions are tansig and purelin.
Trainbr was actually my first choice, since I have a small dataset and trainbr doesn't need a validation set (MATLAB R2015a), but it is awfully slow. I ran a net with trainbr and we are talking hours versus minutes with trainlm.
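For illustration, a quick sketch of how the two training functions can be timed on the same data (the 10 hidden nodes are just an arbitrary value for the comparison; x and t are as in the script below):
% rough timing comparison of trainlm vs. trainbr
for fcn = {'trainlm','trainbr'}
    net = fitnet(10,fcn{1});
    net.trainParam.showWindow = false;   % suppress the training GUI
    tic
    net = train(net,x,t);
    fprintf('%s: %.1f s\n',fcn{1},toc)
end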
I've read a ton of Greg Heath's posts and tutorials and found very valuable information there; however, I am still stuck and see no way out.
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by Neural Fitting app
% Created 09-Aug-2016 18:33:13
% This script assumes these variables are defined:
%
% MP_UA_K - input data.
% UA_K - target data.
close all, clear all
load varUA_K
x = MP_UA_K;
t = UA_K;
var_t = mean(var(t',1));   % target variance
[inputs,obs] = size(x);    % number of input variables and observations
hiddenLayerSize = 20; %max number of neurons
numNN = 10; % number of training runs
neurons = [1:hiddenLayerSize]';
training_no = 1:numNN;
obs_no = 1:obs;
nets = cell(hiddenLayerSize,numNN);
trainOutputs = cell(hiddenLayerSize,numNN);
valOutputs = cell(hiddenLayerSize,numNN);
testOutputs = cell(hiddenLayerSize,numNN);
Y_all = cell(hiddenLayerSize,numNN);
performance = zeros(hiddenLayerSize,numNN);
trainPerformance = zeros(hiddenLayerSize,numNN);
valPerformance = zeros(hiddenLayerSize,numNN);
testPerformance = zeros(hiddenLayerSize,numNN);
e = zeros(numNN,obs);
e_all = cell(hiddenLayerSize,numNN);
NMSE = zeros(hiddenLayerSize,numNN);
r_train = zeros(hiddenLayerSize,numNN);
r_val = zeros(hiddenLayerSize,numNN);
r_test = zeros(hiddenLayerSize,numNN);
r = zeros(hiddenLayerSize,numNN);
Rsq = zeros(hiddenLayerSize,numNN);
for j=1:hiddenLayerSize
% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. Suitable in low memory situations.
trainFcn = 'trainbr'; % Bayesian Regularization backpropagation.
% Create a Fitting Network
net = fitnet(j,trainFcn);
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.input.processFcns = {'removeconstantrows','mapminmax'};
net.output.processFcns = {'removeconstantrows','mapminmax'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
% the data are sorted by the dependent variable; roughly every third
% sample is used for testing
net.divideFcn = 'divideind'; % Divide data by index
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainInd = [1:3:34,2:3:34];
% net.divideParam.valInd = [5:5:30];
net.divideParam.testInd = [3:3:34];
mse_goal = 0.01*var_t;
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean Squared Error
net.trainParam.goal = mse_goal;
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
for i=1:numNN
% Train the Network
net = configure(net,x,t);
disp(['No. of hidden nodes ' num2str(j) ', Training ' num2str(i) '/' num2str(numNN)])
[nets{j,i}, tr{j,i}] = train(net,x,t);
y = nets{j,i}(x);
e(i,:) = gsubtract(t,y);
e_all{j,i}= e(i,:);
trainTargets = t .* tr{j,i}.trainMask{1};
%valTargets = t .* tr{j,i}.valMask{1};
testTargets = t .* tr{j,i}.testMask{1};
trainPerformance(j,i) = perform(net,trainTargets,y);
%valPerformance(j,i) = perform(net,valTargets,y);
testPerformance(j,i) = perform(net,testTargets,y);
performance(j,i)= perform(net,t,y);
rmse_train(j,i)=sqrt(trainPerformance(j,i));
%rmse_val(j,i)=sqrt(valPerformance(j,i));
rmse_test(j,i)=sqrt(testPerformance(j,i));
rmse(j,i)=sqrt(performance(j,i));
% outputs of all networks
Y_all{j,i}= y;
trainOutputs {j,i} = y .* tr{j,i}.trainMask{1};
%valOutputs {j,i} = y .* tr{j,i}.valMask{1};
testOutputs {j,i} = y .* tr{j,i}.testMask{1};
[r(j,i)] = regression(t,y);
[r_train(j,i)] = regression(trainTargets,trainOutputs{j,i});
%[r_val(j,i)] = regression(valTargets,valOutputs{j,i});
[r_test(j,i)] = regression(testTargets,testOutputs{j,i});
NMSE(j,i) = mse(e_all{j,i})/mean(var(t',1)); % normalized mse
% coefficient of determination
Rsq(j,i) = 1-NMSE(j,i);
end
[minperf_train,I_train] = min(trainPerformance',[],1);
minperf_train = minperf_train';
I_train = I_train';
% [minperf_val,I_valid] = min(valPerformance',[],1);
% minperf_val = minperf_val';
% I_valid = I_valid';
[minperf_test,I_test] = min(testPerformance',[],1);
minperf_test = minperf_test';
I_test = I_test';
[minperf,I_perf] = min(performance',[],1);
minperf = minperf';
I_perf = I_perf';
[maxRsq,I_Rsq] = max(Rsq',[],1);
maxRsq = maxRsq';
I_Rsq = I_Rsq';
[train_min,train_min_I] = min(minperf_train,[],1);
% [val_min,val_min_I] = min(minperf_val,[],1);
[test_min,test_min_I] = min(minperf_test,[],1);
[perf_min,perf_min_I] = min(minperf,[],1);
[Rsq_max,Rsq_max_I] = max(maxRsq,[],1);
end
figure(4)
hold on
xlabel('observation no.')
ylabel('targets')
scatter(obs_no,trainTargets,'b')
% scatter(obs_no,valTargets,'g')
scatter(obs_no,testTargets,'r')
hold off
figure(5)
hold on
xlabel('neurons')
ylabel('min. performance')
plot(neurons,minperf_train,'b',neurons,minperf_test,'r',neurons,minperf,'k')
hold off
figure(6)
hold on
xlabel('neurons')
ylabel('max Rsq')
scatter(neurons,maxRsq,'k')
hold off
% View the Network
%view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, ploterrhist(e)
%figure, plotregression(t,y)
%figure, plotfit(net,x,t)
% Deployment
% Change the (false) values to (true) to enable the following code blocks.
% See the help for each generation function for more information.
% save the figures for later inspection
savefig(figure(4),'figure4.fig')
savefig(figure(5),'figure5.fig')
savefig(figure(6),'figure6.fig')
if (false)
% Generate MATLAB function for neural network for application
% deployment in MATLAB scripts or with MATLAB Compiler and Builder
% tools, or simply to examine the calculations your trained neural
% network performs.
genFunction(net,'nn_UA_K_BR');
y = nn_UA_K_BR(x);
end
% save all workspace variables to a separate file for further analysis
save ws_UA_K_BR
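For reference, a minimal sketch of the check on the new data where the deviation shows up (best_net, x_new and t_new are placeholder names for the selected network and the new inputs/targets; they are not defined in the script above):
% simulate the chosen net on data it has never seen
y_new = best_net(x_new);
e_new = t_new - y_new;
NMSE_new = mse(e_new)/mean(var(t_new',1))   % normalized MSE on the new set
Rsq_new  = 1 - NMSE_new                     % coefficient of determination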
Accepted Answer
Greg Heath on 3 Sep 2016 (edited 5 Sep 2016)
% I need some help with NN because I don't understand what happened. One
% hidden layer, I=4, H=1:20, O=1. I run each net architecture 10 times
% with different initial weights (left default initnw). I have in total
% 34 datasets
Do you mean data points N = 34?
It typically takes ~ 10 to 30 data points per dimension to
adequately characterize a distribution. For a 4-D distribution I'd recommend
40 <~ Ntrn <~ 120
% which were divided 60/20/20 when using Levenberg-Marquardt
Ntrn = 34-2*round(0.2*34) = 20
Hub = (20-1)/(4+1+1) = 3.2
indicating you really don't have enough data to adequately characterize a 4-D distribution.
You should consider
1. Dimensionality reduction
2. k-fold crossvalidation (see the sketch after this list)
3. Adding new data with the same mean and covariance (stdv +
correlations) matrix
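A rough sketch of one way the k-fold idea could be coded with DIVIDEIND (the fold count k, the hidden-node count H, and the random fold assignment are placeholders, not values from the original post):
k = 5;                                  % number of folds (placeholder)
H = 4;                                  % candidate number of hidden nodes (placeholder)
N = size(x,2);
fold = mod(randperm(N),k) + 1;          % random assignment of each sample to a fold
NMSEcv = zeros(1,k);
for f = 1:k
    net = fitnet(H,'trainlm');
    net.divideFcn = 'divideind';        % train on k-1 folds, test on the held-out fold
    net.divideParam.trainInd = find(fold ~= f);
    net.divideParam.valInd   = [];
    net.divideParam.testInd  = find(fold == f);
    net = train(net,x,t);
    ytst = net(x(:,fold==f));
    NMSEcv(f) = mse(t(fold==f)-ytst)/mean(var(t',1));
end
medNMSEcv = median(NMSEcv)              % summary over the k folds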
% algorithm. Mse_goal = 0.01*mean(var(t',1)), I calculate NMSE and R^2,
% choose best R^2, for that check performance of each subsample, check
% regression plots, check rmse. R^2 is usually around 0.95; R for each
% subset 0.98... But when I simulate network with completely new set of
% data, estimations deviate quite a lot. It is not because of
% extrapolation.
No. It probably is. Your training data subset is insufficiently
large for 4 dimensions.
I would begin with minimizing H with dividetrain. Then consider
k-fold crossvalidation.
% Data are normalized with mapminmax, transfer functions tansig, purelin.
%
% Trainbr was my first choice actually, since I have small dataset and
% trainbr doesn't need validation set (Matlab2015a), but it is awfully
% slow. I ran a net with trainbr and we are talking hours versus minutes
% with trainlm.
This may be a BUG. Let MATLAB know. What version are you using?
>> ver
% I've read a ton of Greg Heath's posts and tutorials and found very
% valuable information there, however, still nothing. I see no way out.
It typically takes ~10 to 30 data points per dimension to adequately
characterize a distribution.
I suggest calculating the means and stdvs for each data set to see how
representative your training data is of the total 4-D distribution
that includes the new datasets. 2-D or 3-D color-coded projections
may be helpful.
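As a minimal sketch of that comparison, assuming the new inputs sit in a matrix x_new (hypothetical name) with the same row layout as x:
% compare per-input mean and stdv of the training data and the new data
stats_train = [mean(x,2) std(x,0,2)]
stats_new   = [mean(x_new,2) std(x_new,0,2)]
% 2-D color-coded projection of the first two inputs
figure, hold on
scatter(x(1,:),x(2,:),'b')
scatter(x_new(1,:),x_new(2,:),'r')
xlabel('input 1'), ylabel('input 2'), legend('training data','new data')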
Hope this helps.
Greg
12 Comments
Greg Heath on 28 Sep 2016 (edited 28 Sep 2016)
Please post your data in *.m or *.txt.
NEVERMIND! SEE BELOW.
More Answers (1)
Greg Heath on 28 Sep 2016
AN OPTIMISTIC ESTIMATE USING DIVIDETRAIN:
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by Neural Fitting app
% Created 09-Aug-2016 18:33:13
% This script assumes these variables are defined:
%
% MP_UA_K - input data.
% UA_K - target data.
close all, clear all, clc, plt=0, tic
format short e
load varUA_K
whos
% Name Size Bytes Class
% MP_UA_K 3x34 816 double
% UA_K 1x34 272 double
% plt 1x1 8 double
x = MP_UA_K; t = UA_K;
[I N ] = size(x), [O N ] = size(t)% [ 3 34 ], [ 1 34]
vart1 = mean(var(t',1)) % 1.0259e+05
xt = [x;t]; minmaxxt = minmax(xt)
% minmaxxt = 2.0700e+02 7.6000e+02
% 3.5900e+02 1.0180e+03
% 1.5100e-02 2.8500e-01 % 10^4 LOWER!!!
% 8.1300e+02 2.4070e+03
x1 = x(1,:); x2 = x(2,:); x3=x(3,:);
plt = plt+1, figure(plt)
subplot(2,2,1), plot(x1,'k','LineWidth',2)
subplot(2,2,2), plot(x2,'b','LineWidth',2)
subplot(2,2,3), plot(x3,'g','LineWidth',2)
subplot(2,2,4), plot( t,'k','LineWidth',2)
GEH1 = 'DOES NOT LOOK PROMISING!!!'
Ntrneq = N*O % DIVIDETRAIN
Hub = (Ntrneq-O)/(I+O+1) % 6.6
Hmin = 0, dH = 1, Hmax = 10
Ntrials = 10
rng(0)
j=0
for h = 0:10
j=j+1
if h==0
net = fitnet([]);
Nw = (I+1)*O
else
net = fitnet(h);
Nw = (I+1)*h+(h+1)*O
end
Ndof = Ntrneq-Nw
MSEgoal = 0.01*max(Ndof,0)*vart1/Ntrneq
net.divideFcn = 'dividetrain';
net.trainParam.goal = MSEgoal;
net.trainParam.min_grad = MSEgoal/100;
for i = 1:Ntrials
i = i
net = configure(net,x,t);
[net tr y e ] = train(net,x,t);
NMSE(i,j) = 100*mse(e)/vart1;
end
end
NMSE = NMSE
minNMSE = min(NMSE)
medNMSE = median(NMSE)
meanNMSE = mean(NMSE)
maxNMSE = max(NMSE)
totaltime = toc % 96 sec
% NONOVERFITTING 0 <= H <= 6 < Hub = 6.6
% H           0      1      2      3      4      5      6
% minNMSE  = 48.3   33.3   19.4   10.7    8.7    7.2    6.6
% medNMSE  = 48.3   33.3   24.5   17.0   10.8    8.1    7.4
% meanNMSE = 48.3   40.0   33.4   16.7   12.1    8.3    7.5
% maxNMSE  = 48.3  100.0   76.7   26.6   22.3   11.2    8.4
GEH2 = 'With H = 6 can get Rsquare = 93.4 !'
% OVERFITTING Hub = 6.6 < 7 <= H <= 10
% H            7      8      9     10
% minNMSE  =  5.97   5.96   5.96   5.96
% medNMSE  =  6.22   5.96   5.96   5.96
% meanNMSE =  6.47   6.02   6.02   5.96
% maxNMSE  =  8.16   6.42   6.53   5.96
GEH3 = 'With OVERFITTING can only get 94.0 !'
% NMSE = NMSE
% Columns 1 through 6
%
% 4.8282e+01 3.3313e+01 2.6913e+01 1.9122e+01 9.3848e+00 1.1225e+01
% 4.8282e+01 3.3313e+01 2.2170e+01 1.0726e+01 1.0602e+01 8.7863e+00
% 4.8282e+01 3.3313e+01 2.1539e+01 1.5017e+01 1.3730e+01 7.8872e+00
% 4.8282e+01 3.3313e+01 2.0225e+01 1.5821e+01 1.1673e+01 7.5152e+00
% 4.8282e+01 3.3313e+01 1.9368e+01 1.2777e+01 1.2493e+01 7.6062e+00
% 4.8282e+01 3.3313e+01 6.2003e+01 1.1113e+01 2.2313e+01 8.0091e+00
% 4.8282e+01 3.3313e+01 7.6666e+01 1.8246e+01 1.0316e+01 8.2620e+00
% 4.8282e+01 3.3313e+01 3.1822e+01 1.9369e+01 1.1088e+01 8.6014e+00
% 4.8282e+01 1.0000e+02 3.2846e+01 1.8222e+01 8.7025e+00 8.1623e+00
% 4.8282e+01 3.3313e+01 2.0608e+01 2.6597e+01 1.0326e+01 7.2022e+00
%
% Columns 7 through 11
%
% 6.5668e+00 5.9673e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.2365e+00 6.6139e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.3531e+00 5.9903e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.3784e+00 8.1612e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.3713e+00 6.8227e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.6491e+00 6.2822e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.3575e+00 6.6919e+00 6.4153e+00 5.9635e+00 5.9635e+00
% 6.6564e+00 6.1604e+00 6.0776e+00 5.9635e+00 5.9635e+00
% 7.0978e+00 6.0554e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.0990e+00 5.9676e+00 5.9635e+00 6.5254e+00 5.9635e+00
Hope this helps.
Greg
2 Comments
Greg Heath on 28 Sep 2016
I just ran your 4-input case with DIVIDETRAIN. Although Hub = 5.5 is one smaller than the 6.6 of the 3-input case, the information from the new input does allow Rsquare = 0.997 for H = 5. In addition, overfitting with H >= 6 does not significantly improve performance.
% % NONOVERFITTING 0 <= H <= 5 < Hub = 5.5
% H 0 1 2 3 4 5
% minNMSE = 10.5 9.82 2.47 0.83 0.51 0.32
% medNMSE = 10.5 9.82 4.64 1.93 0.94 0.47
% meanNMSE = 10.5 9.82 14.7 2.48 1.00 0.48
% maxNMSE = 10.5 9.82 100.00 4.68 2.07 0.79
GEH2 = 'With H = 5 can get Rsquare = 99.7 !'
% OVERFITTING Hub = 5.5 < 6 <= H <= 10
% H 6 7 8 9 10
% minNMSE = 0.30 0.30 0.30 0.30 0.30
% medNMSE = 0.30 0.30 0.30 0.30 0.30
% meanNMSE = 0.35 0.41 0.30 0.30 0.30
% maxNMSE = 0.55 0.97 0.30 0.30 0.30
GEH3 = 'Cannot do significantly better by OVERFITTING!'
Hope this helps.
Greg
P.S. I used the optimistically biased DIVIDETRAIN results to get an upper bound on performance. Although the bias can be mitigated somewhat by multiplying NMSE by Ntrneq/Ndof, I prefer to use estimates based on nontraining data.
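In code, that adjustment amounts to something like the following (H and NMSE_H are placeholders for one hidden-node count and the NMSE values obtained at that H; I, O and Ntrneq are as in the script above):
% optimistic-bias adjustment of the DIVIDETRAIN estimate for one H
Nw    = (I+1)*H + (H+1)*O;      % number of weights
Ndof  = Ntrneq - Nw;            % estimation degrees of freedom
NMSEa = NMSE_H*Ntrneq/Ndof;     % adjusted NMSE; only meaningful when Ndof > 0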