Function approximation: Neural network is great 'on paper', but when simulated the results are very bad?
I need some help with NN because I don't understand what happened. One hidden layer, I=4, H=1:20, O=1. I run each net architecture 10 times with different initial weights (leaving the default initnw). I have in total 34 datasets, which were divided 60/20/20 when using the Levenberg-Marquardt algorithm. Mse_goal = 0.01*mean(var(t',1)); I calculate NMSE and R^2, choose the best R^2, and for that net I check the performance of each subsample, the regression plots, and the RMSE. R^2 is usually around 0.95, and R for each subset around 0.98... But when I simulate the network with a completely new set of data, the estimates deviate quite a lot. It is not because of extrapolation. The data are normalized with mapminmax; the transfer functions are tansig and purelin.
Trainbr was actually my first choice, since I have a small dataset and trainbr doesn't need a validation set (MATLAB R2015a), but it is awfully slow. I ran a net with trainbr and we are talking hours versus minutes with trainlm.
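For illustration, a quick sketch of how the two training functions can be timed on the same data (the 10 hidden nodes are just an arbitrary value for the comparison; x and t are as in the script below):
% rough timing comparison of trainlm vs. trainbr
for fcn = {'trainlm','trainbr'}
    net = fitnet(10,fcn{1});
    net.trainParam.showWindow = false;   % suppress the training GUI
    tic
    net = train(net,x,t);
    fprintf('%s: %.1f s\n',fcn{1},toc)
end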
I've read a ton of Greg Heath's posts and tutorials and found very valuable information there; however, I am still stuck and see no way out.
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by Neural Fitting app
% Created 09-Aug-2016 18:33:13
% This script assumes these variables are defined:
%
% MP_UA_K - input data.
% UA_K - target data.
close all, clear all
load varUA_K
x = MP_UA_K;
t = UA_K;
var_t = mean(var(t',1));   % target variance
[inputs,obs] = size(x);    % number of input variables and observations
hiddenLayerSize = 20; %max number of neurons
numNN = 10; % number of training runs
neurons = [1:hiddenLayerSize]';
training_no = 1:numNN;
obs_no = 1:obs;
nets = cell(hiddenLayerSize,numNN);
trainOutputs = cell(hiddenLayerSize,numNN);
valOutputs = cell(hiddenLayerSize,numNN);
testOutputs = cell(hiddenLayerSize,numNN);
Y_all = cell(hiddenLayerSize,numNN);
performance = zeros(hiddenLayerSize,numNN);
trainPerformance = zeros(hiddenLayerSize,numNN);
valPerformance = zeros(hiddenLayerSize,numNN);
testPerformance = zeros(hiddenLayerSize,numNN);
e = zeros(numNN,obs);
e_all = cell(hiddenLayerSize,numNN);
NMSE = zeros(hiddenLayerSize,numNN);
r_train = zeros(hiddenLayerSize,numNN);
r_val = zeros(hiddenLayerSize,numNN);
r_test = zeros(hiddenLayerSize,numNN);
r = zeros(hiddenLayerSize,numNN);
Rsq = zeros(hiddenLayerSize,numNN);
for j=1:hiddenLayerSize
% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. Suitable in low memory situations.
trainFcn = 'trainbr'; % Bayesian Regularization backpropagation.
% Create a Fitting Network
net = fitnet(j,trainFcn);
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.input.processFcns = {'removeconstantrows','mapminmax'};
net.output.processFcns = {'removeconstantrows','mapminmax'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
% the data are sorted by the dependent variable; roughly every third
% sample is used for testing
net.divideFcn = 'divideind'; % Divide data by index
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainInd = [1:3:34,2:3:34];
% net.divideParam.valInd = [5:5:30];
net.divideParam.testInd = [3:3:34];
mse_goal = 0.01*var_t;
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean Squared Error
net.trainParam.goal = mse_goal;
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
for i=1:numNN
% Train the Network
net = configure(net,x,t);
disp(['No. of hidden nodes ' num2str(j) ', Training ' num2str(i) '/' num2str(numNN)])
[nets{j,i}, tr{j,i}] = train(net,x,t);
y = nets{j,i}(x);
e(i,:) = gsubtract(t,y);
e_all{j,i}= e(i,:);
trainTargets = t .* tr{j,i}.trainMask{1};
%valTargets = t .* tr{j,i}.valMask{1};
testTargets = t .* tr{j,i}.testMask{1};
trainPerformance(j,i) = perform(net,trainTargets,y);
%valPerformance(j,i) = perform(net,valTargets,y);
testPerformance(j,i) = perform(net,testTargets,y);
performance(j,i)= perform(net,t,y);
rmse_train(j,i)=sqrt(trainPerformance(j,i));
%rmse_val(j,i)=sqrt(valPerformance(j,i));
rmse_test(j,i)=sqrt(testPerformance(j,i));
rmse(j,i)=sqrt(performance(j,i));
% outputs of all networks
Y_all{j,i}= y;
trainOutputs {j,i} = y .* tr{j,i}.trainMask{1};
%valOutputs {j,i} = y .* tr{j,i}.valMask{1};
testOutputs {j,i} = y .* tr{j,i}.testMask{1};
[r(j,i)] = regression(t,y);
[r_train(j,i)] = regression(trainTargets,trainOutputs{j,i});
%[r_val(j,i)] = regression(valTargets,valOutputs{j,i});
[r_test(j,i)] = regression(testTargets,testOutputs{j,i});
NMSE(j,i) = mse(e_all{j,i})/mean(var(t',1)); % normalized mse
% coefficient of determination
Rsq(j,i) = 1-NMSE(j,i);
end
[minperf_train,I_train] = min(trainPerformance',[],1);
minperf_train = minperf_train';
I_train = I_train';
% [minperf_val,I_valid] = min(valPerformance',[],1);
% minperf_val = minperf_val';
% I_valid = I_valid';
[minperf_test,I_test] = min(testPerformance',[],1);
minperf_test = minperf_test';
I_test = I_test';
[minperf,I_perf] = min(performance',[],1);
minperf = minperf';
I_perf = I_perf';
[maxRsq,I_Rsq] = max(Rsq',[],1);
maxRsq = maxRsq';
I_Rsq = I_Rsq';
[train_min,train_min_I] = min(minperf_train,[],1);
% [val_min,val_min_I] = min(minperf_val,[],1);
[test_min,test_min_I] = min(minperf_test,[],1);
[perf_min,perf_min_I] = min(minperf,[],1);
[Rsq_max,Rsq_max_I] = max(maxRsq,[],1);
end
figure(4)
hold on
xlabel('observation no.')
ylabel('targets')
scatter(obs_no,trainTargets,'b')
% scatter(obs_no,valTargets,'g')
scatter(obs_no,testTargets,'r')
hold off
figure(5)
hold on
xlabel('neurons')
ylabel('min. performance')
plot(neurons,minperf_train,'b',neurons,minperf_test,'r',neurons,minperf,'k')
hold off
figure(6)
hold on
xlabel('neurons')
ylabel('max Rsq')
scatter(neurons,maxRsq,'k')
hold off
% View the Network
%view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, ploterrhist(e)
%figure, plotregression(t,y)
%figure, plotfit(net,x,t)
% Deployment
% Change the (false) values to (true) to enable the following code blocks.
% See the help for each generation function for more information.
% save the figures for later inspection
savefig(figure(4),'figure4.fig')
savefig(figure(5),'figure5.fig')
savefig(figure(6),'figure6.fig')
if (false)
% Generate MATLAB function for neural network for application
% deployment in MATLAB scripts or with MATLAB Compiler and Builder
% tools, or simply to examine the calculations your trained neural
% network performs.
genFunction(net,'nn_UA_K_BR');
y = nn_UA_K_BR(x);
end
% save all workspace variables to a separate file for further analysis
save ws_UA_K_BR
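For reference, a minimal sketch of the check on the new data where the deviation shows up (best_net, x_new and t_new are placeholder names for the selected network and the new inputs/targets; they are not defined in the script above):
% simulate the chosen net on data it has never seen
y_new = best_net(x_new);
e_new = t_new - y_new;
NMSE_new = mse(e_new)/mean(var(t_new',1))   % normalized MSE on the new set
Rsq_new  = 1 - NMSE_new                     % coefficient of determination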
Accepted Answer
Greg Heath on 3 Sep 2016 (edited 5 Sep 2016)
% I need some help with NN because I don't understand what happened. One
% hidden layer, I=4, H=1:20, O=1. I run each net architecture 10 times
% with different initial weights (left default initnw). I have in total
% 34 datasets
Do you mean data points N = 34?
It typically takes ~ 10 to 30 data points per dimension to
adequately characterize a distribution. For a 4-D distribution I'd recommend
40 <~ Ntrn <~ 120
% which were divided 60/20/20 when using Levenberg-Marquardt
Ntrn = 34-2*round(0.2*34) = 20
Hub = (20-1)/(4+1+1) = 3.2
indicating you really don't have enough data to adequately characterize a 4-D distribution.
You should consider
1. Dimensionality reduction
2. k-fold crossvalidation (see the sketch after this list)
3. Adding new data with the same mean and covariance (stdv +
correlations) matrix
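A rough sketch of one way the k-fold idea could be coded with DIVIDEIND (the fold count k, the hidden-node count H, and the random fold assignment are placeholders, not values from the original post):
k = 5;                                  % number of folds (placeholder)
H = 4;                                  % candidate number of hidden nodes (placeholder)
N = size(x,2);
fold = mod(randperm(N),k) + 1;          % random assignment of each sample to a fold
NMSEcv = zeros(1,k);
for f = 1:k
    net = fitnet(H,'trainlm');
    net.divideFcn = 'divideind';        % train on k-1 folds, test on the held-out fold
    net.divideParam.trainInd = find(fold ~= f);
    net.divideParam.valInd   = [];
    net.divideParam.testInd  = find(fold == f);
    net = train(net,x,t);
    ytst = net(x(:,fold==f));
    NMSEcv(f) = mse(t(fold==f)-ytst)/mean(var(t',1));
end
medNMSEcv = median(NMSEcv)              % summary over the k folds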
% algorithm. Mse_goal = 0.01*mean(var(t',1)), I calculate NMSE and R^2,
% choose best R^2, for that check performance of each subsample, check
% regression plots, check rmse. R^2 is usually around 0.95; R for each
% subset 0.98... But when I simulate network with completely new set of
% data, estimations deviate quite a lot. It is not because of
% extrapolation.
No. It probably is. Your training data subset is insufficiently
large for 4 dimensions.
I would begin with minimizing H with dividetrain. Then consider
k-fold crossvalidation.
% Data are normalized with mapminmax, transfer functions tansig, purelin.
%
% Trainbr was my first choice actually, since I have small dataset and
% trainbr doesn't need validation set (Matlab2015a), but it is awfully
% slow. I ran a net with trainbr and we are talking hours versus minutes
% with trainlm.
This may be a BUG. Let MATLAB know. What version are you using?
>> ver
% I've read a ton of Greg Heath's posts and tutorials and found very
% valuable information there, however, still nothing. I see no way out.
It typically takes ~10 to 30 data points per dimension to adequately
characterize a distribution.
I suggest calculating the means and stdvs for each data set to see how
representative your training data is of the total 4-D distribution
that includes the new datasets. 2-D or 3-D color-coded projections
may be helpful.
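As a minimal sketch of that comparison, assuming the new inputs sit in a matrix x_new (hypothetical name) with the same row layout as x:
% compare per-input mean and stdv of the training data and the new data
stats_train = [mean(x,2) std(x,0,2)]
stats_new   = [mean(x_new,2) std(x_new,0,2)]
% 2-D color-coded projection of the first two inputs
figure, hold on
scatter(x(1,:),x(2,:),'b')
scatter(x_new(1,:),x_new(2,:),'r')
xlabel('input 1'), ylabel('input 2'), legend('training data','new data')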
Hope this helps.
Greg
12 Comments
Greg Heath on 28 Sep 2016 (edited 28 Sep 2016)
Please post your data in *.m or *.txt.
NEVERMIND! SEE BELOW.
More Answers (1)
Greg Heath on 28 Sep 2016
AN OPTIMISTIC ESTIMATE USING DIVIDETRAIN:
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by Neural Fitting app
% Created 09-Aug-2016 18:33:13
% This script assumes these variables are defined:
%
% MP_UA_K - input data.
% UA_K - target data.
close all, clear all, clc, plt=0, tic
format short e
load varUA_K
whos
% Name Size Bytes Class
% MP_UA_K 3x34 816 double
% UA_K 1x34 272 double
% plt 1x1 8 double
x = MP_UA_K; t = UA_K;
[I N ] = size(x), [O N ] = size(t)% [ 3 34 ], [ 1 34]
vart1 = mean(var(t',1)) % 1.0259e+05
xt = [x;t]; minmaxxt = minmax(xt)
% minmaxxt = 2.0700e+02 7.6000e+02
% 3.5900e+02 1.0180e+03
% 1.5100e-02 2.8500e-01 % 10^4 LOWER!!!
% 8.1300e+02 2.4070e+03
x1 = x(1,:); x2 = x(2,:); x3=x(3,:);
plt = plt+1, figure(plt)
subplot(2,2,1), plot(x1,'k','LineWidth',2)
subplot(2,2,2), plot(x2,'b','LineWidth',2)
subplot(2,2,3), plot(x3,'g','LineWidth',2)
subplot(2,2,4), plot( t,'k','LineWidth',2)
GEH1 = 'DOES NOT LOOK PROMISING!!!'
Ntrneq = N*O % DIVIDETRAIN
Hub = (Ntrneq-O)/(I+O+1) % 6.6
Hmin = 0, dH = 1, Hmax = 10
Ntrials = 10
rng(0)
j=0
for h = 0:10
j=j+1
if h==0
net = fitnet([]);
Nw = (I+1)*O
else
net = fitnet(h);
Nw = (I+1)*h+(h+1)*O
end
Ndof = Ntrneq-Nw
MSEgoal = 0.01*max(Ndof,0)*vart1/Ntrneq
net.divideFcn = 'dividetrain';
net.trainParam.goal = MSEgoal;
net.trainParam.min_grad = MSEgoal/100;
for i = 1:Ntrials
i = i
net = configure(net,x,t);
[net tr y e ] = train(net,x,t);
NMSE(i,j) = 100*mse(e)/vart1;
end
end
NMSE = NMSE
minNMSE = min(NMSE)
medNMSE = median(NMSE)
meanNMSE = mean(NMSE)
maxNMSE = max(NMSE)
totaltime = toc % 96 sec
% NONOVERFITTING 0 <= H <= 6 < Hub = 6.6
% H           0      1      2      3      4      5      6
% minNMSE  = 48.3   33.3   19.4   10.7    8.7    7.2    6.6
% medNMSE  = 48.3   33.3   24.5   17.0   10.8    8.1    7.4
% meanNMSE = 48.3   40.0   33.4   16.7   12.1    8.3    7.5
% maxNMSE  = 48.3  100.0   76.7   26.6   22.3   11.2    8.4
GEH2 = 'With H = 6 can get Rsquare = 93.4 !'
% OVERFITTING Hub = 6.6 < 7 <= H <= 10
% H            7      8      9     10
% minNMSE  =  5.97   5.96   5.96   5.96
% medNMSE  =  6.22   5.96   5.96   5.96
% meanNMSE =  6.47   6.02   6.02   5.96
% maxNMSE  =  8.16   6.42   6.53   5.96
GEH3 = 'With OVERFITTING can only get 94.0 !'
% NMSE = NMSE
% Columns 1 through 6
%
% 4.8282e+01 3.3313e+01 2.6913e+01 1.9122e+01 9.3848e+00 1.1225e+01
% 4.8282e+01 3.3313e+01 2.2170e+01 1.0726e+01 1.0602e+01 8.7863e+00
% 4.8282e+01 3.3313e+01 2.1539e+01 1.5017e+01 1.3730e+01 7.8872e+00
% 4.8282e+01 3.3313e+01 2.0225e+01 1.5821e+01 1.1673e+01 7.5152e+00
% 4.8282e+01 3.3313e+01 1.9368e+01 1.2777e+01 1.2493e+01 7.6062e+00
% 4.8282e+01 3.3313e+01 6.2003e+01 1.1113e+01 2.2313e+01 8.0091e+00
% 4.8282e+01 3.3313e+01 7.6666e+01 1.8246e+01 1.0316e+01 8.2620e+00
% 4.8282e+01 3.3313e+01 3.1822e+01 1.9369e+01 1.1088e+01 8.6014e+00
% 4.8282e+01 1.0000e+02 3.2846e+01 1.8222e+01 8.7025e+00 8.1623e+00
% 4.8282e+01 3.3313e+01 2.0608e+01 2.6597e+01 1.0326e+01 7.2022e+00
%
% Columns 7 through 11
%
% 6.5668e+00 5.9673e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.2365e+00 6.6139e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.3531e+00 5.9903e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.3784e+00 8.1612e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.3713e+00 6.8227e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.6491e+00 6.2822e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.3575e+00 6.6919e+00 6.4153e+00 5.9635e+00 5.9635e+00
% 6.6564e+00 6.1604e+00 6.0776e+00 5.9635e+00 5.9635e+00
% 7.0978e+00 6.0554e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.0990e+00 5.9676e+00 5.9635e+00 6.5254e+00 5.9635e+00
Hope this helps.
Greg
2 Comments
Greg Heath on 28 Sep 2016
I just ran your 4-input case with DIVIDETRAIN. Although Hub = 5.5 is one smaller than the 6.6 of the 3-input case, the information from the new input does allow Rsquare = 0.997 for H = 5. In addition, overfitting with H >= 6 does not significantly improve performance.
% % NONOVERFITTING 0 <= H <= 5 < Hub = 5.5
% H 0 1 2 3 4 5
% minNMSE = 10.5 9.82 2.47 0.83 0.51 0.32
% medNMSE = 10.5 9.82 4.64 1.93 0.94 0.47
% meanNMSE = 10.5 9.82 14.7 2.48 1.00 0.48
% maxNMSE = 10.5 9.82 100.00 4.68 2.07 0.79
GEH2 = 'With H = 5 can get Rsquare = 99.7 !'
% OVERFITTING Hub = 5.5 < 6 <= H <= 10
% H 6 7 8 9 10
% minNMSE = 0.30 0.30 0.30 0.30 0.30
% medNMSE = 0.30 0.30 0.30 0.30 0.30
% meanNMSE = 0.35 0.41 0.30 0.30 0.30
% maxNMSE = 0.55 0.97 0.30 0.30 0.30
GEH3 = 'Cannot do significantly better by OVERFITTING!'
Hope this helps.
Greg
P.S. I used the optimistically biased DIVIDETRAIN results to get an upper bound on performance. Although the bias can be mitigated somewhat by multiplying NMSE by Ntrneq/Ndof, I prefer to use estimates based on nontraining data.
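In code, that adjustment amounts to something like the following (H and NMSE_H are placeholders for one hidden-node count and the NMSE values obtained at that H; I, O and Ntrneq are as in the script above):
% optimistic-bias adjustment of the DIVIDETRAIN estimate for one H
Nw    = (I+1)*H + (H+1)*O;      % number of weights
Ndof  = Ntrneq - Nw;            % estimation degrees of freedom
NMSEa = NMSE_H*Ntrneq/Ndof;     % adjusted NMSE; only meaningful when Ndof > 0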