Hi,
I understand that you want to know why some network training functions perform poorly compared to others.
The evaluation strategy may not give you an accurate picture of how the network performs. Because you train the network on half the points, you cannot reuse those points for evaluation: the network has already "seen" them. If a network overfits the training data, it will still produce very good results when evaluated on that same data. There should be no overlap between the training and testing datasets.
Here is the same example with different training and testing splits.
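[x,y] = simplefit_dataset;
% Assumed split (adjust to your own): odd-indexed points train, even-indexed points test.
train_x = x(1:2:end); train_y = y(1:2:end);
test_x = x(2:2:end); test_y = y(2:2:end);
hidsize = 8; % hidden layer size used in your example
nets = {'trainlm','trainbr','trainbfg','traincgf','trainrp'};
for i = 1:numel(nets)
    net = fitnet(hidsize, nets{i});
    net.trainParam.showWindow = false;
    net = train(net, train_x, train_y);
    Y = net(test_x); % predictions on points the network has not seen
    perf = perform(net, test_y, Y);
    disp(nets{i} + " Mean Squared Error = " + perf);
end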
Your observation regarding the performance of the network with a hidden layer size of 8 is correct. However, if you change the hidden layer size to 15, for example, you will see different results.
In deep learning, the goal is generalization. So, to compare training functions, you should run the comparison across different network sizes before drawing a conclusion.
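As a rough sketch of that kind of comparison, the loop below trains each training function at two hidden layer sizes (8 and 15, the values mentioned above) and collects the test error in a table. The random half/half split, the fixed seed, and the variable names (hidSizes, trainFcns, results) are my own assumptions for illustration, not something from your original script.

```matlab
% Sketch: compare training functions across hidden layer sizes
% on a held-out test set that does not overlap the training set.
[x, t] = simplefit_dataset;

rng(0);                                % assumption: fixed seed so the split is repeatable
idx      = randperm(numel(x));         % assumption: random half/half split
nTrain   = floor(numel(x) / 2);
trainIdx = idx(1:nTrain);
testIdx  = idx(nTrain+1:end);

trainFcns = {'trainlm','trainbr','trainbfg','traincgf','trainrp'};
hidSizes  = [8 15];                    % sizes discussed above; extend as needed
results   = zeros(numel(hidSizes), numel(trainFcns));

for s = 1:numel(hidSizes)
    for k = 1:numel(trainFcns)
        net = fitnet(hidSizes(s), trainFcns{k});
        net.trainParam.showWindow = false;
        net = train(net, x(trainIdx), t(trainIdx));
        y = net(x(testIdx));                        % predict on unseen points only
        results(s, k) = perform(net, t(testIdx), y); % test MSE
    end
end

% Rows are hidden layer sizes, columns are training functions.
disp(array2table(results, 'VariableNames', trainFcns, ...
    'RowNames', cellstr(string(hidSizes))));
```

Since the weights are initialized randomly, a single run per configuration can be misleading. Repeating each configuration a few times and averaging the test error would likely give a more reliable ranking.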