What transfer functions does patternnet use for the hidden and output layers?

Hi, I am trying to learn how a NN works. I created a NN using the MATLAB patternnet to classify XOR. However, when I compute the output manually, I get a different result than net(input). According to this article, if you use the GUI, the sigmoid transfer function is used in both the hidden layer and the output layer (bullet 7), and if you use the command line, the tan-sigmoid transfer function is used in both the hidden and output layers (bullet 2). I tried both versions and they still give me a different result. Here is my code:
input = [0 0; 0 1; 1 0; 1 1]';
xor = [0 1; 1 0; 1 0; 0 1]';
% Create a larger sample size
input10 = repmat(input,1,10);
xor10 = repmat(xor,1,10);
% MATLAB NN
net = patternnet(2);
net = train(net, input10, xor10);
% Get the weights
IW = net.IW;
b = net.b;
LW = net.LW;
IW = [IW{1}'; b{1}'];
LW = [LW{2}'; b{2}'];
%% Using tan-sigmoid
% Input to hidden layer
hid = zeros(2,1);
hidsig = zeros(2,1);
in = input(:,1);
for i = 1:2
    hid(i) = dot([in;1],IW(:,i));
    hidsig(i) = tansig(hid(i));
end
% Hidden to output layer without normalization
out = zeros(2,1);
outsig = zeros(2,1);
for i = 1:2
    out(i) = dot([hidsig;1],LW(:,i));
    outsig(i) = tansig(out(i));   % apply the transfer function to the output-layer weighted sum
end
outsoftmax = softmax(out);
outsoftmaxsig = softmax(outsig);
% Hidden to output layer with normalization
normout = zeros(2,1);
normoutsig = zeros(2,1);
normhidsig = hidsig./norm(hidsig);
for i = 1:2
    normout(i) = dot([normhidsig;1],LW(:,i));
    normoutsig(i) = tansig(normout(i));   % apply the transfer function to the output-layer weighted sum
end
normoutsoftmax = softmax(normout);
normoutsoftmaxsig = softmax(normoutsig);
result = net(in);
disp(result);
disp('tan-sigmoid');
disp(outsig);
disp(outsoftmax);
disp(outsoftmaxsig);
disp(normoutsig);
disp(normoutsoftmax);
disp(normoutsoftmaxsig);
%% Using sigmoid
% Input to hidden layer
hid = zeros(2,1);
hidsig = zeros(2,1);
in = input(:,1);
for i = 1:2
    hid(i) = dot([in;1],IW(:,i));
    hidsig(i) = sigmf(hid(i),[1,0]);   % sigmf(x,[1,0]) equals the logistic sigmoid logsig(x)
end
% Hidden to output layer without normalization
out = zeros(2,1);
outsig = zeros(2,1);
for i = 1:2
    out(i) = dot([hidsig;1],LW(:,i));
    outsig(i) = sigmf(out(i),[1,0]);   % apply the transfer function to the output-layer weighted sum
end
outsoftmax = softmax(out);
outsoftmaxsig = softmax(outsig);
% Hidden to output layer with normalization
normout = zeros(2,1);
normoutsig = zeros(2,1);
normhidsig = hidsig./norm(hidsig);
for i = 1:2
    normout(i) = dot([normhidsig;1],LW(:,i));
    normoutsig(i) = sigmf(normout(i),[1,0]);
end
normoutsoftmax = softmax(normout);
normoutsoftmaxsig = softmax(normoutsig);
result = net(in);
disp('sigmoid');
disp(outsig);
disp(outsoftmax);
disp(outsoftmaxsig);
disp(normoutsig);
disp(normoutsoftmax);
disp(normoutsoftmaxsig);

Accepted Answer

Greg Heath on 18 Jan 2015
1. Do not use xor as the name of a variable. It is the name of a function
help xor
doc xor
2.
input = [0 0; 0 1; 1 0; 1 1]';
target = xor(input) % NO SEMICOLON!
3. There is no good reason to add exact duplicates to this training set. In general, however, adding noisy duplicates can help if the net is to be used in an environment of noise, interference and measurement errors.
4. If you want to know what transfer functions are being used, all you have to do is ask (see also the sketch after point 5):
net = patternnet(2) % NO SEMICOLON!
5. Also note that there are default normalizations, which net(x) applies to the inputs and outputs automatically; a sketch follows below.
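For reference, here is a minimal sketch covering points 4 and 5 together: read the transfer functions and the input/output processing (typically removeconstantrows plus mapminmax, but the exact defaults depend on your release) straight from the trained network object, then apply them by hand so the manual result matches net(x). Treat it as an outline, not a fixed recipe.
% Re-create a small trained net (same data as in the question, without reusing the name xor)
x4 = [0 0; 0 1; 1 0; 1 1]';
t4 = [0 1; 1 0; 1 0; 0 1]';
net = patternnet(2);
net = train(net, repmat(x4,1,10), repmat(t4,1,10));
% What the net actually uses
net.layers{1}.transferFcn          % hidden-layer transfer function
net.layers{2}.transferFcn          % output-layer transfer function
net.inputs{1}.processFcns          % input pre-processing (the default normalizations)
net.outputs{2}.processFcns         % output post-processing
% Manual forward pass that mirrors net(x), including the processing steps
x = x4(:,1);                       % one input column
xn = x;
for k = 1:numel(net.inputs{1}.processFcns)
    xn = feval(net.inputs{1}.processFcns{k}, 'apply', xn, net.inputs{1}.processSettings{k});
end
a1 = feval(net.layers{1}.transferFcn, net.IW{1,1}*xn + net.b{1});
a2 = feval(net.layers{2}.transferFcn, net.LW{2,1}*a1 + net.b{2});
y = a2;
for k = numel(net.outputs{2}.processFcns):-1:1
    y = feval(net.outputs{2}.processFcns{k}, 'reverse', y, net.outputs{2}.processSettings{k});
end
[y net(x)]                         % the two columns should agree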
Hope this helps.
*Thank you for formally accepting my answer*
Greg
  2 Comments
Timmy on 19 Jan 2015
  1. In general programming, using the name of a function as the name of a variable is a big no-no. However, in this example, the emphasis is that the variable is XOR. Furthermore, I will not be using the function. I am also using other targets in the program, such as AND and OR, so I don't want to use the name target.
  2. Two things. First, the function xor takes two arguments and gives a logical output, which is not what I want. I want a system with two output nodes, so I made the targets the way they are. Second, the semicolon suppresses the display that would otherwise be generated, as I don't need it.
  3. The reason for adding duplicates is to generate enough samples for the NN to train on. By default, the NN trains on 70%, validates on 15% and tests on 15%. If I have just four inputs, two will be used for training, one for validation and one for testing, which would be inconclusive (see also the sketch below this comment).
  4. I am not trying to find out what the transfer functions are, but rather how they are used to classify the input.
  5. I know there are normalizations, but I don't know where they are applied. As you can see in my code, I try with and without normalization.
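As a footnote to point 3: the 70/15/15 split is just a property of the network object, so instead of (or in addition to) duplicating the data, the division can be inspected and changed directly. A small sketch, assuming the default dividerand division:
net = patternnet(2);
net.divideFcn                        % 'dividerand' by default
net.divideParam                      % trainRatio/valRatio/testRatio = 0.70/0.15/0.15 by default
% Option 1: change the ratios
net.divideParam.trainRatio = 0.80;
net.divideParam.valRatio   = 0.10;
net.divideParam.testRatio  = 0.10;
% Option 2: use every sample for training (no validation or test split)
net.divideFcn = 'dividetrain';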
Greg Heath on 20 Jan 2015
1. In general programming, using the name of a function as the name of a variable is a big no-no. However, in this example, the emphasis is that the variable is XOR. Furthermore, I will not be using the function. I am also using other targets in the program, such as AND and OR, so I don't want to use the name target.
Ridiculous answer. Write code so that others can follow it easily.
2. Two things. First, the function xor takes two arguments and gives a logical output, which is not what I want. I want a system with two output nodes, so I made the targets the way they are. Second, the semicolon suppresses the display that would otherwise be generated, as I don't need it.
The purpose of NO SEMICOLON is to verify, DURING DESIGN, that the output is what you think it should be.
In this case it will also answer your question as to what transfer functions are used.
The finished design will have the semicolons.
3. The reason for adding duplicates is to generate enough samples for the NN to train on. By default, the NN trains on 70%, validates on 15% and tests on 15%. If I have just four inputs, two will be used for training, one for validation and one for testing, which would be inconclusive.
Exact duplicates add nothing to the design except to reduce the number of epochs. The purpose of validation and test data is to make sure the design works on nontraining data. However, your nontraining data contains exactly the same vectors. Therefore, it is a waste of time.
Typically, what is done is to add noise to the training and validation duplicates. Then test on the noise-free original data.
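A rough sketch of that idea (the noise level of 0.1 is only an illustrative choice):
% Train and validate on noisy copies of the four XOR patterns,
% then test on the clean, noise-free originals.
x4 = [0 0; 0 1; 1 0; 1 1]';
t4 = [0 1; 1 0; 1 0; 0 1]';
Ncopies = 25;
Xnoisy = repmat(x4,1,Ncopies) + 0.1*randn(2,4*Ncopies);   % noisy duplicates
Tnoisy = repmat(t4,1,Ncopies);
net = patternnet(2);
net = train(net, Xnoisy, Tnoisy);   % the default 70/15/15 split is applied to the noisy data
Yclean = net(x4);                   % evaluate on the noise-free originals
[~, predicted] = max(Yclean);
[~, truth] = max(t4);
accuracy = mean(predicted == truth) % fraction of the four clean patterns classified correctly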
4. I am not trying to find out what the transfer functions are, but rather how they are used to classify the input.
Targets are 0 or 1 and can be obtained from class indices via the function ind2vec.
With mutually exclusive classes, the suggested output transfer function is softmax, because the outputs are supposed to be consistent unit-sum input-conditional posterior probability estimates in [0,1].
The maximum output indicates the class and is determined using the function vec2ind.
With non-mutually exclusive classes (e.g., tall, dark, handsome), the unit-sum constraint is not applicable and logsig is the appropriate output transfer function.
The classes are then determined by thresholds chosen from the validation data.
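For example, a small sketch with the XOR targets from the question (class 1 = [1;0], class 2 = [0;1]), assuming net is a patternnet already trained on them:
classIndex = [2 1 1 2];             % class of each of the four XOR inputs
t = full(ind2vec(classIndex))       % 2x4 matrix of 0/1 targets, one column per sample
y = net([0 0; 0 1; 1 0; 1 1]');     % columns of (approximately) unit-sum posterior estimates
assigned = vec2ind(y)               % index of the maximum output; should reproduce classIndex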
5. I know there are normalizations, but I don't know where they are applied. As you can see in my code, I try with and without normalization.
I'll check. I cannot use your code.
Greg
