Probability of outputs of binary classification in MATLAB
Hi
I have a binary classification problem and am using a neural network and an SVM for it. I choose a threshold (for instance 0.5) for the output of the neural network: if the output is greater than 0.5 the sample belongs to class 1, and if it is smaller than 0.5 it belongs to class 2. After training the network, how can I calculate the probability of the outputs for out-of-sample data? For out-of-sample data, do I use the same criterion (0.5) to find the class of these new data? Can we say that if the output of the neural network is 1, the probability of belonging to class 1 (greater than 0.5 is class 1) is higher than for an output of, for instance, 0.55? (I used the tansig transfer function for the output layer of the neural network.)
Another question: how can I find the probability (or possibility) of belonging to each class with an SVM?
I want to find how strongly an out-of-sample point belongs to a specific class. Can I do that? Is there a function in MATLAB for it?
Thanks.
Accepted Answer
Greg Heath
on 23 Apr 2014
Edited: Greg Heath
on 25 Apr 2014
If you use columns of eye(2) for targets, the outputs will be consistent (i.e., as N-> inf) estimates of the input-conditional posterior class probabilities provided the correct objective function is used.
Typically purelin, logsig and softmax are used as output transfer functions. Although MSE is reasonable for the first two, crossentropy should be used for the latter.
Although the estimates are consistent, purelin does not enforce the range [0,1] and logsig does not enforce sum(estimates) = 1.
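A minimal sketch of the softmax/crossentropy setup described above, assuming the Neural Network (Deep Learning) Toolbox is installed. The bundled cancer_dataset and the hidden layer size of 10 are illustrative choices, not something taken from the question; in recent releases the two explicit assignments should already be the patternnet defaults.

```matlab
% Sketch only: cancer_dataset and H = 10 are assumed for illustration.
[x, t] = cancer_dataset;                  % two-class example data; columns of t are columns of eye(2)
H = 10;                                   % number of hidden nodes (arbitrary)
net = patternnet(H);                      % pattern-recognition net, default training function trainscg
net.layers{2}.transferFcn = 'softmax';    % output transfer function
net.performFcn = 'crossentropy';          % objective paired with softmax
net.trainParam.showWindow = false;        % suppress the training GUI
net = train(net, x, t);
y = net(x);                               % one column of class estimates per sample
colsum = sum(y, 1);                       % softmax forces every column to sum to 1
inrange = all(y(:) >= 0 & y(:) <= 1);     % and every estimate to lie in [0,1]
```

With a softmax output layer both checks hold by construction, so the columns of y can be read directly as estimates of the class posteriors.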
The conversion between the probability targets and estimates and the class indices is obtained using the functions ind2vec and vec2ind.
help ind2vec, doc ind2vec
help vec2ind, doc vec2ind
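As a small illustration of the conversion, assuming the toolbox functions ind2vec and vec2ind are available. The class labels and output values below are made up; the class assignment uses max so that it does not depend on how vec2ind treats columns that are not exact unit vectors.

```matlab
% Sketch only: labels and outputs are hypothetical.
classes = [1 2 2 1 2];                    % true class index of 5 samples
t = full(ind2vec(classes));               % 2-by-5 targets; ind2vec returns sparse, hence full()
y = [0.9 0.2 0.4 0.6 0.1;                 % hypothetical network outputs,
     0.1 0.8 0.6 0.4 0.9];                % one column per sample
[confidence, assigned] = max(y, [], 1);   % the largest output picks the class
trueclass = vec2ind(t);                   % back from target columns to class indices
errorRate = mean(assigned ~= trueclass);  % fraction of misclassified samples
```

Here confidence holds the winning estimate for each sample, which is the quantity to report when asked how strongly a sample belongs to its assigned class.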
Hope this helps.
Thank you for formally accepting my answer
Greg
PS search using
greg patternnet vec2ind
greg patternnet ind2vec
2 Comments
Greg Heath
on 8 May 2014
>Can I use only ‘softmax’ for output layer? For instance ‘softmax’ for output layer, ‘tansig’ for hidden layer, ‘mse’ for performance measurement and ‘trainlm’ for training function because ‘trainlm’ has best classification accuracy in my case and I can’t use it with cross entropy.
>You said "In spite of being consistent estimates ‘purelin’ does not enforce [0,1] and ‘logsig’ does not enforce sum(estimates) = 1." so by using ‘logsig’ or ‘purelin’ we can’t obtain probabilities. Is this true? For example by using “MSE”, “Trainlm” and logsig for output in a binary classification problem I can have 0.7 and 1.3 for outputs! Is this true? How can I describe these numbers?
Why would you want to use trainlm? trainscg is the default for patternnet.
Logsig cannot yield 1.3.
Purelin can yield 1.3, BUT NOT 1.3 AND 0.7.
I did not say XENT/SOFTMAX "is the only way"
I did not say outputs "ARE probabilities"
I did say outputs will be CONSISTENT ESTIMATES
The MOST IMPORTANT THING is that the correct class corresponds to the largest output. The relative values reflect the confidence in the estimate.
The following is only a rough remembrance from more precise posts of mine in comp.ai.neural-nets and comp.soft-sys.matlab. They can be found using the search keywords
greg softmax
If you use columns of eye(c) for targets of c classes and MSE, XENT1 (non-mutually exclusive classes) or XENT2 (mutually exclusive classes) as the minimization objective function, the outputs y(i), 1 <= i <= c, will be CONSISTENT (i.e., as N -> inf) ESTIMATES of the input-conditional posterior class probabilities P(i|x), 1 <= i <= c.
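For concreteness, the three objectives can be written out as follows. This is a sketch that assumes the usual textbook definitions; XENT1 and XENT2 are the labels used in this post, not toolbox function names, and the target and output values are made up.

```matlab
% Sketch only: assumed definitions, hypothetical data.
t = [1 0 0; 0 1 1];                       % targets: columns of eye(2) for 3 samples
y = [0.8 0.3 0.1; 0.2 0.7 0.9];           % hypothetical outputs strictly inside (0,1)
N = size(t, 2);                           % number of samples
mse_val = sum((t(:) - y(:)).^2) / numel(t);                     % mean squared error
xent1 = -sum(sum(t .* log(y) + (1 - t) .* log(1 - y))) / N;     % non-mutually exclusive classes
xent2 = -sum(sum(t .* log(y))) / N;                             % mutually exclusive classes
```

XENT1 treats every output as a separate yes/no decision, while XENT2 only scores the output of the true class, which is why it is paired with softmax.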
There are 3 traditional canonical objective-function/transfer-function pairs with the following properties at the objective function minimum:
1. MSE/PURELIN: sum(outputs) = 1
2. XENT1/LOGSIG: 0 < outputs < 1
3. XENT2/SOFTMAX: 0 < outputs < 1, sum(outputs) = 1
Using the canonical pairs leads to simple expressions for the objective function derivatives.
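Property 1 can be checked numerically with a deliberately simplified sketch: a linear model with a bias term and no hidden layer, fitted by least squares on synthetic data (the data and all variable names are assumptions for illustration). At the MSE minimum the outputs of each sample sum to 1, which is also why purelin outputs of 1.3 and 0.7 cannot occur together.

```matlab
% Sketch only: synthetic data, linear model without a hidden layer.
rng(0);                                   % reproducible random numbers
N = 200;
x = randn(2, N);                          % two input features
cls = 1 + (x(1,:) + 0.5*randn(1, N) > 0); % noisy two-class labels in {1,2}
T = full(sparse(cls, 1:N, 1, 2, N));      % targets: columns of eye(2)
A = [x; ones(1, N)];                      % inputs plus a bias row
W = T / A;                                % least-squares solution of W*A ~ T (minimum MSE)
Y = W * A;                                % purelin outputs
sumErr = max(abs(sum(Y, 1) - 1));         % deviation of the column sums from 1
outOfRange = any(Y(:) < 0 | Y(:) > 1);    % purelin does not confine outputs to [0,1]
```

sumErr is zero up to rounding, whereas nothing prevents individual outputs from leaving [0,1].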
If multiple classes are mutually exclusive, softmax is the most reasonable choice because the estimates always sum to 1. However, often logsig is used and after convergence, the outputs are divided by the sum. I don't recall the latter as being that reliable.
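The difference between the two approaches can be seen in a few lines of base MATLAB. The pre-activation values z are hypothetical, and the formulas are written out by hand rather than calling the toolbox functions logsig and softmax.

```matlab
% Sketch only: z holds made-up output-layer net inputs, one column per sample.
z = [ 2.0 0.5 -1.0;
     -1.0 0.3  1.5];
c = size(z, 1);                                         % number of classes
logsigOut = 1 ./ (1 + exp(-z));                         % same formula as logsig
sumLogsig = sum(logsigOut, 1);                          % generally not equal to 1
normalized = logsigOut ./ repmat(sumLogsig, c, 1);      % divide each column by its sum
ez = exp(z - repmat(max(z, [], 1), c, 1));              % shift by the column max for stability
softmaxOut = ez ./ repmat(sum(ez, 1), c, 1);            % same formula as softmax
gap = max(abs(normalized(:) - softmaxOut(:)));          % how far the two estimates differ
```

Both results sum to 1 and pick the same class, but the normalized logsig values are noticeably different from the softmax values, so they should not be treated as interchangeable probability estimates.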
Unfortunately, classes with many more training examples than the others will adversely affect results. Therefore some correction methods discussed in previous posts should be considered. Search using
greg unbalanced (yea, I know)