How to calculate the conditional probability of an event?

Myriam Moss on 23 Apr 2021
Answered: William on 25 Apr 2021
I have an array similar to this: array = [A A B A C A B B B C C A A C]. I want to calculate p(C|A), p(C|B) and p(C|C), that is, the probability that C happens immediately after a previous event A, B or C. How can I do this with only this information?

William on 23 Apr 2021
Hello Myriam. It is not clear whether A, B, and C here are text characters, or if they have numeric values. If we assume they have numerical values, like A=1, B=2 and C=3, then you could use
y = diff(array);
P_AC = sum(y==2);
P_BC = sum(y==1);
P_CC = sum(y==0);
Myriam Moss on 24 Apr 2021
Hi William. Thank you. They are characters.
I'm new to MATLAB, sorry. Could you explain your logic to me, please?

William on 25 Apr 2021
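Actually, I believe that p(C|A) should be divided by the number of A's rather than the number of C's:
N_AC = length(strfind(array, 'A C')); % number of times 'A' is followed by 'C'
y = strfind(array, 'A');
N_A = length(y); % number of times 'A' occurs
p_CA = N_AC/N_A;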
There is one further thing to consider, though. The very last element of array may be an 'A'. I don't think this should be counted in N_A, because we don't know whether it would have been followed by a 'C' or not. So, if the last element of array is 'A', we should reduce N_A by 1.
y = strfind(array, 'A');
N_A = length(y);
% The string might end with an 'A' or an 'A ' (trailing space)
if ~isempty(y) && (y(end)==length(array) || y(end)==length(array)-1)
    N_A = N_A - 1; % the final 'A' has no successor, so do not count it
end
p_CA = N_AC/N_A;
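As a rough sketch of the whole calculation for a character array, the spaces can be removed first so that every remaining element is an event. Pairing each element with its successor then leaves the final element out of the denominators automatically, which is the correction described above. The variable names letters, prevEv and nextEv are illustrative, and the sketch assumes every event is a single character.

```matlab
array = 'A A B A C A B B B C C A A C';

letters = array(array ~= ' ');   % drop the spaces, keep only the events
prevEv  = letters(1:end-1);      % every event that has a successor
nextEv  = letters(2:end);        % the event that follows it

% p(C|X) = (number of X followed by C) / (number of X that have a successor)
p_CA = sum(prevEv == 'A' & nextEv == 'C') / sum(prevEv == 'A');
p_CB = sum(prevEv == 'B' & nextEv == 'C') / sum(prevEv == 'B');
p_CC = sum(prevEv == 'C' & nextEv == 'C') / sum(prevEv == 'C');
```

If an event never appears with a successor, the corresponding denominator is 0 and the result is NaN, so that case may need separate handling.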

William on 25 Apr 2021
Myriam,
If A, B and C were variables with the values 1, 2 and 3, then in your example:
array = [1, 1, 2, 1, 3, 1, 2, 2, 2, 3, 3, 1, 1, 3]
The diff() function returns the difference between each value and the next value, so
diff(array) = [0, 1, -1, 2, -2, 1, 0, 0, 1, 0, -2, 0, 2]
Every time an A is followed by a C, the difference is 2, and every time a B is followed by a C, the difference is 1. So I was suggesting that you count the number of times A is followed by C by counting how often the value 2 appears in diff(array), with a statement like c = sum(diff(array) == 2). Unfortunately, I see now that this does not work for the number of times B is followed by C: that transition gives a value of 1 in diff(array), but a value of 1 is also produced when an A is followed by a B. The same ambiguity affects C followed by C, since a difference of 0 is produced by any repeated value.
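One way around this ambiguity, sketched here under the same assumed coding A=1, B=2, C=3, is to compare each element and its successor directly instead of looking at their difference. The names prevEv and nextEv are illustrative and do not come from the original posts.

```matlab
% Numeric coding assumed from the post: A = 1, B = 2, C = 3
array = [1, 1, 2, 1, 3, 1, 2, 2, 2, 3, 3, 1, 1, 3];

prevEv = array(1:end-1);   % each element that has a successor
nextEv = array(2:end);     % the element that follows it

% Count each transition by testing both members of the pair
N_AC = sum(prevEv == 1 & nextEv == 3);   % A followed by C
N_BC = sum(prevEv == 2 & nextEv == 3);   % B followed by C
N_CC = sum(prevEv == 3 & nextEv == 3);   % C followed by C
```

Because both members of each pair are tested, an A followed by a B can no longer be mistaken for a B followed by a C.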
Since you have said that A, B and C are characters, I assume that you mean that:
array = 'A A B A C A B B B C C A A C';
In this case, maybe a better solution would be:
y = strfind(array, 'A C');
N_AC = length(y);
y = strfind(array, 'B C');
N_BC = length(y);
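The same pattern extends to the C-followed-by-C case and can be written as a loop over the possible previous events. This is only a sketch: it assumes the events are single characters separated by exactly one space, and events, pattern and N_toC are hypothetical names introduced for illustration.

```matlab
array  = 'A A B A C A B B B C C A A C';
events = 'ABC';                        % possible previous events

N_toC = zeros(1, numel(events));       % N_toC(k): times events(k) is followed by 'C'
for k = 1:numel(events)
    pattern  = [events(k) ' C'];       % e.g. 'A C', 'B C', 'C C'
    N_toC(k) = numel(strfind(array, pattern));
end
```

As far as I know, strfind also reports overlapping matches, so a run such as 'C C C' should be counted as two C-to-C transitions, which is the behavior wanted here.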
Myriam Moss on 25 Apr 2021
Thank you William! Now I have the number of times C appears after A and B.
If I define
y = strfind(array, 'C');
N_C = length(y);
If I want p(C|A), for example, I should do:
p_CA = N_AC/N_C, do you agree? :)