Probabilistic analysis - failure frequencyseems unreasonable

Im testing/learning new functions/tools in the area mentioned using a simple example.
A bar with normaldistributed Area (mean=0.1, std=0.01) and Yield strength (mean=1100, std=50). If I apply a force equal to mean strength (0.1 x 1100) and apply a failure criteria fail=fail+1 if strength<force and run a number of tests, I would expect fail/tests to be slightly below 0.5. My reason for this is that if strength=force or strength>force the bar does not fail. My results turn out to be the opposite, mostly above 0.5.
Is my script wrong or did I miss something else?
Peter
My test script:
clear
force=110;
fail=0;
tests=1e5;
Area=makedist('norm',0.1,0.01);
Yield=makedist('norm',1100,50);
for i=1:tests,
RandomArea(i)=random(Area);
RandomYield(i)=random(Yield);
RandomStrength(i)=RandomArea(i)*RandomYield(i);
if RandomStrength(i)<force
fail=fail+1;
end
end
failure_frequency=fail/tests

 Accepted Answer

dpb
dpb on 14 Sep 2017
Edited: dpb on 14 Sep 2017
The product of two Gaussian random variables is a linear combination of two Chi-square random variables:
XY=[(X+Y)/2]^2 [(X-Y)/2]^2
The sums X+Y and X−Y are normal so that (X+Y)^2 and (X−Y)^2 are each chi-square w/ one DOF.
Chi-square is skewed to the right, the more so with fewer DOF -- in this case with the variance of the area distribution being so small the departure from normality/skewness isn't large but is enough that the higher probability to the right is observable from your empirical estimates.
Another way to think of it is that since if
X,Y ~ N(mu,sig) and
P = XY
then if Y=X,
P = X^2.
Since P can't be negative, it can't be normal. As your simulation shows, it can be close, but it isn't exactly so.
BTW, "the Matlab way" to write such a simulation would look something like
N=1E5; % number of trials
F=110; % treshold value
A=randn(N,1)*0.01+0.1; % area sample
Y=randn(N,1)*50+1100; % yield sample
R=A.*Y; % compute ratio
frac=sum(R<F)/N; % failure fraction
frac =
0.5112
using the vectorization available which trades memory for the succinctness in expression (and generally, faster code).

More Answers (2)

Hi Peter,
For a random variable situation like this one, the probability that strength = force exactly is essentially zero. Roughly speaking, a double precision random variable has about 15 decimal places, which means 10^15 possible outcomes, all of which are equally likely. So the odds of picking, say, .500000000000000 is essentially zero. That leaves the not unreasonable idea that the failure rate should be 1/2. However, you are looking at the product of two random variables and that conclusion is not correct.
As a simple case, suppose the two random variables each have a mean of 1 and have small variations about those. (Your case is similar in that your two variables have small variations about 0.1 and about 1100). You take the product of the variables and pass the test if the result is >1. Now
1/4 of the time both variables are >1 and you pass.
1/4 of the time both variables are <1 and you fail.
Even Steven so far.
The rest of the time one variable is >1 and the other variable is <1. The most average case one can think of is var1 = 1+a, var2 = 1-a, where a is small since you have small variations about 1. Their product is
(1+a)*(1-a) = 1 - a^2.
The a^2 term is negative, and that biases things a little bit toward failure,as you found experimentally. Letting the two variables vary independently brings in a lot more detail but does not change the basic conclusion.
Great answers both! Many thanks for the "matlab way" example. Some new commands to read about. Bottom line, my statistics skills definitely needs some refreshment.
Peter

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!