How do I convert a non-normal distribution to an equivalent normal distribution?
I am working with probability distributions in multivariate equations. Some of the variables are not normally distributed, but to work with the equation I need all of them to have the same form, and the most convenient one is the normal distribution. However, I do not know how to transform these variables into equivalent normal forms.
For example, I have four variables in the equation
g = x1 + 2*x2 - x3.*x4;   % .* is element-wise, since x3 and x4 are sample vectors
where x1 is Weibull-distributed, x3 is lognormal, and x2 and x4 are normal. The data are randomly generated using
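x1 = wblrnd(1207.289, 6.22326, [1, 1e+6]);   % Weibull: scale 1207.289, shape 6.22326
x2 = normrnd(769, 15, [1, 1e+6]);
x3 = lognrnd(32, 4.57, [1, 1e+6]);   % note: 32 and 4.57 are the mean and std of log(x3), not of x3
x4 = normrnd(250, 4, [1, 1e+6]);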
I need help transforming x1 and x3 into normal random variables so that I can work with them.
Bruno Luong on 27 May 2022
Edited: Bruno Luong on 27 May 2022
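You can almost always map a reasonable continuous random distribution to a normal one. If r follows some continuous distribution whose CDF you know, call it cdf, then
z = sqrt(2)*erfinv(2*cdf(r)-1)   % equivalent to norminv(cdf(r))
follows the standard normal distribution N(0,1). This works because cdf(r) is uniform on (0,1), and sqrt(2)*erfinv(2*u-1) is the inverse of the standard normal CDF; without the sqrt(2) factor the result is normal with variance 1/2 rather than 1.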
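A minimal sketch of this mapping applied to x1 and x3 from the question. It assumes the Statistics and Machine Learning Toolbox (wblcdf, logncdf, norminv, normcdf, wblinv); the smaller sample size (1e5) and the fixed seed are only there to keep the check quick and repeatable.

```matlab
rng(0);                                   % fixed seed so the check is repeatable
n  = 1e5;
x1 = wblrnd(1207.289, 6.22326, [1, n]);   % Weibull, scale A = 1207.289, shape B = 6.22326
x3 = lognrnd(32, 4.57, [1, n]);           % lognormal, parameters refer to log(x3)

% Step 1: each variable's own CDF maps it to Uniform(0,1)
u1 = wblcdf(x1, 1207.289, 6.22326);
u3 = logncdf(x3, 32, 4.57);

% Step 2: the inverse standard normal CDF maps Uniform(0,1) to N(0,1)
z1 = norminv(u1);                         % same as sqrt(2)*erfinv(2*u1 - 1)
z3 = norminv(u3);

% Both should have mean close to 0 and standard deviation close to 1
fprintf('z1: mean %.3f, std %.3f\n', mean(z1), std(z1));
fprintf('z3: mean %.3f, std %.3f\n', mean(z3), std(z3));

% The mapping is invertible, so g can still be evaluated in the original units
x1back = wblinv(normcdf(z1), 1207.289, 6.22326);
```

Note that z1 and z3 are new variables: g is written in terms of x1 and x3, so when you work with normal samples you map them back through the inverse CDFs, as in the last line, before evaluating g.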
the cyclist on 27 May 2022
A few thoughts that might be useful, if not exactly a complete answer to your question ...
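First, I hope it is clear that not all distributions can be transformed to normal. The most obvious example is a binomial, which is discrete; with a single trial it takes only the values 0 and 1:
p = 0.2;
x = 0:1;
y = binopdf(x, 1, p);   % P(X = 0) = 0.8, P(X = 1) = 0.2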
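To see the problem concretely, this sketch (same toolbox assumption as above) pushes single-trial binomial samples through the CDF-then-norminv mapping. The output still has only two distinct values, so no transformation of this kind can make it normal.

```matlab
rng(0);
p = 0.2;
x = binornd(1, p, [1, 1e5]);     % single-trial binomial samples: only 0s and 1s
z = norminv(binocdf(x, 1, p));   % CDF, then inverse standard normal CDF
vals = unique(z);                % a discrete input stays discrete after the mapping
fprintf('number of distinct transformed values: %d\n', numel(vals));
```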
Second, there are common misconceptions about normality requirements in modeling. For example, it is often the case that only the model residuals need to be normally distributed, and not the variables themselves. It's impossible to go into all the details of this here, but I feel obligated to mention it.
Third, as @KSSV mentioned, you can use a power transform (e.g., the Box-Cox transform they suggested). My understanding is that these transforms won't necessarily make the distribution exactly normal, just more "normal-like". I'm not sure that's what you are going for, particularly because your Weibull distribution with those parameters is already fairly well approximated by a normal. How much that difference matters will depend on your application.
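A minimal Box-Cox sketch for the Weibull variable x1, written out by hand rather than relying on a toolbox function. The exponent lambda is chosen by maximizing the Box-Cox profile log-likelihood with fminbnd; the search interval [0.1, 5] is a hypothetical choice that excludes lambda = 0 (the log case) to keep the code short. skewness comes from the Statistics and Machine Learning Toolbox.

```matlab
rng(0);
x1 = wblrnd(1207.289, 6.22326, [1, 1e5]);   % positive data, as Box-Cox requires

n      = numel(x1);
sumLog = sum(log(x1));
bc     = @(y, lam) (y.^lam - 1) ./ lam;      % Box-Cox transform, valid for lam ~= 0

% Negative profile log-likelihood of lambda (constant terms dropped)
negLL  = @(lam) (n/2) * log(var(bc(x1, lam), 1)) - (lam - 1) * sumLog;

lam = fminbnd(negLL, 0.1, 5);                % hypothetical search interval
t   = bc(x1, lam);

fprintf('lambda = %.3f\n', lam);
fprintf('skewness before %.3f, after %.3f\n', skewness(x1), skewness(t));
```

The transformed t is closer to symmetric than x1, but it is still only approximately normal.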
Fourth, you can transform the lognormal variable to an exactly normal one by taking its natural log.
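For the question's x3 this is a one-liner. Because lognrnd(32, 4.57, ...) is parameterized by the mean and standard deviation of log(x3), the log of the samples should come out close to N(32, 4.57^2). A short sketch:

```matlab
rng(0);
x3 = lognrnd(32, 4.57, [1, 1e5]);   % lognormal: log(x3) ~ N(32, 4.57^2)
y3 = log(x3);                        % exactly normal, mean 32, std 4.57
fprintf('mean %.3f, std %.3f\n', mean(y3), std(y3));
z3 = (y3 - 32) / 4.57;               % standardized to N(0,1) if needed
x3back = exp(y3);                    % back to the original units for evaluating g
```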
Fifth, you don't say anything about correlations among the variables, but correlation complicates things. There is some good MATLAB documentation about simulating dependent random variables using copulas. That article is heady stuff, but very useful. It also covers transforming distributions to and from the uniform distribution, which may help with your problem in general. Specifically, applying the Weibull CDF (wblcdf) to Weibull data gives uniformly distributed values, and applying the inverse normal CDF (norminv) to those gives normal values. I didn't investigate more than that.
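A sketch of that uniform round trip with correlation included, assuming the Statistics and Machine Learning Toolbox function copularnd. It draws correlated uniforms from a Gaussian copula, maps them to the question's Weibull and lognormal marginals, and then goes back through CDF and norminv to recover correlated standard normals. The correlation of 0.5 is a hypothetical value for illustration only; the question does not give one.

```matlab
rng(0);
n   = 1e5;
rho = 0.5;                                        % hypothetical correlation, not from the question
U   = copularnd('Gaussian', [1 rho; rho 1], n);   % n-by-2 correlated uniforms

% Inverse CDFs give the desired marginals
x1 = wblinv(U(:,1), 1207.289, 6.22326);           % Weibull
x3 = logninv(U(:,2), 32, 4.57);                   % lognormal

% Reverse direction: CDF -> uniform -> norminv recovers correlated normals
z1 = norminv(wblcdf(x1, 1207.289, 6.22326));
z3 = norminv(logncdf(x3, 32, 4.57));
fprintf('correlation of z1, z3: %.3f\n', corr(z1, z3));
```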
I hope that was helpful.