Forming a CDF by adding different types of PDFs and generating random numbers

Hello,
I have read several threads on generating random numbers and building distribution functions in MATLAB, but I could not find an answer to my question, and I suspect I am not yet aware of all of MATLAB's functions and capabilities.
I am trying to develop a CDF from multiple PDFs (of different variables), and I am able to do that. The next part, generating random numbers from this custom CDF, is the tricky one. Can I determine the distribution type of this custom CDF (without using the Distribution Fitter app, if possible, since this happens in the middle of a program that runs many times)?
I want to fit this CDF to a suitable PDF and then generate random numbers from that PDF. Please guide.
(Please let me know if I am missing something.)
Thank you in advance for your time.

 Accepted Answer

It seems like there are several different questions here.
  1. If you want to generate random numbers from a custom CDF, one simple procedure is to generate uniform(0,1) random numbers with u=rand, and then find the X value whose CDF equals u within your custom CDF.
  2. Can you know the type of distribution of your custom CDF? Well, not necessarily, no, because your custom CDF does not have to match that of any known distribution. You can try different known distributions and see how well their CDFs match yours, but there might not be any with a good match.
  3. Fitting the CDF to a suitable PDF...I'm not sure exactly what you have in mind here--maybe this is the same as #2. But if you have the custom CDF then you can simply compute the corresponding custom PDF by numerical differentiation. But that won't help you generate random numbers--for that purpose, the CDF format is more convenient (see #1).
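The procedure in #1 is commonly called inverse transform sampling. A minimal sketch, assuming the custom CDF is tabulated on a grid: the exponential CDF, the grid, and the names xGrid, F and samples are illustrative choices, not part of the original question.

```matlab
rng(0);                             % fixed seed so the run is repeatable
xGrid = linspace(0, 10, 1001);      % grid on which the CDF is tabulated (assumed)
F = 1 - exp(-xGrid);                % stand-in custom CDF (exponential, mean 1)

% interp1 needs distinct sample points, so drop repeated CDF values
[Fu, keep] = unique(F);
xu = xGrid(keep);

u = rand(10000, 1);                 % uniform(0,1) draws
% Invert the CDF: map each u to the x whose CDF equals u.
% u values above max(F) fall outside the table and are sent to the last grid point.
samples = interp1(Fu, xu, u, 'linear', xGrid(end));

fprintf('sample mean = %.3f (the exponential mean is 1)\n', mean(samples));
```

The uniform draws do not replace the custom CDF here: the CDF is what converts them into X values that occur with the intended probabilities.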
Hope that helps,

14 Comments

Thank you for your response Jeff.
I am still facing some problems.
  1. If I use uniform random numbers from unifrnd, I do not get numbers that appear exactly in my custom CDF, so I cannot locate their positions on the X axis (e.g. unifrnd may give 0.0354 while the CDF does not contain exactly this value).
  2. Also, if I use a uniform distribution to generate the random numbers, what is the use of the CDF that I have developed? The cumulative probability of occurrence of the given variable (X) is then not considered.
  3. Since functions like lognrnd and normrnd give random numbers for a given type of distribution, I was hoping for a similar way of generating random numbers, which is why I was looking for simple steps to fit the custom CDF to some known PDF.
I could match the random numbers generated from the uniform distribution, as in #1 of my comment, to the closest values in my CDF with reduced accuracy, but as I explained above, that would not be the best alternative. I have also tried numerically differentiating my custom CDF to get the PDF and then using it as a discrete probability distribution to generate random numbers with the given probabilities using the randsrc function. But that still fails, as the PDF and the probabilities are not the same.
@Rohit Mangalekar Is your custom probability distribution discrete? That is, do you have a discrete set of possible X values, each of which occurs with some probability p(X), with the sum of p(X) equal to 1?
Yes, I found the discrete set of cumulative probabilities for all values of the variable X (a CDF obtained from distributions such as lognormal, normal, uniform, etc.). I converted this CDF to a PDF using simple numerical differentiation, i.e. the diff function, with the first element of the CDF prepended as the first element of the PDF.
But when I run the program multiple times, as I need to, the sum of probabilities from this discrete PDF is not equal to 1 for some runs (reported as an error), and the randsrc function also fails.
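One possible way for a diff-based PDF to miss a sum of 1: the differences of a tabulated CDF always add up to its last value, so if the CDF stops short of 1, so does the sum. A small sketch with made-up numbers (cdfVals is hypothetical, not the poster's data):

```matlab
cdfVals = [0.10 0.40 0.75 0.98];        % made-up CDF that stops short of 1
pmf = [cdfVals(1) diff(cdfVals)];       % probability mass at each X value

fprintf('sum(pmf) = %.2f\n', sum(pmf)); % equals cdfVals(end), not 1

% One possible repair (an assumption about what is wanted):
% rescale so that the probabilities sum to 1.
pmfNorm = pmf / sum(pmf);
fprintf('sum(pmfNorm) = %.2f\n', sum(pmfNorm));
```

Rescaling spreads the missing probability proportionally over all X values; whether that is appropriate depends on where the missing mass really belongs.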
Your description of converting the CDF to PDF using diff makes sense to me, but I don't follow the next part. Can you give an example showing how you got a PDF that does not sum to 1?
Following is the part of the program that I am trying to develop.
Fwb=10;
timestep=0:0.00027778:24;
pdWT=makedist ('Logistic','mu',7.0377,'sigma',0.5155);
pdBT=makedist ('Logistic','mu',9.0765,'sigma',0.33442);
pdDT=makedist ('Logistic','mu',21.0365,'sigma',0.3809);
pdID=makedist ('Lognormal', 'mu', 2.0206,'sigma',0.1507);
y1=cdf (pdWT,timestep);
y2=cdf (pdBT,timestep);
y3=cdf (pdDT,timestep);
y4=cdf (pdID,timestep);
y=(y1+y2+y3+y4)/4; % Adding up CDF's to get the resulting custom CDF
randy = randsample(unique(y),Fwb); % Generating 'Fwb' random numbers between 0 and 1 from y (custom CDF) without replacement.
WBId= find (ismember(unique(y),[randy(1:1:Fwb)])); % Finding the location corresponding to these random numbers.
I have used unique to avoid repeated random numbers or repeated locations. This repetition of numbers, or of their locations on the X axis, is the main problem I am facing while generating random numbers. Because of it, the final output WBId sometimes takes a bizarre form, such as elements less than 0.
The expected output is a unique set of numbers from 0 to 86400 for WBId.
I think I understand the problem better now. See if this is what you want:
Fwb=10;
timestep=0:0.00027778:24;
pdWT=makedist ('Logistic','mu',7.0377,'sigma',0.5155);
pdBT=makedist ('Logistic','mu',9.0765,'sigma',0.33442);
pdDT=makedist ('Logistic','mu',21.0365,'sigma',0.3809);
pdID=makedist ('Lognormal', 'mu', 2.0206,'sigma',0.1507);
y1=cdf (pdWT,timestep);
y2=cdf (pdBT,timestep);
y3=cdf (pdDT,timestep);
y4=cdf (pdID,timestep);
y=(y1+y2+y3+y4)/4; % Adding up CDF's to get the resulting custom CDF
% The code seems fine to here.
% Here is a plot of the CDF of your custom distribution:
plot(timestep,y)
xlabel('TimeStep')
ylabel('CDF')
% The next 2 lines are not right for random sampling of timestep values, though,
% because the different unique y values are not all equally likely.
%randy = randsample(unique(y),Fwb); % Generating 'Fwb' random numbers between 0 and 1 from y (custom CDF) without replacement.
%WBId= find (ismember(unique(y),[randy(1:1:Fwb)])); % Finding the location corresponding to these random numbers
% I would generate random timestep values like this, but there
% is probably a more MATLAB-appropriate way to do it.
% (I am using a while loop to make sure the x values are unique within each set.)
y = [y 1]; % make sure the CDF reaches 1
timestep = [timestep 2*timestep(end)-timestep(end-1)]; % and add a limiting timestep, one step past the last one
found = false;
while ~found
    u = rand(Fwb,1); % generate random CDF values
    xpos = zeros(Fwb,1);
    for i=1:Fwb
        xpos(i) = find(u(i)<y,1); % position of the first CDF value greater than u(i)
    end
    x = timestep(xpos); % x now has Fwb random time steps with probabilities determined by y
    found = numel(unique(x)) == Fwb; % this set of x is ok if all the x's are unique
end
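As one possible more vectorized form of the lookup loop above, the comparison can be done for all draws at once. This is only a sketch: it assumes y is a nondecreasing row vector ending in 1, relies on implicit expansion (R2016b or later), and uses a small made-up CDF rather than the one from the question.

```matlab
rng(1);
y = [0.05 0.20 0.20 0.60 0.90 1];   % made-up nondecreasing CDF, ends at 1
n = 8;
u = rand(n, 1);                      % column of uniform draws

% Loop version, as in the code above
xposLoop = zeros(n, 1);
for i = 1:n
    xposLoop(i) = find(u(i) < y, 1);
end

% Vectorized version: u >= y expands to an n-by-numel(y) logical matrix.
% Because y is nondecreasing, the count of entries with y <= u(i), plus one,
% is the index of the first entry with y > u(i).
xposVec = sum(u >= y, 2) + 1;

disp(isequal(xposLoop, xposVec));    % compare the two lookups
```

The intermediate matrix has one row per draw and one column per CDF value, so this form suits a small number of draws; for many draws the loop may use less memory.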
Thank you very much @Jeff Miller. This works nicely. If possible, could you please elaborate on a few things (pardon my poor understanding of both the syntax and the logic)?
  1. The number of elements in y and timestep increases because of y=[y 1] and the next line. Why has this not affected the output?
  2. Why u(i)<y?
Thank you once again for your valuable time and help.
I am glad this is working for you. Here are attempts to answer your questions:
  1. The y vector holds the cumulative probability for each timestep value, so it must reach 1. The y vector computed by your method only went up to (I think) 0.999, so it did not quite cover the full range of possibilities. That is, your vector left 0.001 of probability unaccounted for, so it did not quite define the full CDF. Increasing the number of elements in these two vectors makes y into a complete CDF, although my choice of the largest timestep was arbitrary.
  2. Each u(i) value is in the 0-1 range, for example, say 0.43. That corresponds to the timestep value whose CDF y is just slightly greater than 0.43. So, we find the first position in the y array with a CDF greater than 0.43, and we take the timestep from that position as the random number. This is a standard method of generating discrete random variables from an arbitrary distribution. To see more clearly how it works, you might think about what happens in a much simpler case, e.g. x's of 10, 20, and 30 with cumulative probabilities of 0.25, 0.7, and 1. The u value will be <.25 25% of the time, so the random score will be 10 with probability 0.25, and so on.
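The simpler case can be checked empirically. A short sketch using the three values and cumulative probabilities from the example (the sample size and the names draws and freq are arbitrary choices):

```matlab
rng(2);
x = [10 20 30];
y = [0.25 0.70 1.00];     % cumulative probabilities from the example
n = 100000;
u = rand(n, 1);

draws = zeros(n, 1);
for i = 1:n
    draws(i) = x(find(u(i) < y, 1));   % first position whose CDF exceeds u(i)
end

% Observed relative frequencies; these should be close to 0.25, 0.45 and 0.30
freq = [mean(draws == 10), mean(draws == 20), mean(draws == 30)];
disp(freq);
```

Each frequency approximates the jump in the CDF at that value (0.25, 0.70-0.25 and 1-0.70), which is why uniform draws reproduce the custom distribution.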
If you don't mind me asking, what is the motivation for generating a new CDF as the sum of other CDFs?
@Jeff Miller Thank you very much. The second point is explained very well. It helped a lot in understanding how I was wrongly interpreting the CDF. Thanks again.
@Paul. I have certain variables which I am combining into one by summing up their PDFs. These are the variables affecting a particular parameter. So, first I got the CDFs of all those variables and then added them to get the resulting CDF. I wanted to connect those variables through their PDFs/CDFs.
Hello Jeff. I am facing a problem again with the logic you suggested.
In the following loop, when it looks up the value in "y" that is just greater than the one in "u", the program gives the following error:
Unable to perform assignment because the left and right sides have a different number of elements.
When I check the lengths, both WKSId and u have the same length. I thought it was because u is a column vector and y is a row vector, so I changed y to a column vector. The problem still persists. Can you please guide?
found=false;
while ~found
    u = rand(Fwks,1); % generate random CDF values
    WKSId = zeros(Fwks,1);
    for i=1:Fwks
        WKSId(i) = find(u(i)<y,1); % look up the timestep values with those CDFs
    end
    xtime = timestep(WKSId); % xtime now has Fwks random time steps with probabilities determined by y
    found = numel(unique(xtime)) == Fwks; % this set of xtime is ok if all its values are unique
end
Hi Rohit,
I am not sure, but my guess is that your y array does not go all the way to 1 (or 1+eps(1), to be sure).
The only cases I see where this logic runs into problems are when u(i)>=max(y), so you must make sure that does not happen. And be careful--MATLAB may print '1' for max(y) even when the number is actually a little smaller than 1.
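A quick way to check for this before the loop, sketched with made-up numbers (the vector y and the value 0.96 are illustrative only):

```matlab
y = [0.20 0.50 0.90];         % made-up CDF that stops at 0.90
u = 0.96;                     % a draw above max(y)

idx = find(u < y, 1);         % empty: no entry of y exceeds u
fprintf('lookup is empty: %d\n', isempty(idx));

% Print with enough digits to see whether max(y) really is 1
fprintf('max(y) = %.17g\n', max(y));

% One possible guard (an assumption, not the only option): force the last
% CDF value to 1 so that every u from rand has a match.
if y(end) < 1
    y(end) = 1;
end
idxFixed = find(u < y, 1);    % now finds the last position
```

Setting y(end) to 1 assigns the leftover probability to the last X value; appending an extra element to both y and the X vector is an alternative that keeps the existing entries unchanged.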
Hello Jeff, thank you very much once again. This is exactly the reason for the error. My "y" (the array I am using in that loop, a CDF) was not reaching 1. That is why values in "u" like 0.96 did not match any of the values in "y".
I am very thankful for your time and your clear explanations. I hope to explore such intricacies of MATLAB and do better programming.
You are very welcome. I am glad my comments were helpful.

Release

R2021a
