Multivariable probability density estimation in a very large data set

I need to estimate a multivariable probability density from a very large data set (~10^10 samples, each with two dimensions).
I tried calling mvksdensity with the data set stored as a tall array, but it fails with the following error:
Conversion to logical from tall is not possible.
Error in mvksdensity>parse_args (line 123)
if ~ismatrix(yData) || isempty(yData)
Error in mvksdensity (line 85)
support,weight,cens,cutoff,bdycorr,ftype,plottype,isXChunk] =
parse_args(yData,varargin{:});
Error in density_est (line 15)
f = mvksdensity(R,xi,'Bandwidth',0.8,'Kernel','normpdf');
A minimal example that reproduces the error:
mu=[0,0];
sigma=[.1,0;0,.1];
R = mvnrnd(mu,sigma,1000);
R(:,2)=tanh(R(:,2));
figure;
scatter(R(:,1),R(:,2),'.');
R=tall(R); % R is small enough here that a tall array is not needed, but the actual data set is large and takes about 100 GB of memory.
gridx1 = linspace(-5,5,100);
gridx2 = linspace(-5,5,100);
[X1,X2]=meshgrid(gridx1,gridx2); % same result as ndgrid followed by transposing both outputs
xi=[X1(:),X2(:)];
f = mvksdensity(R,xi,'Bandwidth',0.8,'Kernel','normpdf','Support',[-Inf,-1;Inf,1]);
fgrid=reshape(f,[100,100]);
figure;surf(gridx1,gridx2,fgrid,'EdgeColor','none');view(2);
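One possible workaround, offered as a minimal sketch rather than a tested solution: for a Gaussian product kernel with a fixed bandwidth, the kernel sum on a rectangular grid can be accumulated chunk by chunk, so the full data set never has to be in memory and mvksdensity is not called on a tall array at all. The sketch below makes several assumptions that are not in the code above: it replaces mvksdensity with a hand-written kernel sum, it ignores the 'Support' option (so no boundary correction is applied in the second dimension), the chunk size `chunkSize` is a hypothetical value, and the chunks are sliced from an in-memory stand-in matrix where the real case would read them from disk. It relies on implicit expansion (R2016b or later).

```matlab
% Sketch: Gaussian product-kernel density estimate on a grid, accumulated per chunk.
% Assumptions: same fixed bandwidth h in both dimensions, no 'Support' correction.
rng(0);                                   % reproducible stand-in data
R = sqrt(0.1)*randn(1000,2);              % same covariance as sigma in the example
R(:,2) = tanh(R(:,2));
h = 0.8;                                  % bandwidth used in the mvksdensity call
gridx1 = linspace(-5,5,100);
gridx2 = linspace(-5,5,100);
chunkSize = 250;                          % hypothetical; pick what fits in memory
n = size(R,1);
gauss = @(u) exp(-0.5*u.^2)/sqrt(2*pi);   % standard normal kernel
fgrid = zeros(numel(gridx2),numel(gridx1));
for first = 1:chunkSize:n
    last = min(first+chunkSize-1,n);
    chunk = R(first:last,:);              % real case: read this chunk from disk
    K1 = gauss((gridx1 - chunk(:,1))/h);  % samples-by-grid kernel values, dim 1
    K2 = gauss((gridx2 - chunk(:,2))/h);  % samples-by-grid kernel values, dim 2
    fgrid = fgrid + K2.'*K1;              % adds sum over samples of K2(k,i)*K1(k,j)
end
fgrid = fgrid/(n*h^2);                    % normalise by n*h1*h2
```

Here fgrid(i,j) is the estimate at (gridx1(j), gridx2(i)), which is the orientation surf(gridx1,gridx2,fgrid) expects. The cost grows with the number of samples times the number of grid points, so for ~10^10 samples a binned approximation may be worth considering instead.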

Answers (0)
