I'm trying to calculate a percentile across a large number of files (25000 or even more). Each file contains a 4x1 cell array representing 4 maps, each a 1483x2824 matrix.

I'm using tall arrays, following the documentation example "Percentiles of Tall Matrix Along Different Dimensions":

tic
% start a local pool for multithreading
c = parcluster('local');
c.NumWorkers = 20;
parpool(c, c.NumWorkers);
folder = '/home/temporal2/dsantos/mat/*.mat'; % more than 25000 files
A = ones(1483,2824,2); % auxiliary matrix to establish the data type for prctile
y = tall(A);
% datastore of files, each containing a 4x1 cell of 1483x2824 maps
ds = fileDatastore(folder,'ReadFcn',@loadPrc,'FileExtensions','.mat','UniformRead',true)
t = tall(ds);
% fill the auxiliary tall array with each map in the correct format
for i = 1:25000
    y(:,:,i) = t(1+(i-1)*1483:1483*i,:);
end
% calculate the percentile
p90_1 = prctile(y,90,3)
P90_1 = gather(p90_1);
save('/home/temporal2/dsantos/p90_1.mat','P90_1','-v7.3');
toc

But it seems that tall arrays won't work for this because I get the error:

Warning: Error encountered during preview of tall array 'p90_1'. Attempting to
gather 'p90_1' will probably result in an error. The error encountered was:
Requested 500025x500025 (1862.8GB) array exceeds maximum array size preference.
Creation of arrays greater than this limit may take a long time and cause
MATLAB to become unresponsive. See <a href="matlab: helpview([docroot '/matlab/helptargets.map'], 'matlab_env_workspace_prefs')">array size limit</a>
or preference panel for more information.
> In tall/display (line 21)

p90_1 =

  MxNx... tall array

    ?    ?    ?    ...
    ?    ?    ?    ...
    ?    ?    ?    ...
    :    :    :
    :    :    :

>> Error using digraph/distances (line 72)
Internal problem while evaluating tall expression. The problem was:
Requested 500028x500028 (1862.9GB) array exceeds maximum array size preference.
Creation of arrays greater than this limit may take a long time and cause
MATLAB to become unresponsive. See <a href="matlab: helpview([docroot '/matlab/helptargets.map'], 'matlab_env_workspace_prefs')">array size limit</a>
or preference panel for more information.

Error in matlab.bigdata.internal.lazyeval.LazyPartitionedArray>iGenerateMetadata (line 756)
allDistances = distances(cg.Graph);

Error in matlab.bigdata.internal.lazyeval.LazyPartitionedArray>iGenerateMetadataFillingPartitionedArrays (line 739)
[metadatas, partitionedArrays] = iGenerateMetadata(inputArrays, executorToConsider);

Error in ...

Error in tall/gather (line 50)
[varargout{:}] = iGather(varargin{:});

Caused by:
Error using matlab.internal.graph.MLDigraph/bfsAllShortestPaths
Requested 500028x500028 (1862.9GB) array exceeds maximum array size
preference. Creation of arrays greater than this limit may take a long time
and cause MATLAB to become unresponsive. See <a href="matlab: helpview([docroot '/matlab/helptargets.map'], 'matlab_env_workspace_prefs')">array size limit</a> or preference panel for
more information.

Any clue on how to solve this problem?

All the best

Edric Ellis
on 13 Aug 2019

That particular error is an internal one, raised because your tall array expression is simply too large: it contains too many operations. Tall arrays work by building up a symbolic representation of all the expressions you've evaluated, and then running them all together when you call gather. Because you have a for loop over 25000 elements, this symbolic representation is too large to be evaluated. Tall arrays are not designed to be looped over in this way. Instead, you need to express your program in terms of a smaller number of vectorised operations.
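To illustrate the deferred evaluation described above, here is a minimal sketch on a small synthetic array (the sizes and variable names are made up for illustration and have nothing to do with the real data). Each operation on the tall array is only queued, and nothing is computed until gather is called:

```matlab
% Minimal sketch of deferred evaluation with tall arrays (synthetic data).
mapreducer(0);                 % evaluate in the local MATLAB session, no parallel pool

t  = tall(rand(1000, 3));      % small in-memory array wrapped as a tall array
m  = mean(t, 1);               % queued, not yet computed
mx = max(t, [], 1);            % queued, not yet computed

stillDeferred = isa(m, 'tall');    % true: m is still an unevaluated tall array
[mVal, mxVal] = gather(m, mx);     % both queued results are evaluated together here
```

Two vectorised calls add only two operations to the queued expression, whereas an indexing assignment inside a 25000-iteration loop adds at least one per iteration.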

I would proceed in the following manner (I can't be more specific since your problem statement isn't executable - see this page on tips regarding making a minimal reproduction):

- Have your loadPrc return a 4 × 1483 × 2824 numeric matrix (rather than a cell array)
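- Your corresponding tall array t will then be 100000 × 1483 × 2824 (4 rows per file × 25000 files), or 25000 × 1483 × 2824 if loadPrc returns a single map as 1 × 1483 × 2824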
- Instead of the for loop, simply call prctile in dimension 1

ds = fileDatastore();
t = tall(ds);
p90_1 = prctile(t,90,1);
P90_1 = gather(p90_1);
% and then perhaps
P90_1 = shiftdim(P90_1, 1)
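For the first bullet, a read function along these lines might work. This is only a sketch: the original loadPrc was not posted, so the assumption that each MAT-file holds exactly one variable containing the 4x1 cell array is hypothetical.

```matlab
function data = loadPrc(filename)
% Hypothetical ReadFcn sketch. Assumes the MAT-file contains a single
% variable: a 4x1 cell array of equally sized numeric maps.
s     = load(filename);
names = fieldnames(s);
c     = s.(names{1});              % the cell array (assumed to be the only variable)
data  = cat(3, c{:});              % rows x cols x 4
data  = permute(data, [3 1 2]);    % 4 x rows x cols
end
```

With 'UniformRead' set to true, fileDatastore concatenates the outputs of successive reads vertically, which is why the maps are moved to dimension 1 here.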


David Santos
on 13 Aug 2019

Edric Ellis
on 14 Aug 2019

Ah, sorry, I hadn't realised that prctile in the tall dimension supports only vectors. Hm, this might turn out to be trickier than I thought. In fact, I'm not sure I know how to do this using tall arrays.

Let me just confirm that I've got the basics of your problem correct: you want to compute percentiles individually for each of the 1483x2824 elements, so 4187992 percentiles, each down a vector of length 25000.

It may be that tall arrays aren't the right tool in this case. At the very least, I think it will be necessary to "transpose" the data so that you can load a handful of 25000-element vectors into memory at a time and call prctile on those in sequence (perhaps even in parallel if you have Parallel Computing Toolbox).
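As a rough sketch of that block-wise idea, without tall arrays: process a few rows of the maps at a time, collect those rows from every file, and call prctile along the file dimension. Everything here is an assumption for illustration. The sizes are tiny, and readRows is a hypothetical stand-in for a partial read, backed by an in-memory synthetic array rather than real files.

```matlab
% Sketch: per-pixel 90th percentile across many maps, processed in row blocks.
% Tiny synthetic sizes stand in for 25000 files of 1483x2824 maps.
rng(0);
nFiles = 50; nRows = 12; nCols = 7; blockRows = 5;
maps = rand(nRows, nCols, nFiles);            % synthetic data instead of files on disk

% Hypothetical reader: rows r1:r2 of the map stored in "file" k.
readRows = @(k, r1, r2) maps(r1:r2, :, k);

P90 = zeros(nRows, nCols);
for r1 = 1:blockRows:nRows
    r2 = min(r1 + blockRows - 1, nRows);
    block = zeros(r2 - r1 + 1, nCols, nFiles);
    for k = 1:nFiles
        block(:, :, k) = readRows(k, r1, r2); % gather this row block from every file
    end
    P90(r1:r2, :) = prctile(block, 90, 3);    % percentile along the file dimension
end
```

At the real sizes, a block of 10 rows in double precision would need roughly 10 × 2824 × 25000 × 8 bytes, about 5.6 GB, so the block size has to be chosen to fit memory. Reading every file once per block is expensive, which is why rewriting the data first so that a row block can be read cheaply is likely to pay off. The blocks are independent of one another, so they could be distributed across workers.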
