Speed up nested loops with parfor

I'm trying to speed up this part of code. I have a constraint: the inputs of ReturnFn must all be scalars. If it was not for this restriction, I could easily vectorize the code. So I would like to know if there is a way to make the code below faster, while still satisfying this restriction of the inputs of ReturnFn.
Any help is really appreciated!
N_d = 50;
N_a = 300;
N_z = 10;
% ParamCell contains: K_to_L,alpha,delta,pen,gamma,crra
% I need the cell array to handle variable number of inputs in ReturnFn
Fmatrix=zeros(N_d*N_a,N_a,N_z);
parfor i4i5=1:N_z
Fmatrix_z=zeros(N_d*N_a,N_a);
for i3=1:N_a % a today
for i2=1:N_a % a' tomorrow
for i1=1:N_d % d choice
Fmatrix_z(i1+(i2-1)*N_d,i3)=ReturnFn(d_gridvals(i1),a_gridvals(i2),a_gridvals(i3),z_gridvals(i4i5,1),z_gridvals(i4i5,2),ParamCell{:});
end
end
end
Fmatrix(:,:,i4i5)=Fmatrix_z;
end
function F = f_ReturnFn(d,aprime,a,e,age,K_to_L,alpha,delta,pen,gamma,crra)
% INPUTS (always 5 inputs, plus some extra parameter inputs)
% d: Hours worked
% aprime: Next-period's assets
% a: Current period assets
% e: Labor efficiency shock
% age: Age of individual: young or old
% TOOLKIT NOTATION
% (d,aprime,a,z), where z = [e;age]
F = -inf;
r = alpha*K_to_L^(alpha-1)-delta;
w = (1-alpha)*K_to_L^alpha;
income = (w*e*d)*(age==1)+pen*(age==2)+r*a;
c = income+a-aprime; % Budget Constraint
if c>0
% NOTE: 0<d<1 is already built into the grid
% WARNING: this will not work if crra=1
inside = (c^gamma)*((1-d)^(1-gamma));
F = inside^(1-crra)/(1-crra);
end
end

7 Comments

I have a constraint: the inputs of ReturnFn must all be scalars.
Who imposed that constraint? The code inside f_ReturnFn can be vectorized.
This code is part of toolkit that runs in two different ways depending on whether the user's computer has a NVIDIA gpu or not. With gpu what we do is the following and is fast. But what if the user's computer does not have a supported gpu? On the cpu the best would be to vectorize f_ReturnFn, but this would require to keep two different versions of f_ReturnFn, one with scalar inputs and another one vectorized.
Please let me know if my explanation is not clear enough
ReturnFnParamsVec = [5.0,0.3,0.1,0.1,2.0,2.0];
ParamCell=cell(length(ReturnFnParamsVec),1);
for ii=1:length(ReturnFnParamsVec)
ParamCell(ii,1)={ReturnFnParamsVec(ii)};
end
N_a = 100;
N_z = 50;
a_grid = linspace(0,50,N_a)';
z_grid = linspace(0.5,1.5,N_z)';
agrid_vals1 = a_grid; % column vector (N_a,1,1)
agrid_vals2 = shiftdim(a_grid,-1); % row vector (1,N_a,1)
z_grid_vals3 = shiftdim(z_grid,-2); % (1,1,N_z)
% IMPORTANT: arrayfun on the GPU uses automatic expansion
% arrayfun on the cpu does not and is too slow in practice
% Here it gives an error because it is trying to use the GPU but mastlab
% online does not have access to parallel computing toolbox
Fmatrix = arrayfun(@f_ReturnFn,agrid_vals1,agrid_vals2,z_grid_vals3,ParamCell{:});
Error using arrayfun
All of the input arguments must be of the same size and shape.
Previous inputs had size 100 in dimension 1. Input #3 has size 1
function F = f_ReturnFn(aprime,a,z,K_to_L,alpha,delta,pen,gamma,crra)
% INPUTS (always 5 inputs, plus some extra parameter inputs)
% d: Hours worked
% aprime: Next-period's assets
% a: Current period assets
% e: Labor efficiency shock
% age: Age of individual: young or old
% TOOLKIT NOTATION
% (d,aprime,a,z), where z = [e;age]
F = -inf;
r = alpha*K_to_L^(alpha-1)-delta;
w = (1-alpha)*K_to_L^alpha;
income = w*z+r*a;
c = income+a-aprime; % Budget Constraint
if c>0
% WARNING: this will not work if crra=1
F = c^(1-crra)/(1-crra);
end
end %end function "f_ReturnFn"
"...but this would require to keep two different versions of f_ReturnFn, one with scalar inputs and another one vectorized."
Why would it require two versions? A fully vectorized function will work perfectly on scalar inputs too.
@Stephen23: but not with arrayfun on the gpu. The vectorized version would require e.g.
F(c<=0)=-inf
and this does not work with arrayfun on gpu arrays
Hi @Alessandro Maria Marco,
My recommendation would be refactoring `f_ReturnFn` so that it can handle both scalar and vector inputs robustly without needing separate implementations. This might include using more generic array handling techniques or making sure that logical operations are compatible with GPU execution. Also, try exploring MATLAB's capabilities in parallel computing further or consider alternative approaches like utilizing MATLAB's built-in functions that support GPU arrays more effectively.
That's a good suggestion. Do you have something specific in mind? Like putting an if branch and using isscalar on the input?

Hi @Alessandro Maria Marco,

Below is a refactored version of your function that incorporates these suggestions:

       function F = f_ReturnFn(aprime, a, z, K_to_L, alpha, delta, pen,        gamma, crra)
       % Check if inputs are scalar or vector
       isScalar = @(x) isscalar(x);
       % Initialize F with -inf for all entries
       F = -inf(size(aprime));  % Make sure F has the same size as aprime
       % Calculate common parameters
       r = alpha * K_to_L^(alpha - 1) - delta;
       w = (1 - alpha) * K_to_L^alpha;
       income = w .* z + r .* a;  % Element-wise multiplication
       % Budget constraint calculation
       % Ensure this handles array operations correctly
       c = income + a - aprime;  
       % Compute utility only where c > 0
       validIndices = c > 0;  % Logical array indicating valid consumption
       if any(validIndices)
           % Use element-wise operations for valid indices
           F(validIndices) = c(validIndices).^(1 - crra) / (1 - crra);
           if crra == 1
               % Handle CRRA=1 case separately
               F(validIndices) = log(c(validIndices));  
           end
           end
           end

Explanation of Changes: The modify code now incorporates the use of `.*` so that multiplications are applied element-wise across arrays.By creating a logical index array (`validIndices`), conditions can be applied directly to the output array `F`, allowing for efficient computation without separate handling for scalars and vectors and special cases like `crra = 1` are handled separately within the same function scope. Also, if you are interested, MATLAB provides functions such as `gpuArray` that can convert regular arrays into GPU-compatible arrays easily. Utilizing these functions in your code will optimize performance further. Fore more information on this function, please refer to

https://www.mathworks.com/help/parallel-computing/gpuarray.html

Please let me know if you have any further questions.

Sign in to comment.

Answers (0)

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Products

Release

R2024a

Commented:

on 12 Aug 2024

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!