parfor variable classification issue revisited

I have a million (literally) text files that I need to read a number from. I currently do this in a nested loop as such:
len_A = 5;
len_B = 6;
len_C = 7;
len_D = 8;
len_E = 9;
output = zeros(prod([len_A, len_B, len_C, len_D, len_E]), 6);
for ind_A = 1 : len_A
    for ind_B = 1 : len_B
        for ind_C = 1 : len_C
            for ind_D = 1 : len_D
                for ind_E = 1 : len_E
                    line_num = sub2ind([len_E, len_D, len_C, len_B, len_A], ind_E, ind_D, ind_C, ind_B, ind_A);
                    % Real Script
                    % open a file from the disk, read in a number
                    % output_temp(count, :) = [line_num, ind_A, ind_B, ind_C, ind_D, ind_E, the number from line above];
                    % Example Script
                    output(line_num, 1:6) = [line_num, ind_A, ind_B, ind_C, ind_D, ind_E];
                end
            end
        end
    end
end
This is time intensive. Since my disk and processor are not maxed out, I wanted to do this in parallel and speed it up. Based on: https://www.mathworks.com/matlabcentral/answers/838625-parfor-variable-classification-issue, I tried:
output = zeros(prod([5, 6, 7, 8, 9]), 6);
% output = zeros(1, 7);
parfor ind_A = 1 : 5
    output_temp = zeros(prod([6, 7, 8, 9]), 6);
    count = 0;
    for ind_B = 1 : 6
        for ind_C = 1 : 7
            for ind_D = 1 : 8
                for ind_E = 1 : 9
                    count = count + 1;
                    line_num = sub2ind([9, 8, 7, 6, 5], ind_E, ind_D, ind_C, ind_B, ind_A);
                    % Real Script
                    % open a file from the disk, read in a number
                    % output_temp(count, :) = [line_num, ind_A, ind_B, ind_C, ind_D, ind_E, the number from line above];
                    % Example Script
                    output_temp(count, 1:6) = [line_num, ind_A, ind_B, ind_C, ind_D, ind_E];
                end
            end
        end
    end
    max_line_num = sub2ind([9, 8, 7, 6, 5], 9, 8, 7, 6, ind_A);
    min_line_num = max_line_num - prod([9, 8, 7, 6, 1]) + 1;
    output(min_line_num : max_line_num, :) = output_temp;
end
I am unable to figure out how to make this work. I would truly appreciate any help you could provide.

Accepted Answer

Walter Roberson on 11 Aug 2023
Create a multidimensional array. Run parfor along one of its dimensions, preferably the last.
Within the parfor loop, use nested for loops and multidimensional indexing to assign values to a temporary array that is the right size except for having length 1 along the dimension you are running parfor over. After you have assigned all the values to the temporary array, copy it into the output in a single statement:
output(:,:,:,:,INDEX, :) = output_temp;
If you need to, then after the parfor loop, reshape() to collapse those other dimensions.
It is important that, in the only place where you write into the output variable, each index is one of: ":", an expression that is constant throughout the parfor, or the parfor variable itself (optionally plus or minus a constant). Using a computed range, as you are doing, is not permitted.
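A minimal sketch of the pattern described above, not the asker's real script. The sizes are shrunk for illustration, and as an assumption line_num is numbered here with ind_A varying fastest (unlike the question, where ind_E varies fastest), so that the final reshape() leaves the rows already in line_num order. The file read is only indicated by a comment.

% Small, illustrative sizes (the question uses 5, 6, 7, 8, 9).
len_A = 2; len_B = 3; len_C = 2; len_D = 3; len_E = 4;
dims = [len_A, len_B, len_C, len_D, len_E];

% One extra trailing dimension holds the 6 values stored per combination.
output = zeros([dims, 6]);

parfor ind_E = 1 : len_E
    % Temporary array: full size, except length 1 along the parfor dimension.
    output_temp = zeros(len_A, len_B, len_C, len_D, 1, 6);
    for ind_D = 1 : len_D
        for ind_C = 1 : len_C
            for ind_B = 1 : len_B
                for ind_A = 1 : len_A
                    % Assumed numbering: ind_A fastest, matching memory order.
                    line_num = sub2ind(dims, ind_A, ind_B, ind_C, ind_D, ind_E);
                    % (the real script would read a number from a file here)
                    output_temp(ind_A, ind_B, ind_C, ind_D, 1, :) = ...
                        [line_num, ind_A, ind_B, ind_C, ind_D, ind_E];
                end
            end
        end
    end
    % The only write into output: ":" everywhere except the parfor variable.
    output(:, :, :, :, ind_E, :) = output_temp;
end

% Collapse the five index dimensions into rows.
output = reshape(output, prod(dims), 6);

With this numbering, row r of the reshaped output holds line_num == r, so no sorting step is needed afterwards. Without a parallel pool, parfor simply runs the iterations serially, so the sketch can be tried either way.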
  2 Comments
Craig on 18 Aug 2023
Edited: Craig on 18 Aug 2023
By following Walter's suggestions, and after some work such as changing the parfor from Walter's recommendation of the last index to the first, this is what I finally got to work for me:
len_A = 5;
len_B = 6;
len_C = 7;
len_D = 8;
len_E = 9;
output = zeros(len_A, len_B, len_C, len_D, len_E, 6);
parfor ind_A = 1 : len_A
    output_temp = zeros(len_B, len_C, len_D, len_E, 6);
    for ind_B = 1 : len_B
        for ind_C = 1 : len_C
            for ind_D = 1 : len_D
                for ind_E = 1 : len_E
                    line_num = sub2ind([len_E, len_D, len_C, len_B, len_A], ind_E, ind_D, ind_C, ind_B, ind_A);
                    % Real Script
                    % open a file from the disk, read in a number
                    % output_temp(count, :) = [line_num, ind_A, ind_B, ind_C, ind_D, ind_E, the number from line above];
                    % Example Script
                    output_temp(ind_B, ind_C, ind_D, ind_E, 1:6) = [line_num, ind_A, ind_B, ind_C, ind_D, ind_E];
                end
            end
        end
    end
    output(ind_A, :, :, :, :, :) = output_temp;
end
output = reshape(output, prod([len_A, len_B, len_C, len_D, len_E]), 6);
output = sortrows(output, 1);
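As a possible alternative to the reshape() plus sortrows() step above, the index dimensions could be reordered with permute() before reshaping. This is only a sketch with shrunken sizes; the names output6, by_sort and by_permute are made up for the illustration, and the array is filled with plain for loops so the two orderings can be compared.

% Small, illustrative sizes.
len_A = 2; len_B = 3; len_C = 2; len_D = 3; len_E = 2;
dims_rev = [len_E, len_D, len_C, len_B, len_A];

% Same layout as above: dimensions A, B, C, D, E, then the 6 stored values.
output6 = zeros(len_A, len_B, len_C, len_D, len_E, 6);
for ind_A = 1 : len_A
    for ind_B = 1 : len_B
        for ind_C = 1 : len_C
            for ind_D = 1 : len_D
                for ind_E = 1 : len_E
                    line_num = sub2ind(dims_rev, ind_E, ind_D, ind_C, ind_B, ind_A);
                    output6(ind_A, ind_B, ind_C, ind_D, ind_E, :) = ...
                        [line_num, ind_A, ind_B, ind_C, ind_D, ind_E];
                end
            end
        end
    end
end

% Approach used above: reshape, then sort the rows by line_num.
by_sort = sortrows(reshape(output6, [], 6), 1);

% Alternative: put the dimensions in the order sub2ind used (E, D, C, B, A),
% so that the reshaped rows come out in line_num order directly.
by_permute = reshape(permute(output6, [5 4 3 2 1 6]), [], 6);

same_result = isequal(by_sort, by_permute);

The permute() call makes the memory order of the first five dimensions match the order passed to sub2ind, which is why the rows land in line_num order without sorting.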
Walter Roberson on 18 Aug 2023
The reason I suggested parfor over the last dimension instead of the first is the way multidimensional arrays are stored: any leading : dimensions are stored in consecutive memory. So if you had A(:,:,idx), then A(1:end,1:end,idx) would be stored in consecutive memory. But if you had A(idx,:,:), then consecutive elements of the slice would be size(A,1) apart in memory, which is not as efficient to transfer as consecutive memory.
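A small illustration of that storage order, using a made-up 2-by-3-by-4 array whose values equal their own linear (memory) index:

% Each element stores its own linear index, i.e. its position in memory.
A = reshape(1:24, [2, 3, 4]);

% Slice along the last dimension: one contiguous block of memory.
last_slice = A(:, :, 2);
last_steps = diff(last_slice(:)');    % step between neighbouring elements

% Slice along the first dimension: elements are size(A,1) apart.
first_slice = A(2, :, :);
first_steps = diff(first_slice(:)');  % step between neighbouring elements

Here last_steps is all ones, while every entry of first_steps equals size(A,1).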


More Answers (1)

Jeff Miller on 16 Aug 2023
Edited: Jeff Miller on 18 Aug 2023
Maybe something like this would be helpful, using the wonderful allcomb.
idx = allcomb(1:5,1:6,1:7,1:8,1:9);
nrows = size(idx,1);
output = zeros(nrows,6);
parfor ind_row = 1:nrows
    idx_A = idx(ind_row,1);
    idx_B = idx(ind_row,2);
    idx_C = idx(ind_row,3);
    idx_D = idx(ind_row,4);
    idx_E = idx(ind_row,5);
    result = yourActualFn(idx_A,idx_B,idx_C,idx_D,idx_E);
    % Store this row's five indices followed by the result.
    output(ind_row,:) = [idx_A, idx_B, idx_C, idx_D, idx_E, result];
end
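A similar single-loop version can be sketched with the built-in ind2sub instead of allcomb, assuming the same line_num convention as the question (ind_E varying fastest). The file read is only indicated by a comment, and the name dims_rev is made up for the illustration.

len_A = 5; len_B = 6; len_C = 7; len_D = 8; len_E = 9;
dims_rev = [len_E, len_D, len_C, len_B, len_A];
nrows = prod(dims_rev);
output = zeros(nrows, 6);

parfor line_num = 1 : nrows
    % Invert the question's sub2ind call to recover the five loop indices.
    [ind_E, ind_D, ind_C, ind_B, ind_A] = ind2sub(dims_rev, line_num);
    % (the real script would open the matching file and read a number here)
    % output is sliced by the parfor variable itself, which parfor allows.
    output(line_num, :) = [line_num, ind_A, ind_B, ind_C, ind_D, ind_E];
end

Because the parfor variable is the row index, each iteration writes exactly one row and no temporary array or final sort is needed. The work is also split into many small iterations rather than five large ones, which may balance better across workers.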
  2 Comments
Craig on 18 Aug 2023
Thanks for the reply Jeff. This might allow the calculation of the "line_num", but I don't see how it would allow me to do all the other work in the real script.
Jeff Miller on 18 Aug 2023
@Craig, Glad you got the problem solved.
Just for future reference, I edited the script to make it clearer what I thought you might do. Could be that I don't understand what other work you want to do in the real script, though.


Release

R2023a
