MATLAB Answers

can someone suggest to make this code faster

1 view (last 30 days)
nlm
nlm on 10 Mar 2020
Edited: Guillaume on 13 Mar 2020
Can someone suggest me a trick to make this code faster.
"all_DOY" = 1000*15000*23 ; %(i,j,k) such that k dimension is the day of the year whic is random, while (i,j) are my pixel location starting from (1,1) to (1000,15000). The values of day of the year (DOY) in the kth dimension rranges range from -1 to 366
"all_variable" = 1000*15000*23; This is my variable value for (i,j,k) for the (i,j) position and k dimension i.e., day of the year in the "all_DOY" matrix. The values in the kth dimension range from -3000 to 15000
all_variable_new = 1000*15000*366
I'm trying to update the "all_variable_new" such that, my k dimension is from 1:366 (days of the year). for each day of the year obtained from "all_DOY" matrix, the variable value is updated in the "all_variable_new" for that respective (i,J,k) from "all_variable"
all_variable_new=zeros(1000,15000,366,'int16');
all_variable = load('all_variable.mat');
all_DOY = load('all_DOY.mat');
tic
for ii =1: size(all_DOY,1) % for each ii
for jj = 1:size(all_DOY,2) % for each jj
idx = squeeze(all_DOY(ii,jj,:)); % sequeez k'th dimension. idk is the DOY which is also the position value in all_variable_new
variable = squeeze(all_variable(ii,jj,:)); % obtain the variable values
variable(idx==-1)=[]; % if -1, remove
idx(idx==-1)=[]; % if -1 then remove the values
Ind=idx;
if isempty(Ind)
all_variable_new(ii,jj,:) = NaN; % if there are no idx values then replace with NAN
else
for mm = Ind
all_variable_new(ii,jj,mm) = variable; % for the day of the year values obtained, replace those positions with variable values.
end
end
clear idx Ind mm
end
end
toc

  4 Comments

Show 1 older comment
nlm
nlm on 10 Mar 2020
I updated the code with comments. let me know if you need more details.
Guillaume
Guillaume on 10 Mar 2020
Writing good comments is hard. Unfortunately your comments are not that useful, they mostly paraphrase the code but don't explain the purpose of the code, so we still have no idea what your code is doing. It's not helped by the poor choice of variable names. A good variable name describes the purpose of the variable. Calling a variable variable tells you nothing about its purpose.
We need to know:
  • What do the inputs represent?
  • It looks like you have 3D variables, what does each dimension represent? and if within one dimension, the rows/columns/pages represent different things, what are these?
  • Why is all_variable of class int16?
  • What is the purpose of the code? What does it try to achieve?
Saying that there are clearly some things that can be improved in your code. It's doubtful they'll have much of an impact though:
  • The clear idx Ind mm is completely pointless. The variables will be replaced anyway on the next step of the loop.
  • The copy of idx into Ind is completely pointless. You may as well continue using idx and never bother creating Ind.
  • The for mm = Ind is completely pointless. Ind is guaranteed to be a column vector, so the loop will only have one iteration with mm exactly equal to Ind, so the mm loop can be altogether replaced by:
all_variable_new(ii,jj,Idx) = variable;
nlm
nlm on 10 Mar 2020
  • What do the inputs represent?
  • It looks like you have 3D variables, what does each dimension represent? and if within one dimension, the rows/columns/pages represent different things, what are these?
"all_DOY" : is a 3D matrix of size 1000*15000*23 where (i,j) represent pixel position, while k represent day of the year (DOY). the values of DOY randomly varies in the range of -1 to 366.
My variable is the temperature data.
"all_variable": is a 3D matrix (i,j,k), where (i,j) represent pixel position, while k represent temperature. Temperature ranges from -3000 to 10000.
"all_variable_new": is a 3D matrix of size 1000*15000*366 and I want to make a 3D matrix (i,j,), where (i,j) represent pixel position, and store the temperature values sequentially according to the DOY values obatined from "all_DOY" .
  • Why is all_variable of class int16?
to reduce the memory of the variable.
  • What is the purpose of the code? What does it try to achieve?
I want to make a "all_variable_new", such that the temperature variable is arranged according to the DOY.

Sign in to comment.

Accepted Answer

Guillaume
Guillaume on 10 Mar 2020
Edited: Guillaume on 10 Mar 2020
If I understood correctly:
pixels_dayofyear = load('all_doy.mat');
pixels_temperature = load('all_variable.mat');
assert(isequal(size(pixels_dayofyear), size(pixels_temperature)), 'Size of matrices doesn''t match!');
%sort each pixel (row, column) temperature (pages) according to the corresponding day of year.
%temperature for invalid days of year (-1) are all replaced by 0 at the end of the page.
pixels_dayofyear(pixels_dayofyear == -1) = NaN; %replace by a value that is sorted last. Could be Inf instead of NaN
[~, order] = sort(pixels_dayofyear, 3); %sort the day of year of each pixel. Get the new order
[rows, cols, ~] = ndgrid(1:size(pixels_dayofyear, 1), 1:size(pixels_dayofyear, 2), 1:size(pixels_dayofyear, 3)); %create row and column indices replicated along the correct dimensions for sub2ind
newpixels_temperature = pixels_temperature(sub2ind(size(pixels_temperature), rows, cols, order)); %reorder temperature according to the sorting of the days
newpixels_temperature(isnan(pixels_dayofyear)) = 0; %replace invalid values by 0

  26 Comments

Guillaume
Guillaume on 12 Mar 2020
"is there a possibility to store both values intead of assuming a mean" No, a matrix element can only have one value. What would be the point anyway? Note that the above doesn't average (but my accumarray solution posted previously can), you just get the last of the two.
"A 10 day moving average, after which a linear interpolation is enough" Wouldn't you do the interpolation beforehand? Otherwise, you're averaging with a lot of zeros.
Either way, I'm not sure how you're going to perform the interpolation efficiently. With much smaller datasets, you'd use a scatteredInterpolant (which would also avoid having to construct that initial matrix full of zeros), but this requires 3 matrices the same size as your final matrix to specify the query points.
However, since you said you're not going to use neighbouring points for calculation the way to get eleminate the memory issue is to divide you satellite image into smaller areas. Split each image into 10 along the rows and columns and you need 100 times less memory.
nlm
nlm on 13 Mar 2020
"No, a matrix element can only have one value. What would be the point anyway? Note that the above doesn't average (but my accumarray solution posted previously can), you just get the last of the two." While interpolating the data, it is required to have all the data even if it is same DOY, instead of mean.
"Wouldn't you do the interpolation beforehand? Otherwise, you're averaging with a lot of zeros". The zero's are turned into NaN, while interpolating.
"However, since you said you're not going to use neighbouring points for calculation the way to get eleminate the memory issue is to divide you satellite image into smaller areas. Split each image into 10 along the rows and columns and you need 100 times less memory." It is global data, has 18000 rows, dividing it into 10 rows, makes it hard to keep count of data blocks. I read 1000*36000 block from each hdf file per year (23 DOYs) and saved it as .mat file and then for 10 years, thus 10 mat files. Do you suggest I save .mat files differently?
Guillaume
Guillaume on 13 Mar 2020
"While interpolating the data, it is required to have all the data even if it is same DOY, instead of mean." I'm not saying it doesn't exist, but I'm not aware of an interpolation method that can cope with two different values at the same point.
"dividing it into 10 rows, makes it hard to keep count of data blocks" I don't see why it would be hard. It's certainly an easy way to get rid of memory issue. At the same time, it also makes it possible to parallelise the processing (but beware of memory usage!) since the processing of the blocks is completely independent of each other. On a supercomputer, I assume that would be a better use of the resources. So, yes, Instead of splitting by year, I'd split by satellite coordinates and process blocks of size MxNx(10*23) -> MxNx366 where M and N are such that the data fits comfortably in memory.
Addendum: Assuming than M and N are reasonable so that the below fits in memory, then this is how I'd implement the transformation from the MxNx(10*23) to MxNx366 matrix. This directly interpolates the MxNx(10*23) data without ever creating the matrix of 0s:
%inputs:
%pixels_dayofyear, a M x N x day matrix indicating the day of measurement of the respective location in pixels_temperature
%pixels_temperature, a M x N x day matrix indicating the temperature on the respective day in pixels_dayofyear
[M, N, days] = size(pixels_dayofyear);
[inrows, incols, ~] = ndgrid(1:M, 1:N, 1:days); %create matrices of locations for each temperature
isvalid = pixels_dayofyear > -1
temperature_interpolant = scatteredinterpolant(M(isvalid), N(isvalid), pixels_dayofyear(isvalid), pixels_temperature(isvalid)); %create interpolant using locations and days as coordinates
%the line below creates 3 double matrices of size M x N x 366. They will use a lot of memory for large M and N!
[qrows, qcols, qday] = ndgrid(1:M, 1:N, 1:366); %matrices of all query locations and days
interpolated_temperature = reshape(temperature_interpolant(qrows, qcols, qday), M, N, 366); %interpolate at all query points and reshape into 3D matrix
Beware that qrows, qcols and qday will require M*N*366*8 bytes of memory. ScatteredInterpolant only accepts doubles so there's no way to save memory there by using a smaller type. However, these 3 arrays can be used unchanged for all the blocks (assuming they're all the same size).

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!