speed up for loop

Question

Mina Mino on 7 Apr 2023

0
Link

Direct link to this question

https://se.mathworks.com/matlabcentral/answers/1943344-speed-up-for-loop

Commented: 埃博拉酱 on 9 Apr 2023

Hi everone. I have a problem with my code which was written in the following form. it take a long time and it is too time consuming.

In this code, I need to read 700 .hgt files with the size of 3600*3600 and make distincts 3*3 matrix from each of them.

I woulde be thankful if anyone could help me. Thanks in advance for your attenuation and help.

clc
clear all
close all
format long g
fid=fopen('list2.txt','r');
if fid==-1
    disp('there is an error')
else
end
S = textscan(fid,'%s');
fclose(fid);
i=1;
fnames= S{i,i};
tic
NA=[];
for fiile=1:700
    varargout = readhgt(fnames{fiile});
    LAT=varargout.lat;LAT(end)=[];
    LON=varargout.lon;LON(end)=[];
    Z=varargout.z;Z(:,end)=[];
    Z(end,:)=[];
    %        Z_vec=reshape(Z,[12967201,1]);
    ROUGH=[];
    for row=1:3:3597
        for col=1:3:3597
            A=[Z(row,col) Z(row,col+1) Z(row,col+2);Z(row+1,col) Z(row+1,col+1) Z(row+1,col+2);Z(row+2,col) Z(row+2,col+1) Z(row+2,col+2)];
            Dif=(double(A)-mean(mean(A))).^2;
            rough=sqrt(mean(mean(Dif)));
            ROUGH=[ROUGH;mean([LAT(row) LAT(row+1) LAT(row+2)]) mean([LON(col) LON(col+1) LON(col+2)]) rough];
        end 
    end 
    NA=[NA;ROUGH];
end 
toc

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

actually the following part takes the most of time. in this part I am exploiting dependent 3*3 matrixs from a file with 3600*3600 elements.

for row=1:3:3597
    for col=1:3:3597
        A=[Z(row,col) Z(row,col+1) Z(row,col+2);Z(row+1,col) Z(row+1,col+1) Z(row+1,col+2);Z(row+2,col) Z(row+2,col+1) Z(row+2,col+2)];
        Dif=(double(A)-mean(mean(A))).^2;
        rough=sqrt(mean(mean(Dif)));
        ROUGH=[ROUGH;mean([LAT(row) LAT(row+1) LAT(row+2)]) mean([LON(col) LON(col+1) LON(col+2)]) rough];
    end 
end 

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

埃博拉酱 on 7 Apr 2023

0
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/1943344-speed-up-for-loop#answer_1211609

Edited: 埃博拉酱 on 7 Apr 2023

Open in MATLAB Online

When it comes to file I/O operations, parfor is usually not very helpful because the disk is always single-threaded. In contrast, eliminating for loops with vectorized methods can effectively improve performance.

fid=fopen('list2.txt');
if fid==-1
	error('there is an error');
end
S = textscan(fid,'%s');
fclose(fid);
fnames= S{1};
tic
NA=cell(700,1);
for file=1:700
	varargout = readhgt(fnames{file});
	Z=varargout.z(1:3600,1:3600);
	Z=std(reshape(Z',3,size(Z,1)/3,3,size(Z,2)/3),1,[1,3]);
	[LAT,LON]=meshgrid(mean(reshape(varargout.lat(1:3600),3,[]),1),mean(reshape(varargout.lon(1:3600),3,[]),1));
	NA{file}=[LAT(:),LON(:),Z(:)];
end
NA=vertcat(NA{:});
toc

Above I assume your files all contain 3601×3601 Z data. You can do more corrections and optimizations with your real data.

6 Comments
Show 4 older commentsHide 4 older comments

Mina Mino on 7 Apr 2023

Edited: Walter Roberson on 8 Apr 2023

Open in MATLAB Online

thank for your nice help. it really work for my code. I have the same problem with the following for. it is too time consuming. could you please help me with this as well?thank in advance for your consideration.

%%
% begining the code%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
FT=[];step=18/110;
parfor  d=1:12*10^(6) 
    isamap_36=find(DOY==day_of_year(d));
    isamap_9=find(DOY_R==day_of_year(d));
    
    Lon_d=Lon(isamap_36);
    Lat_d=Lat(isamap_36);
    VWC_SMAP_d=VWC(isamap_36);
    roughness_d=roughness(isamap_36);
    TM_d=TM(isamap_36);
    SM_d=SM_36(isamap_36);
    
    LAT_R_d=LAT_R(isamap_9);
    LON_R_d= LON_R(isamap_9);
    VWC_R_9_d=VWC_R(isamap_9);  
    SM_R_9_d=SM_R(isamap_9);
    ROUGH_R_9_d=ROUGH_R(isamap_9);
    
    if (~isempty(isamap_36) && ~isempty(isamap_9))
        
        lon_smap_9_index= find(LON_R_d <=lon_cy(d)+step & LON_R_d>=lon_cy(d)-step);
        lat_smap_9_index= find(LAT_R_d <=lat_cy(d)+step & LAT_R_d>=lat_cy(d)-step);
        intsec_smap_9=intersect(lon_smap_9_index,lat_smap_9_index); 
        VWC_SMAP_g_9=VWC_R_9_d(intsec_smap_9);roughness_g_9=ROUGH_R_9_d(intsec_smap_9);SM_g_9=SM_R_9_d(intsec_smap_9);
        
        lon_smap_36_index= find(Lat_d <=lat_cy (d)+step & Lat_d>=lat_cy(d)-step);
        lat_smap_36_index= find(Lon_d <=lon_cy(d)+step & Lon_d>=lon_cy(d)-step);
        intsec_smap_36=intersect(lon_smap_36_index,lat_smap_36_index); 
        
        VWC_SMAP_g_36=VWC_SMAP_d(intsec_smap_36);roughness_g_36=roughness_d(intsec_smap_36);TM_d_36=TM_d(intsec_smap_36) ;SM_g_36=SM_d(intsec_smap_36);
        
        point_1=[lat_cy(d) lon_cy(d)];
        ptCloud_1=[latitu longi];
        idx_land  = knnsearch(ptCloud_1,point_1);
        
        ptCloud_2=[clay(:,1) clay(:,2)];
        idx_clay  = knnsearch(ptCloud_2,point_1);
        
        
        if (~isempty(intsec_smap_36)&& ~isempty(intsec_smap_9)) 
            
            FT=[day_of_year(d) lon_cy(d) lat_cy(d) inc_cy(d) ref_cy(d) mean(VWC_SMAP_g_9) mean(roughness_g_9) mean(SM_g_9) mean(VWC_SMAP_g_36) mean(roughness_g_36) mean(TM_d_36) mean(SM_g_36) clay(idx_clay,3) land_cover(idx_land)];
            S=FT;
            parsave(sprintf('output%d.mat', d),S );
        end
    end
end

Walter Roberson on 8 Apr 2023

When it comes to file I/O operations, parfor is usually not very helpful because the disk is always single-threaded

Disk controllers can have multiple DMA lanes. The operating system might be single threaded in disk I/O at some level, but it can launch a DMA and then return -- some multiple DMA can be happening simultaneously. The controllers have room for several commands and buffers, and for spinning drives will actively re-order commands to improve performance. Meanwhile, each controller can be handling multiple drives, each of which can support queuing of multiple commands. Each drive can be DMA'ing with its respective controller. The connection to the individual drives does not necessarily support multiple DMA lanes.

So one connection at a time between a drive and a controller, but the controller is dealing with multiple drives simultaneously, and is quite possibly handling multiple communications simultaneously -- DMA'ing into multiple buffers getting ready to flip them to the O/S for example.

Does having multiple drives per controller channel help? Maybe. In a RAID kind of scenario, especially if you have mirrored drives then multiple drives on the channel can lead to opportunities to dispatch an I/O operation to whichever drive will be ready for it first. But on one channel, you might not be able to have multiple drives transferring data in the same direction.

This leads to the general principle that it is typically safe and productive to have one drive per controller channel, and multiple controllers, that the infrastructure can be doing simultaneous I/O in such circumstances.

For any particular drive, the general principle is that having 2 to 3 requests queued at the same time is often optimal, allowing the controller to reorder the requests according to what is spinning by.

On high performance systems, it might be practical to have one I/O stream per drive head .

Beyond that.. for any one file, unless you go to a lot of trouble, you tend to get conflict trying to do multiple I/O with the same file, as the same head might be needed (because it is on the same track). But different files tend to be in different tracks.

... which is to say that having a modest number of different files per drive being accessed by parallel workers can improve performance, and as you spread the load out over different drives you can definitely improve performance. The single threading is not as bad of a restrictions as you might think.

(Single threading can be necessary in order to provide "atomic" operations, such as the fact that operating systems must guarantee that file renames must at all user-accessible times either give the old name or the new name, that there is no user-accessible case in which both names or neither name is accessible.)

埃博拉酱 on 9 Apr 2023

@Walter Roberson For multiple files uniformly distributed across multiple drives, parallel I/O can theoretically improve performance, but this may require elaborated load balancing algorithms for these drives, which is very tricky. Judging from the code shown by OP, this is probably far beyond his programming ability.

Sign in to comment.

Answer 2

chicken vector on 7 Apr 2023

0
Link

Direct link to this answer

https://se.mathworks.com/matlabcentral/answers/1943344-speed-up-for-loop#answer_1211529

Open in MATLAB Online

Try parallel computing:

% Get CPU's numebr of cores:
[~, numberOfCores] = evalc('feature(''numcores'')');
% Start parallel pool:
parpool(numberOfCores)
% Parallelise <for> loop:
parfor file = 1 : 700
    
   % dostuff 
end

Just be carefull and read the documentation about how to save restults from parfor iteratively, as you need to pre-allocate the output variable and you will have some other constraints.

I suggest you to read the documentation and see some examples.

3 Comments
Show 1 older commentHide 1 older comment

chicken vector on 7 Apr 2023

Edited: chicken vector on 7 Apr 2023

Open in MATLAB Online

The first advice that I can give you is, as Matlab suggests you, initialise your variable ROUGH before the for loop.

This is particularly important when dealing with big dataset because at each loop iteration the size of ROUGH is changed and the RAM of your computer needs to re-allocate the data inside it to have the bits representing ROUGH close to each other.

Now imagine how time expensive this could be when you do it (3597/3)^2 times, when you could save this time by "reserving" your RAM spots before the loop by substituting:

ROUGH = [];

With:

ROUGH = zeros(3597, 3);

The second advice is to use vectorisation as much as possible because it is much faster than using for loops.

Vectorisation can be seen as a sort of parallel computation rather than in sequence.

Consider as an example that I want to obtain the element-wise difference between two matrices. The following 3 methods obtain the same result, but performances are quite different:

N = 500;
x = rand(N);
y = rand(N);
% For loop without initial allocation:
tic
z = [];
for i = 1 : N
    for j = 1 : N
        z(i,j) = x(i,j) - y(i,j);
    end
end
T1 = toc
% For loop with initial allocation:
tic
z = zeros(N);
for i = 1 : N
    for j = 1 : N
        z(i,j) = x(i,j) - y(i,j);
    end
end
T2 = toc
% Vectorisation:
tic
z = x - y;
T3 = toc

On my computer I obtain:

T1 = 0.0147
T2 = 0.0064
T3 = 0.0005

To exploit vectorisation in your code, try to have a look at reshape.

First thing that comes to my mind is to re-organise your Z, LAT, LON variables into matrices of size [3, (3597^2)/3] so that you can already vectorise along one dimension.

Nontheless this might be a bit complex so I suggest you to begin with the initialisation of ROUGH.

In my code saves half the computational time with a 500x500 matrix which only has 2 % of the elements you have in a 3597x3597 matrix, since it scales with the square of the dimension.

TLDR:

Try this:

ROUGH = zeros(3597,3);
select = (0:2);
current_rough_row = 0;
for row = 1 : 3 : N
    for col = 1 : 3 : N
        current_rough_row = current_rough_row + 1;
        A = Z(row + select, col + select);
        Dif = (A - mean(A,'all')).^2;
        rough = sqrt(mean(Dif,'all'));
        ROUGH(current_rough_row,:) = [mean(LAT(row + select)) mean(LON(col + select)) rough];
    end
end

And do the same with NA:

NA = zeros(3597*700, 3); % Instead of: NA = [];
NA[700*(fiile-1)+1 : 700*fiile] = ROUGH; % Instead of NA = [NA;ROUGH];

Mina Mino on 7 Apr 2023

thanks for you response and help.

Sign in to comment.

speed up for loop

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

6 Comments
Show 4 older commentsHide 4 older comments

More Answers (1)

3 Comments
Show 1 older commentHide 1 older comment

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

speed up for loop

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

6 Comments Show 4 older commentsHide 4 older comments

More Answers (1)

3 Comments Show 1 older commentHide 1 older comment

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

6 Comments
Show 4 older commentsHide 4 older comments

3 Comments
Show 1 older commentHide 1 older comment