Why are vectorized calculations faster than for loops?
Why is vectorized code faster in MATLAB? Is it because of better memory handling, or because of parallelization? If it is only due to parallel computation, will there be no difference on a single-core laptop? Thanks
Answers (2)
First of all: there is no evidence that vectorized code is faster in general.
If a built-in function can be applied to a complete array, vectorization is much faster than a loop approach. When large temporary arrays are required, the benefits of vectorization can be outweighed by the expensive memory allocation, especially when the arrays do not fit into the processor cache.
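As a rough illustration of the temporary-array point, the sketch below computes a sum of squared differences in two ways. The vector length `n` and all variable names are assumptions chosen for illustration, and the relative timings depend on the MATLAB release and the machine, so no particular result is claimed.

```matlab
% Sketch: a vectorized expression that may need temporaries vs. a loop that needs none.
n = 1e7;                 % assumed size, chosen to exceed typical processor caches
x = rand(n, 1);
y = rand(n, 1);

% Vectorized: (x - y) and (x - y).^2 may each be allocated as a temporary n-by-1 array.
tic
s1 = sum((x - y).^2);
tVec = toc;

% Loop: accumulates into a scalar, so no temporary arrays are required.
tic
s2 = 0;
for k = 1:n
    d = x(k) - y(k);
    s2 = s2 + d * d;
end
tLoop = toc;

fprintf('vectorized: %.4f s, loop: %.4f s\n', tVec, tLoop);
```

The two sums can differ in the last few digits because the additions are performed in a different order.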
A secondary effect of vectorizing is that the code looks clearer, at least as a rule of thumb. A trivial example:
% Loops:
A = rand(10);
B = rand(10);
C = zeros(size(A));
for i2 = 1:size(A, 2)
for i1 = 1:size(A, 1) % Columns in the inner loop
C(i1, i2) = A(i1, i2) + B(i1, i2);
end
end
% Vectorized:
C = A + B;
The second method has a shorter runtime, and it also takes less time to write and debug. There is almost no chance of introducing a bug, and the code will be easy to understand when the program needs changes in the future.
3 Comments
Image Analyst
on 27 Oct 2014
The comment should say "Rows are in the inner loop", not columns.
Interestingly, with R2014b I'm not seeing much difference, at least with this example, when the rows or columns are in the inner loop. Maybe the arrays are too small. But since MATLAB stores in column major order, it should be faster to iterate over the rows first (inner loop).
% Loops:
A = rand(1500);
B = rand(1500);
C = zeros(size(A));
tic
for col = 1:size(A, 2) % Columns are in the outer loop
for row = 1:size(A, 1) % Rows in the inner loop
C(row, col) = A(row, col) + B(row, col);
end
end
toc
tic
for row = 1:size(A, 1) % Rows in the outer loop
for col = 1:size(A, 2) % Columns are in the inner loop
C(row, col) = A(row, col) + B(row, col);
end
end
toc
tic
% Vectorized:
C = A + B;
toc
Times:
Elapsed time is 1.713167 seconds.
Elapsed time is 1.711998 seconds.
Elapsed time is 0.004025 seconds.
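A single tic/toc measurement can include one-off costs, so a slightly more robust comparison repeats each variant and keeps the best time. The sketch below is one way to do that; the matrix size and the repeat count `nRuns` are assumptions, picked only to keep the run short.

```matlab
% Sketch: repeat each measurement and keep the minimum elapsed time.
A = rand(500);
B = rand(500);
nRuns = 5;               % assumed repeat count
tLoop = inf;
tVec = inf;

for r = 1:nRuns
    C1 = zeros(size(A));
    tic
    for col = 1:size(A, 2)        % columns in the outer loop
        for row = 1:size(A, 1)    % rows in the inner loop
            C1(row, col) = A(row, col) + B(row, col);
        end
    end
    tLoop = min(tLoop, toc);

    tic
    C2 = A + B;                   % vectorized
    tVec = min(tVec, toc);
end

fprintf('best of %d runs: loop %.6f s, vectorized %.6f s\n', nRuns, tLoop, tVec);
```

MATLAB also provides `timeit` for this kind of measurement; it expects the code under test to be wrapped in a function handle.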
Keldon Alleyne
on 1 Oct 2018
Multiplying matrices in loops is O(N^3), while the fastest algorithms using other methods are O(N^2.3) - O(N^2.8), which can easily explain the differences in performance.
On my laptop with Matlab 2018b I get:
Elapsed time is 0.112362 seconds.
Elapsed time is 0.214544 seconds.
Elapsed time is 0.007935 seconds.
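For reference, the O(N^3) loop formulation of matrix multiplication mentioned in this comment can be sketched as below and compared with the built-in `*` operator. The size `N` and the variable names are assumptions for illustration; the sketch makes no claim about which algorithm the built-in operator uses internally.

```matlab
% Sketch: triple-loop matrix multiplication (N^3 multiply-adds) vs. the built-in operator.
N = 200;                 % assumed size, small enough for the loops to finish quickly
A = rand(N);
B = rand(N);

C = zeros(N);
tic
for j = 1:N
    for k = 1:N
        b = B(k, j);
        for i = 1:N      % innermost loop walks down a column (column-major order)
            C(i, j) = C(i, j) + A(i, k) * b;
        end
    end
end
tLoop = toc;

tic
D = A * B;
tBuiltin = toc;

% The results should agree up to floating-point rounding.
relErr = norm(C - D, 'fro') / norm(D, 'fro');
fprintf('loops: %.4f s, built-in: %.4f s, relative error: %.2e\n', tLoop, tBuiltin, relErr);
```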
Vibhav
on 24 Apr 2025
Moved: Walter Roberson
on 24 Apr 2025
Looks like your question about why vectorizing is fundamentally faster (in most cases) was never answered.
The answer is complicated and I won't provide all the details. In addition to optimized memory access and multithreading, vectorized code uses Single Instruction, Multiple Data (SIMD) instructions: CPU instructions that perform the same computation on multiple data elements at once, which raises the computational throughput of a single CPU core. You can look up "SIMD" for more information on how this is done, but the TL;DR is that vectorized MATLAB code is executed by precompiled, optimized library routines (BLAS and LAPACK for linear algebra) that exploit SIMD instructions.
Vectorization also exploits multithreaded routines on machines that support them. It is possible to run "multithreaded" code on a single-core CPU, but that is not true parallelism: only one thread executes at any given moment, even though the execution appears concurrent from the user's standpoint.
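One way to separate the multithreading effect from the other effects, approximating the single-core case from the original question, is to limit MATLAB to one computational thread with `maxNumCompThreads` and repeat the comparison. The sketch below assumes a 1000-by-1000 matrix and illustrative variable names; the timings it prints depend on the machine and release.

```matlab
% Sketch: compare loop and vectorized addition with a single computational thread.
A = rand(1000);
B = rand(1000);

nOld = maxNumCompThreads(1);      % limit to one thread; returns the previous setting

C1 = zeros(size(A));
tic
for col = 1:size(A, 2)
    for row = 1:size(A, 1)
        C1(row, col) = A(row, col) + B(row, col);
    end
end
tLoop = toc;

tic
C2 = A + B;
tVec = toc;

maxNumCompThreads(nOld);          % restore the previous thread limit

fprintf('single thread: loop %.4f s, vectorized %.4f s\n', tLoop, tVec);
```

Any difference that remains with one thread cannot be attributed to multithreading.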