Without resorting to FEX/MEX, can the overhead of repeated calls to sparse() be avoided with operations on I, J, V directly?
Show older comments
I have a time-marching problem bottlenecked by matrix multiplication
, where the sparse matrix
is the sum of many time-independent sparse matrices with time-dependent scalar coefficients:
, where the sparse matrix 
I'm aware of FEX solutions that would enable a 3D
to be defined once prior to time-marching, though I would prefer to rule out native MATLAB first. My current thinking is to supplement the I, J, V vectors with an index vector K, and do one of the following:
to be defined once prior to time-marching, though I would prefer to rule out native MATLAB first. My current thinking is to supplement the I, J, V vectors with an index vector K, and do one of the following:- Generate a cell array of sparse matrices along index K, with
multiplied with bsxfun, similar to this. (Probably the slowest option) - Sparse matrix construction with B=sparse(I, J, V.*beta(K), N, N)*X
- Avoid sparse matrix entirely with B=accumarray(I, V.*beta(K).*X(J), N, N)
I haven't tested option 1, but option 3 with accumarray runs about 50x slower than option 2 (with or without issparse option). This makes sense, as the multiplication with X in accumarray is done before accumulation, hence on a much larger array.
However, option 2 still appears to be wasting performance. For example, sparse(I, J, V, N, N)*X runs about 50x slower than basic A*X. My initial interpretation of this is that much of the time in sparse is spent on unavoidable summation of common terms in I,J,V. I cant profile sparse internally, but tested this by accumulating V prior to sparse(I, J, V, N, N)*X, which comes out to ~7x slower than A*X.
My conclusion is that there is still some significant & (potentially) avoidable overhead from calling sparse many times unrelated to summation of duplicate indices, and the larger piece of summation that is unavoidable due to the summation of time-dependent
terms co-located in
.
To boil this down, I have two questions I cannot answer:
- Considering the pattern of summation of duplicate indices is always the same (static I/J), can the summation overhead in sparse be improved?
- What is the source of the remaining overhead in sparse and can it be avoided?
20 Comments
Joel Lynch
on 6 Aug 2022
Bruno Luong
on 6 Aug 2022
To get a feeling, can you provide
- typical size of A0
- typical density of Ak
- typical density of A0
- typical length of k (number of matrices)
- typical length of t
Joel Lynch
on 6 Aug 2022
Edited: Joel Lynch
on 6 Aug 2022
Joel Lynch
on 6 Aug 2022
Edited: Joel Lynch
on 6 Aug 2022
Bruno Luong
on 6 Aug 2022
That looks promising to me. I stop look for an alternative solution, unless you find the new method still too slow.
Bruno Luong
on 6 Aug 2022
BTW does X change with time? If not you can precompute X(J) ready for accumarray.
You can even precompute
P(l,k) := M(z,k).*X(J)
then
B = accumarray(I, P*beta,N, N)
Joel Lynch
on 6 Aug 2022
Bruno Luong
on 6 Aug 2022
Edited: Bruno Luong
on 6 Aug 2022
The question can be summarized as: Is
B=accumarray(I, V.*beta(K).*X(J), N, N);
faster than
S = sparse(I, J, V, [N N]) * X;
? It's not clear to me.
Joel Lynch
on 6 Aug 2022
Edited: Joel Lynch
on 6 Aug 2022
James Tursa
on 7 Aug 2022
Coming to this party late. Is the goal to generate sparse matrix
on the fly as
changes over time? You mention that the
are static. Does that mean that the sparsity pattern of
is also static? Or could the result of the sum possibly change the sparsity pattern of
over time? E.g., such as
being 0 at some point in time, etc.
Joel Lynch
on 7 Aug 2022
Edited: Joel Lynch
on 7 Aug 2022
James Tursa
on 7 Aug 2022
Edited: James Tursa
on 7 Aug 2022
Is x a vector? Is x full or sparse? Is x static or does it change? Do you want B to be full or sparse?
Joel Lynch
on 7 Aug 2022
Bruno Luong
on 7 Aug 2022
One could build the internal storage of A0 with I, J, and dummy V, then only need to fill with V.*beta(K) every iteration, but that requires MEX, and it seems you don't want. There is no such low-level manipulation in pure MATLAB.
Bruno Luong
on 7 Aug 2022
Forget my last comment about MEX, it's not that so simple as I though.
James Tursa
on 7 Aug 2022
Edited: James Tursa
on 7 Aug 2022
Maybe I am missing something, but isn't the desired calculation simply
? Why isn't this just a series of
matrix*vector multiples summed into a vector result? This would seem to be a fairly straightforward mex routine to me, especially if x is full. No need to physically calculate
at all. All of the
matrices could even be stacked horizontally into one 2D matrix for this. The only caveat would be if x could possibly contain inf or NaN (checking for that condition would slow things down). Or could work with all of the
in I,J,V vector storage format as well. No real need for explicit sparse storage format.
Joel Lynch
on 7 Aug 2022
James Tursa
on 7 Aug 2022
Edited: James Tursa
on 7 Aug 2022
This is the math as I understand it, although this is not how I would physically implement it. Were I to write code for this, e.g. I would not physically replicate the
into one long vector and multiply all the elements by β as I show, but do the equivalent math virtually and only do the necessary non-zero multiplies. If speed is the issue and I knew there were no inf or NaN present, could skip those checks.
Abig = [
] % concatenate the
horizontally into NxNM matrix (static)
x = Nx1 column vector (time dependent)
beta = 1xM row vector (time dependent)
Xbig = x .* beta % a NxM matrix with each column
using virtual array expansion
B = Abig * Xbig(:) % the result of one big matrix*vector multiply
The above is just the notional math. The actual coding would combine the last two steps virtually, and could even be multi-threaded. The coding would also depend on how the
are actually stored (as separate matrices, as one big matrix, as sparse, or as I,J,V triplets).
Does this match the calculations you want, or am I missing something?
Joel Lynch
on 7 Aug 2022
Edited: Joel Lynch
on 7 Aug 2022
Bruno Luong
on 7 Aug 2022
@James Tursa yes you get it right. However
- In the range of density/size and N_k (m in your notation) your method is still 50% slower than the accumarray solution we come up with,
- The runtime scale with N_k and is not bounded by the assumption of limited "bandwidth" (density of A0) stated by @Joel Lynch. It is probably an important requirement for his application.
Accepted Answer
More Answers (0)
Categories
Find more on Matrices and Arrays in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!


