How to calculate trace(A' * B) using SVD vectors of A and B?

Hi all,
Imagine I have 2 large matrices which have more rows than columns, I'd like to calculate trace(A' * B) for N times. I have 2 options: 1. calculate trace(A' * B) directly; 2. only calculate vector product of the diagonal, then sum it. I test with the following minimum example, it turns out the 2nd option is faster:
clear; clc;
nd = 100000;
nt = 500;
A = rand(nd, nt);
B = rand(nd, nt);
N = 100;
%%no SVD.
% option 1.
tic
for i = 1:N
abTr = trace(A' * B);
end
toc
% option 2.
tic
abSum = 0;
for i = 1:N
for j = 1:nt
abSum = abSum + A(:, j)' * B(:, j);
end
end
toc
Elapsed time is 58.320345 seconds.
Elapsed time is 13.558039 seconds.
Now I need to apply truncated-SVD to A and B to optimise storage. the following code is applied to leave only 10 vectors
% apply svd
[ua, sa, va] = svd(A, 'econ');
[ub, sb, vb] = svd(B, 'econ');
% group in cells
nRem = 10;
ucell = {ua va ub vb};
scell = {sa sb};
% truncate
ucell = cellfun(@(v) v(:, 1:nRem), ucell, 'un', 0);
scell = cellfun(@(v) v(1:nRem, 1:nRem), scell, 'un', 0);
My question is: is it possible to calculate trace (it's an approximation due to truncation) using option 2 with these SVD vectors in ucell and singular values in scell? I can do it with option 1 using
trace((vb' * va * sa' * (ua' * ub) * sb)
But I'd like to be faster by only considering diagonal elements.
Many thanks!

 Accepted Answer

I am not aware of any identities that can express trace(A'*B) in terms of SVDs of A and B, and I dont think one exists, but there is an identity that does not require explicit calculation of the A'*B matrix:
trace(A'*B) = sum(A(:).*B(:))
For example,
nd = 100000;
nt = 500;
A = rand(nd, nt);
B = rand(nd, nt);
N = 100;
tic
tr1=trace(A'*B);
t1=toc;
%Elapsed time is 0.857053 seconds.
tic
t2=sum(A(:).*B(:));
toc
%Elapsed time is 0.210807 seconds.
As you can see, you obtain the result about 4 times faster when using the above identity. This identity is especially useful when nt is 'large', because the machine may not have insufficient memory to hold the nt-by-nt matrix A'*B.

10 Comments

Hi Anton,
Thanks for your reply. There is an expression of expressing trace(A' * B) with SVD of A and B. Say [ua, sa, va] and [ub, sb, vb] are SVD outputs of A and B, respectively. Then
trace(A' * B) = trace((vb' * va * sa' * (ua' * ub) * sb)
Please see my comment to Jan for the reason why SVD is needed in my case. I need it for storage optimisation.
That's not a very useful identity. It is just SVDs of A' and B being multiplied.
This is true, but I need SVD vectors such that I do not need to store full matrices. As a result, the trace will need to be computed from the SVD vectors, which is why I asked the question.
Ok, lets say you performed SVD on nd-by-nt matrices A and B to get A=Ua*Da*Va' and B=Ub*Db*Vb', and then retained k and m dominant modes for A and B, respectively. Lets call those approximations:
A_k=Ua_k*Da_k*Va_k';
B_m=Ub_m*Da_m*Vb_m';
In terms of SVDs, the product A_k'*B_m is
A_k'*B_m = (Ua_k*Da_k*Va_k')'*(Ub_m*Da_m*Vb_m') =
= Va_k*(Da_k*Ua_k'*Ub_m*Da_m)*Vb_m' =
= Va_k*C*Vb_m'
The matrix C=Da_k*Ua_k'*Ub_m*Da_m is a k-by-m matrix, while Va_k and Vb_m are nt-by-k and nt-by-m matrices. Because nt is large, we want to avoid explicit composition of Va_k*C*Vb_m' when evaluating tr(Va_k*C*Vb_m'). This is where the identity I mentioned above comes in.
Define intermediate variable
VaC=Va_k*C; % nt-by-m matrix
Evaluate tr(VaC*Vb_m') = tr(A_k'*B_m) as:
VaC(:)'*Vb_m(:)
Thanks to Jan for pointing out that a(:)'*b(:) is faster than sum(a(:).*b(:))
Hi Anton,
Thanks for the reply. I did a little test with you solution, it turns out the trivial identity trace((vb' * va * sa' * (ua' * ub) * sb)) is the fastest.
clear; clc;
nd = 100000;
nt = 100; % repeat the test m times.
m = 100;
A = rand(nd, nt);
%%SVD.
[ua, sa, va] = svd(A, 0);
[ub, sb, vb] = svd(B, 0);
n = 10;
uan = ua(:, 1:n);
ubn = ub(:, 1:n);
san = sa(1:n, 1:n);
sbn = sb(1:n, 1:n);
van = va(:, 1:n);
vbn = vb(:, 1:n);
% option 1.
tic
for it = 1:m
trsvd = trace((vbn' * van * san' * (uan' * ubn) * sbn));
end
toc
% option 2.
tic
for it = 1:m
absvd1 = trace((uan * san * van')' * (ubn * sbn * vbn'));
end
toc
% option 3
tic
for it = 1:m
absvd2 = trace(van * (san * uan' * ubn * sbn) * vbn');
end
toc
% option 4.
tic
for it = 1:m
c = san * uan' * ubn * sbn;
vc = van * c;
absvd3 = vc(:)' * vbn(:);
end
toc
Elapsed time is 0.532262 seconds.
Elapsed time is 28.128461 seconds.
Elapsed time is 1.198035 seconds.
Elapsed time is 0.771343 seconds.
trsvd is what I'm using now, absvd1 is simply reproducing trace(A' * B), absvd2 is putting the intermediate variables in brackets; absvd3 is what you said, define intermediate variable separately then calculate. Surely they give the same answer, but it seems that trsvd is the fastest. I checked the profiler, this line took 0.69s (for running 100 times in total):
c = san * uan' * ubn * sbn;
while this line took 0.29s (for running 100 times in total):
trsvd = trace((vbn' * van * san' * (uan' * ubn) * sbn));
What do you think?
So the answer to my question is no, I cannot calculate the diagonal elements of A' * B using SVD vectors of A and B?
If SVDs of A and B can recover the entire (A'*B) matrix, then they can also be used to recover the elements on the diagonal of (A'*B). In the end, do you need diag(A'*B) or trace(A'*B)?
trace(A' * B) = sum(diag(A' * B)), so actually diagonal elements are needed rather than all elements right?
My understanding is: in order to calculate diagonal elements of (A' * B), only paired column vectors of A and B are needed, i.e. if A = [ai]_i = 1^n, B = [bi]_i = 1^n, then only ai' * bi needs to be calculated.
However, this calculation of diagonal elements only cannot be achieved once A and B are decomposed by SVD, because recovering A' * B by SVD vectors recovers a full matrix which contains part of the entire diagonal line, rather than one or a few elements on the diagonal line.
I thought the problem was resolved because you found an efficient way to evaluate tr(A'*B) using SVDs of A and B.
Nevertheless, of course it is possible to recover ONLY the diagonal elements of A'*B from the SVDs of A and B while staying within memory constraints of the machine. However, I am not sure that computing these diagonal elements first and then summing them would help you improve evaluation of tr(A'*B).

Sign in to comment.

More Answers (1)

Jan
Jan on 26 Jun 2018
Edited: Jan on 26 Jun 2018
I assume this is faster:
nd = 100000;
nt = 500;
A = rand(nd, nt);
B = rand(nd, nt);
tic;
abTr = trace(A' * B);
toc % Elapsed time is 0.497233 seconds.
tic;
abSum = 0;
for j = 1:nt
abSum = abSum + A(:, j)' * B(:, j);
end
toc; % Elapsed time is 0.141059 seconds.
tic
abTr2 = sum(A(:).*B(:));
toc % Elapsed time is 0.182408 seconds.
tic;
abTr3 = A(:).' * B(:);
toc % Elapsed time is 0.054548 seconds.
tic;
[ua, sa, va] = svd(A, 'econ');
[ub, sb, vb] = svd(B, 'econ');
toc % Elapsed time is 10.629806 seconds.
% And this does not even include the further computations
% relative error:
abs(abTr - abTr3) / abTr
The reshaping by (:) and the transposition creates at least shared data copies, such that no additional memory is used. Maybe the JIT acceleration can call the dot product of the BLAS library directly, where the transposition can be provided as a flag.
Of course for a more accurate measurement some loops would be smarter, or better use timeit. But a factor of 3 ( abTr3 against abSum) is significant enough.

5 Comments

Hi Jan,
Thanks for your reply.
The SVD is applied because I will need to store nt of such full matrices in my code, i.e. nt of nd*nt matrices. If not applying SVD, result would be too large for memory. For example, if storing 500 100000*500 full matrices, total memory usage is 1e11 bytes = 93 Gb. However if applying SVD and store 10 vectors each, total memory usage is 2.02e9 bytes = 1.8816 Gb. Do you have better ideas?
@Xiaohan Du: I don't get it. What are your inputs and what do you want to calculate?
In my code there is a function outputs nt matrices, each size is nd-by-nt. I need to calculate trace between each 2 of them. Say nt = 3, then the function gives matrices A, B, C, then I need trace(A'*A), trace(A'*B), trace(A'*C), trace(B'*B), trace(B'*C), trace(C'*C). I agree with you, calculating trace(A'*B) is slower than A(:)' * B(:), but due to the storage reason I explained above, I cannot store nt such matrices in my computer due to the large size, which is why I have to apply SVD and use SVD vectors to calculate trace.
Hi Jan,
I don't understand why
A(:).' * B(:)
is faster than
for j = 1:nt
abSum = abSum + A(:, j)' * B(:, j);
end
theoretically, the second operation only has nt operations, each is a vector product between the ith vectors of A and B, while the first has to calculate the product between every 2 vectors in A and B, so shouldn't the first cost more than the second?
A(:) creates a shared data copy: Only the header of the variable is changed, but the data are not copied. The same happens for the transposition. But Matlab's JIT acceleration might be so smart and avoid even the shared data copies by calling the underlying BLAS function DDOT directly. This function is implemented with multi-threading - you can see this in the task-manager, where all cores are loaded during the processing.
So A(:).' * B(:) starts as many threads as cores are available and no memory has to be copied around in the slow RAM.
abSum = abSum + A(:, j)' * B(:, j) calls DDOT also and starts multiple threads. The starting of a thread is very costly and here we need 4 starts for each column. I'm not sure if Matlab's JIT is smart enough to avoid the creation of the temporary vectors A(:,j) and B(:,j) in each iteration. But if this is performed explicitly, two variables have to be created and the elements of the columns are copied. Calling the library function DDOT costs some time for the overhead also.
Summary:
  • Case 1: DDOT is called one time and 4 threads are started.
  • Case 2: For nt=500, 1000 columns are copied, DDOT is called 500 times, 2000 threads are started (on a 4 core CPU).
So for me the speedup is expected.
the first has to calculate the product between every 2 vectors in A and B
This might be a misunderstanding: A(:) and B(:) are two vectors only, not "every two vectors".

Sign in to comment.

Categories

Asked:

on 26 Jun 2018

Edited:

Jan
on 29 Jun 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!