AUC between different size curves

3 views (last 30 days)
Suzuki
Suzuki on 7 Sep 2021
Edited: Suzuki on 11 Sep 2021
How can I calculate the area between 2 curves of unequal data size, as in the photo?

Accepted Answer

Julius Muschaweck
Julius Muschaweck on 7 Sep 2021
I would use interp1 to interpolate both functions so they have the same high resolution x support. You can easily replace the 0.01 step by a 1e-6 step size in the code below.
Look for interp1 options, to control the interpolation algorithm (you might prefer 'linear' to avoid overshoots), and to control extrapolation (in your probability distribution case, you might want to set the value for extrapolation to 1 if one of your distributions has shorter x range).
Then use trapz to compute the signed integral of the difference or unsigned area between the curves. See code.
x1 = 0:0.2:1.4;
y1 = x1.^2;
x2 = 0:0.14:1.4;
y2 = sqrt(x2);
figure();
hold on;
plot(x1,y1,'-x');
plot(x2,y2,'-x');
xq = 0:0.01:1.4;
y1q = interp1(x1,y1,xq);
y2q = interp1(x2,y2,xq);
my_signed_integral = trapz(xq,y2q-y1q)
my_signed_integral = 0.1701
my_unsigned_area = trapz(xq,abs(y2q-y1q))
my_unsigned_area = 0.4631
test_signed = trapz(xq,sqrt(xq) - xq.^2)
test_signed = 0.1894
test_unsigned= trapz(xq,abs(sqrt(xq) - xq.^2))
test_unsigned = 0.4768
  11 Comments
Julius Muschaweck
Julius Muschaweck on 11 Sep 2021
The trouble in your example is that you extrapolate y1i linearly to the right, which makes the extrapolated values grow to almost 4. This is not what you want. You have cumulative distributions, which are inherently zero on the left, all the way to minus infinity, and one to the right, all the way to plus infinity. So you should extrapolate to the right with constant values of 1, and to the left with constant values of zero.
You run into this problem no matter if you use my initial idea or Walter's actually superior suggestion of using union().
Like this:
clear;
load("bins.mat")
load("cdist.mat")
load("Zbins.mat")
load("Zcdist.mat")
%%
% I would try to control the step size in your xq array
dx = 0.01; % some small step size to decrease discretization error
xmin = min([Zbins';bins']);
xmax = max([Zbins';bins']);
nsteps = ceil((xmax-xmin) / dx) + 1;
xq = linspace(xmin, xmax, nsteps);
% but now that we're at it, let's try also what Walter Roberson suggested:
xq = union(Zbins, bins, "sorted");
% xq = linspace(min([Zbins';bins']), max([Zbins';bins']))'; % New %x Vector For Interpolation
% i check max and min in both and get the max and min among the 2 sets, so I thought this covers the
% whole data range
% extrapolation does a bad job for you here. Look at the values in y1i: They go up to almost 3.9 !
%y1i = interp1(Zbins', Zcdist', xq, 'linear','extrap'); % Interpolate To %xqT
%y2i = interp1(bins', cdist', xq, 'linear','extrap'); % Interpolate To %xq%
%my_unsigned_area = trapz(xq,abs(y2i-y1i))
% without 'extrap', interp1 "extrapolates" to NaN (not a number)
y1i = interp1(Zbins', Zcdist', xq, 'linear'); % Interpolate To %xqT
y2i = interp1(bins', cdist', xq, 'linear'); % Interpolate To %xq%
%now set "outliers" on the left to 0, and on the right to 1
% logical indexing is the non-for-loop way to do this:
idx1_left = xq < Zbins(1); % assuming Zbins and bins are strictly ascending
y1i(idx1_left) = 0;
idx1_right = xq > Zbins(end);
y1i(idx1_right) = 1;
idx2_left = xq < bins(1); % assuming Zbins and bins are strictly ascending
y2i(idx2_left) = 0;
idx2_right = xq > bins(end);
y2i(idx2_right) = 1;
% check if we caught all outliers
assert(~any(isnan(y1i)));
assert(~any(isnan(y2i)));
my_unsigned_area = trapz(xq,abs(y2i-y1i))
my_unsigned_area = 17.8458
Suzuki
Suzuki on 11 Sep 2021
Edited: Suzuki on 11 Sep 2021
thank you very much for the kind and detailed explanation.
@Julius Muschaweck I really appreciate your help, I would have not done it without your generous help.

Sign in to comment.

More Answers (0)

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!