Hello all,

Let's say I have some a priori ideas about a family of functions that might best describe a particular data set. After fitting the data with each candidate, I can simply look at the outputs (e.g., RMSE, r²) and choose the one with the best values. However, is there a way to gain some degree of confidence in that selection? For example, suppose I have reason to believe that, for a data set like the one below, the true relationship is described by a low-order integer exponent n in the function y=b×xⁿ+c. I might capture some data and then try to fit it as follows:

rx = randi(100, [100, 1])/10;

rn = 3 + rand([100, 1])/4 - 0.125;

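b = 1; c = 0;  % assumed true values; the post does not define them

rc = (rand([100, 1]) * 500) - 250 + c;

ry = b * rx.^rn + rc;  % assumed definition; ry is used below but never defined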

m = @(b, c, n, x) b * x.^n + c;

m1 = @(p, x) m(p(1), p(2), 1, x);

m2 = @(p, x) m(p(1), p(2), 2, x);

m3 = @(p, x) m(p(1), p(2), 3, x);

c1 = lsqcurvefit(m1, [1, 0], rx, ry)

c2 = lsqcurvefit(m2, [1, 0], rx, ry)

c3 = lsqcurvefit(m3, [1, 0], rx, ry)

x = linspace(min(rx), max(rx));

plot(rx, ry, 'ok', x, m1(c1, x), '-g', x, m2(c2, x), '-r', x, m3(c3, x), '-b')

In the above example, m3 should generally fit the data best (unless the random number generation is very unlucky), because the data were generated with an exponent close to 3.

For those who are familiar with Prism, there I might perform this test by starting an "analysis", choosing "compare", then selecting "for each data set, which of two equations (models) fits best", and then choosing the "Akaike's Information Criterion" test. After running that on one set of data, I got

This is very helpful as it lets me know how certain I am about the fit choice (i.e., 86% sure n=3 is better than n=2). What's the best way to do something similar in MATLAB?
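
For reference, the kind of number quoted above can be written out as a formula. This is only a sketch: it assumes the standard least-squares form of the corrected Akaike criterion (AICc) and the usual Akaike-weight conversion to a probability, which may not be exactly what Prism computes. The symbols are notation introduced here, not taken from Prism: N is the number of data points, SS_i is the residual sum of squares of model i, and K_i is the number of fitted parameters plus one (so K = 3 for each of the models above, which fit b and c).

```latex
\begin{aligned}
\mathrm{AICc}_i &= N \ln\!\left(\frac{SS_i}{N}\right) + 2K_i + \frac{2K_i\,(K_i+1)}{N-K_i-1} \\
\Delta &= \mathrm{AICc}_2 - \mathrm{AICc}_3 \\
w_3 &= \frac{1}{1 + e^{-\Delta/2}}
\end{aligned}
```

Here w_3 is the relative likelihood that n=3 is the better of the two models. Because m2 and m3 have the same number of parameters, the penalty terms cancel and Δ reduces to N ln(SS_2/SS_3). Under these assumptions, a value of w_3 = 0.86 corresponds to Δ ≈ 3.6. In MATLAB, SS_i is available as the second output (resnorm) of lsqcurvefit.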

Thanks in advance.