This m-file returns a useful residual scaling, the prediction error sum of squares (PRESS). To calculate PRESS, select an observation i. Fit the regression model to the remaining n-1 observations and use this equation to predict the withheld observation y_i. Denoting this predicted value by ye_(i), we may find the prediction error for point i as e_(i)=y_i - ye_(i). The prediction error is often called the ith PRESS residual. This procedure is repeated for each observation i = 1,2,...,n, producing a set of n PRESS residuals e_(1),e_(2),...,e_(n). Then the PRESS statistic is defined as the sum of squares of the n PRESS residuals as in,

PRESS = i_Sum_n e_(i)^2 = i_Sum_n [y_i - ye_(i)]^2

Thus PRESS uses such possible subset of n-1 observations as an estimation data set, and every observation in turn is used to form a prediction data set. In the construction of this m-file, we use this statistical approach.

As we have seen that calculating PRESS requires fitting n different regressions, also it is possible to calculate it from the results of a single least squares fit to all n observations. It turns out that the ith PRESS residual is,

e_(i) = e_i/(1 - h_ii)

Thus, because PRESS is just the sum of the squares of the PRESS residuals, a simple computing formula is

PRESS = i_Sum_n [e_i/(1 - h_ii)]^2

It is easy to see that the PRESS residual is just the ordinary residual weighted according to the diagonal elements of the hat matrix h_ii. Also, for all the interested people, here we just indicate, in an inactive form, this statistical approaching.

Data points for which h_ii are large will have large PRESS residuals. These observations will generally be high influence points. Generally, a large difference between the ordinary residual and the PRESS residual will indicate a point where the model fits the data well, but a model built without that point predicts poorly.

This is also known as leave-one-out cross-validation (LOOCV) in linear models as a measure of the accuracy. [an anon's suggestion]

In order to improve the matrix script for avoiding the squares condition number in the regression parameter estimation are by using a pivoted QR factorization of X.

Syntax: function x = press(D)

Inputs:

D - matrix data (=[X Y]) (last column must be the Y-dependent variable).

(X-independent variables).

Output:

x - prediction error sum of squares (PRESS).

Antonio Trujillo-Ortiz (2021). press (https://www.mathworks.com/matlabcentral/fileexchange/14564-press), MATLAB Central File Exchange. Retrieved .

Created with
R14

Compatible with any release

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!Create scripts with code, output, and formatted text in a single executable document.

Koray KADASgiuseppeHi. I tried to use a RBF Network to fit my data. In this case, should the Design Matrix H my input data?

I mean: D = [H;y]?

Thanks

Antonio Trujillo-Ortizannon. Yes, it is.

AnonRefers PRESS to the well-known leave-one-out-crossvalidation? If yes, it would be good to indicate so in the description.

John D'ErricoThanks for fixing it. I'm happy now.

Antonio Trujillo-OrtizThere were already done to this m-file the valuable suggestions done by Bart and John D'Errico. Thx

Antonio Trujillo-OrtizHi John,

Thanks for your broad and useful mathematical fundament comment. I'll review the file script and take into account those in order to improve it.

Antonio Trujillo-Ortiz

John D'ErricoBart - Worse, this uses a poor form to solve the least squares problem. Here is the line in question in press:

b = inv(X'*X)*(X'*Y); %least squares parameters estimation (=X\Y)

This is arguably bad because it uses inv, but FAR worse, it uses the normal equations. That is the WRONG way to solve the least squares problem.

Faster, more efficient, and more accurate is:

b = X\Y;

This works for non-square matrices X. Why is it better? Because the form in press forms X'*X, which squares the condition number of the problem. So even moderately poorly conditioned problems become vastly more ill-conditioned. You can lose a lot of accuracy using that form.

Instead, the X\Y form uses a pivoted QR factorization of X, which avoids squaring the condition number.

As far as the other lines in question, they too can be greatly improved. For example, this line:

H = X*inv(X'*X)*X'; %hat matrix

Again, this forms X'*X, then calls inv. A better solution might be to compute the QR factors:

[Q,R] = qr(X);

We know that X = Q*R. Therefore,

X'*X = R'*Q'*Q*R

But Q is orthogonal, so Q'*Q is an identity matrix. So we have that

X'*X = R'*R

As importantly,

inv(X'*X) = inv(R'*R)

If the matrix X'*X is invertible, then R is better conditioned, and more easily inverted. So one would be better off forming inv(X'*X) as inv(R)*inv(R)':

k = size(X,2);

r = inv(R(1:k,:));

r*r'

While this does indeed use a matrix inverse, at least it is throwing inv at the unsquared matrix.

Even better is to use the column pivoted QR here. So

[Q,R,E] = qr(X);

We can use that factorization to give both the least squares solution in lieu of \ as well as the hat matrix.

From the help to QR:

If A is full:

[Q,R,E] = qr(A) produces unitary Q, upper triangular R and a

permutation matrix E so that A*E = Q*R. The column permutation E is

chosen so that ABS(DIAG(R)) is decreasing.

So we know that

X*E = Q*R

E is a permutation matrix, so we have that E*E' is an identity. Therefore,

X = Q*R*E'

So the least squares solution can be gained from the system of equations:

Q*R*E'*b = Y

R*E'*b = Q'*Y

E'*b = R\(Q'*Y)

Finally,

b = E*(R\(Q'*Y));

Note that since R is upper triangular, MATLAB will be smart about the use of \ to do a back-substitution to solve the system.

The hat matrix also falls out nicely enough from that pivoted QR, where all can be done efficiently and using well-posed numerical operations.

MatlabUserGiving this a little more thought, I realized that the use of (pseudo) inverse for multiplication is best avoided, both in press() and in general. To quote the R2013a help:

"A frequent misuse of inv arises when solving the system of linear equations

Ax = b. One way to solve this is with

x = inv(A)*b. A better way, from both an execution time and numerical accuracy standpoint, is to use the matrix division operator x = A\b. This produces the solution using Gaussian elimination, without forming the inverse."

MatlabUserThank you for this file. Very helpful.

I use it in slightly modified form, as I think the original here has some room for improvement:

1) The line: ye = [1 D(i,1:2)]*b;

can be changed into: ye = [1 D(i,1:c-1)]*b;

This will make the function work for X with more than two columns.

2) The line: H = X*inv(X'*X)*X'; %hat matrix

This is problematic because of inv(). Current form may match textbook form most closely, but it gives unexpected results. I have had cases where the 'loop method' result is orders of magnitude different from the 'matrix method' result. Chanching inv() to pinv() may be an improvement. Better still may be to incorporate the work of Prof. Tim Davis: http://www.mathworks.nl/matlabcentral/fileexchange/24119

Mind you, I am no expert on matrix operations. I just noticed that multiplying by inv() is not very 'robust'.

LeidyThanks

David HerreraCorrection: My computation of PRESS was wrong, and Mr. Antonio Trujillo-Ortiz was absolutely correct in the calculation of PRESS from his code. The problem was that I forgot to place a minus in front of one of the numbers, so I got a slightly different answer. Independently with different code, I got the same answer as Antonio Trujillo-Ortiz, which was 22225.0575. I apologize for the mistake.

David HerreraThe correct answer is 22225.0, which is correctly stated in the comment section of this code. However, the code does not compute this. Revision needed to compute PRESS value.

David HerreraThis code does not give the correct answer to the problem stated in Response Surface Methodology by Montgomery & Myers, page 48, which is 22,225. This code gives 21265.2514 by the two different methods in the code. It needs revision. See output below (some variables were printed):

Calculation of PRESS

n = 14

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

r = 13, c = 3

PRESS = 21265.2514

hii = 0.358818

hii = 0.379259

hii = 0.323410

hii = 0.294624

hii = 0.094723

hii = 0.137715

hii = 0.142682

hii = 0.242150

hii = 0.236885

hii = 0.202703

hii = 0.209627

hii = 0.073729

hii = 0.225672

hii = 0.078004

2nd PRESS = 21265.2514

>>