How to estimate Y using a linear regression and an uncertain X value.
I have a training dataset that I'd like to perform regression upon so that I can use the resulting regression to estimate values of Y for other samples for which I only have X. This is standard, using polyfit and polyval, however I have uncertainties in my new X value, and polyval doesn't seem to be able to handle uncertainties in X.
More detail: In my training set I have 100 paired observations of X and Y (in my case, these are observations of Aluminium content and Calcium content of 100 rock samples). These samples don't have any uncertainty, for ease. A linear regression with uncertainty is simple enough:
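% Synthetic data for example (100 paired observations)
X_data = linspace(1, 25, 100);
Y_data = -0.3*X_data + 2*randn(1, 100);
% Fit a first-order polynomial (linear regression) to the data;
% S holds the information polyval needs to estimate prediction errors
[p, S] = polyfit(X_data, Y_data, 1);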
If I now wanted to estimate the Y value in a new sample for which I had its X value, I would use polyval. However, my new X value (Xnew) has uncertainty, such that Xnew has a mean of 12 and a standard deviation of 3. I cannot see any way to incorporate the Xnew uncertainty into polyval.
How can I evaluate Y whilst accounting for both the uncertainty in the linear regression and the uncertainty in my new X value?
Answers (2)
Star Strider
on 10 Nov 2023
If I understand correctly what you want to do, the best approach might be to use fitlm and then use the appropriate predict function (I believe that is the correct function, however MATLAB will automatically choose the correct one, so no worries), since it will incorporate the uncertainties in the parameters and produce a ‘Y’ value and an associated confidence interval for each value of ‘X’ that you provide. (There are ways to calculate the parameter uncertainties for polyfit, one of which I wrote and need to post the update for, as well as a way of calculating the prediction confidence intervals, one of which I also wrote and which does not need to be updated.) Those will work, however fitlm and its predict function are easier in this instance.
As for accounting for the uncertainties in your new ‘X’ value, I would just use a range of each of the ‘X’ values (either the 95% confidence interval extremes as well as the mean, or just the extremes, or something else that makes sense in the context of your calculations). Use those as your arguments to predict.
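A minimal sketch of that idea, assuming the Statistics and Machine Learning Toolbox is available and that the new ‘X’ value is roughly normal with the mean of 12 and standard deviation of 3 given in the question. The variable names (Xq, Yenvelope) are only illustrative, and taking the widest interval across the ‘X’ range is one conservative choice, not the only one.

```matlab
% Sketch only: requires the Statistics and Machine Learning Toolbox.
rng(0);                                  % reproducible synthetic data
X_data = linspace(1, 25, 100)';
Y_data = -0.3*X_data + 2*randn(100, 1);

mdl = fitlm(X_data, Y_data);             % linear model Y ~ 1 + X

% Assumed uncertainty of the new X value (from the question)
Xmu = 12;
Xsd = 3;
Xq  = Xmu + [-1.96; 0; 1.96]*Xsd;        % lower 95% bound, mean, upper 95% bound

% 'Prediction','observation' requests intervals for a new observation
% rather than for the fitted line itself.
[Ypred, Yci] = predict(mdl, Xq, 'Prediction', 'observation');

% Conservative envelope: the widest interval spanned across the X range
Yenvelope = [min(Yci(:,1)), max(Yci(:,2))];
```

Here Ypred(2) is the estimate at the mean ‘X’, and Yenvelope brackets the ‘Y’ values that are plausible over the 95% range of ‘X’.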
3 Comments
Star Strider
on 18 Nov 2023
That seems similar to the bootstrp approach. If you have the Statistics and Machine Learning Toolbox, you might consider it. Otherwise, the ‘S’ output of polyfit contains the fields (R, df and normr) from which a covariance matrix of the parameters can be calculated, although I am not certain how that fits with what you want to do.
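One way this could be done without the toolbox is a simple Monte Carlo simulation rather than a true bootstrap. The sketch below assumes that the new ‘X’ value, the fitted parameters and the residuals are all normally distributed, and it estimates the parameter covariance as (Rinv*Rinv')*normr^2/df, as described in the polyfit documentation. The names nDraw, Xs, Ps and Ys are hypothetical, and the implicit expansion in the Ps line needs R2016b or later.

```matlab
rng(0);                                   % reproducible synthetic data
X_data = linspace(1, 25, 100);
Y_data = -0.3*X_data + 2*randn(1, 100);
[p, S] = polyfit(X_data, Y_data, 1);

% Parameter covariance and residual standard deviation from the S fields
Rinv     = inv(S.R);
Cp       = (Rinv*Rinv') * S.normr^2 / S.df;
sigmaRes = S.normr / sqrt(S.df);

% Assumed uncertainty of the new X value (from the question)
Xmu = 12;
Xsd = 3;

nDraw = 1e5;
Xs = Xmu + Xsd*randn(nDraw, 1);           % draws of the new X
L  = chol(Cp, 'lower');
Ps = p + (L*randn(2, nDraw))';            % draws of [slope, intercept]

% Each draw combines X, parameter and residual uncertainty
Ys = Ps(:,1).*Xs + Ps(:,2) + sigmaRes*randn(nDraw, 1);

Ymean   = mean(Ys);
Ysd     = std(Ys);
Ysorted = sort(Ys);
Yci     = Ysorted(round([0.025 0.975]*nDraw))';   % empirical 95% interval
```

Ysd should come out larger than sigmaRes alone, because the spread in ‘X’ is carried through the slope into ‘Y’.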
the cyclist
on 11 Nov 2023
This is not exactly what you have asked for, but you could also look into doing the first regression using an errors-in-variables model, e.g. total least squares, which will take into account uncertainty in both explanatory and response variables.
If your real application has just one explanatory variable, then this reduces to a Deming regression. To my knowledge, there is no native MATLAB implementation of this model, but there is at least one user-contributed File Exchange submission, e.g. this one.
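For illustration, the closed-form Deming estimate for a single explanatory variable can be sketched in a few lines of base MATLAB. This is not the File Exchange submission mentioned above. The measurement error added to X and the value of delta (the assumed ratio of the Y error variance to the X error variance) are both hypothetical choices made for this example.

```matlab
rng(0);                                    % reproducible synthetic data
Xtrue = linspace(1, 25, 100);
X_obs = Xtrue + 1*randn(1, 100);           % hypothetical measurement error in X (std 1)
Y_obs = -0.3*Xtrue + 2*randn(1, 100);      % error in Y (std 2)

delta = (2/1)^2;                           % assumed ratio var(Y error)/var(X error)

% Sample variances and covariance of the observations
C   = cov(X_obs, Y_obs);
sxx = C(1,1);
sxy = C(1,2);
syy = C(2,2);

% Closed-form Deming slope and intercept
slope = (syy - delta*sxx + sqrt((syy - delta*sxx)^2 + 4*delta*sxy^2)) / (2*sxy);
intercept = mean(Y_obs) - slope*mean(X_obs);
```

With delta = 1 this reduces to orthogonal regression, and as delta grows it approaches the ordinary least-squares fit of Y on X.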