find the best linear regression model using stepwiselm

Question

Aim1Data_CMC5_noRRA_individuals.xlsx

Hi all;

So I have a table with the response variable being 'VNAF'. I have 9 other predictors, and I'm trying to use stepwiselm to find the best linear regression model according to the highest R-squared. I want to include all the possible interaction terms and also the quadratic terms. This is my code:

fileName = 'Aim1Data_CMC5_noRRA_individuals.xlsx';
T = readtable(fileName,'ReadRowNames',true);
mdl = stepwiselm(T,'quadratic','ResponseVar','VNAF','Criterion','rsquared')

This gives me an R-squared of 0.705, while the simple linear regression model gives me an R-squared of 0.67.

My first question is whether I'm using stepwiselm correctly, meaning: does my code include all the interaction terms and quadratic terms for all the predictors?

My second question is how I can improve my R-squared. Does the step function help? If so, what should I specify in it?

Thank you all


Answer 1

the cyclist on 10 Jun 2022

First, I feel obligated to mention that stepwise regression is often criticized as a technique. The Wikipedia article on stepwise regression discusses the criticisms.

Second: Yes, I think you are calling the function correctly, to get the terms you want to try.
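One way to check which terms are actually in play is to state the search bounds explicitly and then inspect the fitted formula. The following is a minimal sketch, not the poster's actual analysis: it uses synthetic data with three hypothetical predictors (x1, x2, x3) in place of the spreadsheet. In stepwiselm the second argument is the starting model, and my reading of the documentation is that the default 'Upper' bound is 'interactions', so setting 'Upper' to 'quadratic' is what lets squared terms re-enter once they have been removed.

```matlab
% Synthetic stand-in for the poster's table (hypothetical data, not the real spreadsheet).
rng(0);                                  % reproducible random numbers
n = 181;                                 % same sample size as mentioned in the thread
X = randn(n, 3);                         % three predictors instead of nine, for brevity
VNAF = 1 + 2*X(:,1) - X(:,2) + 0.8*X(:,1).*X(:,2) + 0.5*X(:,3).^2 + randn(n, 1);
T = array2table([X VNAF], 'VariableNames', {'x1', 'x2', 'x3', 'VNAF'});

% Start from the full quadratic model and state the search bounds explicitly.
mdl = stepwiselm(T, 'quadratic', ...
    'ResponseVar', 'VNAF', ...
    'Lower', 'constant', ...             % smallest model the search may reach
    'Upper', 'quadratic', ...            % largest model the search may reach
    'Criterion', 'rsquared', ...
    'Verbose', 0);                       % suppress the step-by-step log

disp(mdl.Formula)                        % terms that survived the search
fprintf('R^2 = %.3f, adjusted R^2 = %.3f\n', ...
    mdl.Rsquared.Ordinary, mdl.Rsquared.Adjusted);
```

With the real table, the same call should work after replacing the synthetic T with the output of readtable.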

Third: You might want to get a better understanding of the 'Criterion' you are using. If your goal is for this model to generalize beyond your dataset, I think something like AIC might be a better choice. (I am also a little confused why the function did not just return the full model with all possible terms, since that model gives the max R^2. But that is probably just me not researching the documentation fully.)
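On the question of why the full model is not returned: as far as I can tell from the documentation, the 'rsquared' criterion works with thresholds ('PEnter' defaults to 0.1 and 'PRemove' to 0.05), so a term is dropped when removing it costs less than 0.05 in R^2. Maximizing R^2 outright would indeed just keep everything. The sketch below compares the full quadratic fit with an AIC-driven search on synthetic data. The data and the variable names x1, x2, x3 are assumptions for illustration only.

```matlab
% Synthetic stand-in for the poster's table (hypothetical data).
rng(0);
n = 181;
X = randn(n, 3);
VNAF = 1 + 2*X(:,1) - X(:,2) + 0.8*X(:,1).*X(:,2) + 0.5*X(:,3).^2 + randn(n, 1);
T = array2table([X VNAF], 'VariableNames', {'x1', 'x2', 'x3', 'VNAF'});

% Full quadratic model: every linear, interaction, and squared term is kept.
full = fitlm(T, 'quadratic', 'ResponseVar', 'VNAF');

% Stepwise search that starts from the full model and uses AIC instead of R^2.
byAIC = stepwiselm(T, 'quadratic', ...
    'ResponseVar', 'VNAF', ...
    'Upper', 'quadratic', ...
    'Criterion', 'aic', ...
    'Verbose', 0);

% The full model always has the highest ordinary R^2 among nested models,
% but AIC penalizes the extra coefficients.
fprintf('full:     %2d coefficients, R^2 = %.3f, AIC = %.1f\n', ...
    full.NumCoefficients, full.Rsquared.Ordinary, full.ModelCriterion.AIC);
fprintf('stepwise: %2d coefficients, R^2 = %.3f, AIC = %.1f\n', ...
    byAIC.NumCoefficients, byAIC.Rsquared.Ordinary, byAIC.ModelCriterion.AIC);
```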

Last: Empirically, a model with R^2=0.67 and one with R^2=0.71 are so close to the same (in terms of explained variation) that I would consider them to be basically identical. I would not choose a model based on that difference.
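A rough way to see whether such a gap means anything is to compare the two specifications on held-out data rather than on the data used for fitting. Here is a minimal sketch using 5-fold cross-validation on synthetic data. The data, the fold count, and the names x1, x2, x3 are all assumptions, and the pooled out-of-sample R^2 is only an approximate summary.

```matlab
% Synthetic stand-in for the poster's table (hypothetical data).
rng(0);
n = 181;
X = randn(n, 3);
VNAF = 1 + 2*X(:,1) - X(:,2) + 0.8*X(:,1).*X(:,2) + 0.5*X(:,3).^2 + randn(n, 1);
T = array2table([X VNAF], 'VariableNames', {'x1', 'x2', 'x3', 'VNAF'});

cv = cvpartition(n, 'KFold', 5);         % 5 folds, chosen arbitrarily
specs = {'linear', 'quadratic'};         % the two model specifications to compare
sse = zeros(1, numel(specs));            % accumulated held-out squared error per spec

for k = 1:cv.NumTestSets
    trainT = T(training(cv, k), :);      % rows used for fitting in this fold
    testT  = T(test(cv, k), :);          % rows held out in this fold
    for j = 1:numel(specs)
        m = fitlm(trainT, specs{j}, 'ResponseVar', 'VNAF');
        res = testT.VNAF - predict(m, testT);
        sse(j) = sse(j) + sum(res.^2);
    end
end

% Pooled out-of-sample R^2, measured against the overall mean of the response.
sst = sum((T.VNAF - mean(T.VNAF)).^2);
cvR2 = 1 - sse ./ sst;
fprintf('cross-validated R^2: linear = %.3f, quadratic = %.3f\n', cvR2(1), cvR2(2));
```

If the cross-validated values for the two specifications are as close as the in-sample ones, that supports treating the models as equivalent in practice.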

Since this is a physical system, I feel like there should be some rationale about why each term might come in a particular power, and that could help drive the decision.

If you are truly going for only predictive power, you might want to use a machine learning model instead ... although with only 181 data points, I'm not entirely sure that that is the best way to go.

1 Comment

azarang asadi on 13 Jun 2022

Thanks for all the info, appreciated, was very helpful
