I get different regression coefficients, depending on how I do the binning. Which regression should I use?

I have some doubts about a linear regression model. To begin with, I have a table with two columns and many rows. Column 1 is plotted on the y-axis and column 2 on the x-axis (see the images below). Using these data, I wrote an algorithm that finds the upper and lower values, as shown in the images below, and also draws the regression lines. The algorithm has a search range: for example, if I want the lower values from 0.0 to 1.0 on the x-axis, I set the range from 0.0 to 1.0 with a step of 0.05. The algorithm then searches 0.00 to 0.05, 0.05 to 0.10, ..., 0.95 to 1.00 and draws the bottom line (see the images below).

My problem is that I can't figure out how to choose the right range. Below you can see images with different ranges, which give different linear regression equations: the first image uses a range of 0.0 to 1.0 with a step of 0.05, the second uses 0.0 to 0.95 with a step of 0.05, and the third uses 0.0 to 0.85 with a step of 0.05. Should I pick the one with the largest R-squared and lowest RMSE?

7 Comments

I had a difficult time understanding your explanation, but I think I figured it out. I am going to write the algorithm here, for the benefit of others.
  • Collect the data points, which are the blue circles
  • Define bins of the data along the x-axis, based on equal-width spacing. (For example, data points with 0 < x < 0.05 form the first bin, those with 0.05 < x < 0.10 the second bin, and so on.)
  • For each bin, find the data point with the highest y-value. (These are the points labeled with red crosses.)
  • Perform a regression on those red-cross data points.
  • [Repeat for the lowest points, the yellow crosses.]
And it seems your question is, "I get different regression coefficients, depending on how I do the binning. Which regression should I use?"
Is all that correct? If so, my question back to you is, What are you actually trying to calculate or estimate?
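For concreteness, the binning-and-regression procedure summarized above can be sketched as follows. This is a Python sketch (the thread is on MATLAB Answers, but the idea translates directly); the function name and the default range/step are illustrative, matching the 0.0 to 1.0 range with step 0.05 discussed in the question.

```python
import numpy as np

def edge_regression(x, y, x_min=0.0, x_max=1.0, step=0.05):
    """Bin x into equal-width bins and fit lines to the per-bin extremes.

    Returns two (slope, intercept) pairs: one for the upper edge
    (per-bin maximum y, the "red crosses") and one for the lower edge
    (per-bin minimum y, the "yellow crosses").
    """
    edges = np.arange(x_min, x_max + step / 2, step)
    upper_x, upper_y, lower_x, lower_y = [], [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (x >= lo) & (x < hi)
        if not in_bin.any():
            continue  # skip bins that contain no data points
        xb, yb = x[in_bin], y[in_bin]
        i_max, i_min = np.argmax(yb), np.argmin(yb)
        upper_x.append(xb[i_max]); upper_y.append(yb[i_max])
        lower_x.append(xb[i_min]); lower_y.append(yb[i_min])
    # Degree-1 polyfit = ordinary least squares line on the edge points
    upper = np.polyfit(upper_x, upper_y, 1)  # (slope, intercept)
    lower = np.polyfit(lower_x, lower_y, 1)
    return upper, lower
```

Changing `x_min`, `x_max`, or `step` changes which points become the per-bin extremes, which is exactly why the fitted coefficients differ between the three images.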
Yes, everything is correct; sorry for my poor explanation. I think there are outliers, which is why I reduced the range and asked the question. I think the values in the bottom-left corner (you can see there are not many pixels there) add noise to my linear regression line. Do you think I should exclude the values below them?
So, you are trying to somehow deal with outliers, and your goal of doing all of this is to make your overall regression more accurate?
I would not take your approach. First of all, it looks like you have thousands of data points. And even though they stand out visually, you really have very few that seem "misplaced", and even these are pretty close to the overall pattern of your data.
More importantly, one should never really remove "outliers" without a specific reason for believing that they do not belong in the sample (e.g. some kind of known anomaly with the data collection). If you remove points only because they are statistical outliers, then you are fudging your data.
In my opinion, you should just calculate your fit parameters on all the data. The confidence intervals will account for the fact that some of the data don't fit as well. Anything else you do probably makes your data less meaningful, not more.
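The recommendation above, fit on all the data and let the confidence intervals describe the scatter, can be sketched like this (a Python illustration on synthetic data; the large-sample 1.96 critical value is an assumption standing in for the exact t quantile):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 500)
y = 300 - 10 * x + rng.normal(0, 2, 500)   # synthetic stand-in data

# Ordinary least squares on ALL the data -- no outlier removal.
slope, intercept = np.polyfit(x, y, 1)

# Standard error of the slope from the residuals.
resid = y - (slope * x + intercept)
s = np.sqrt(resid @ resid / (len(x) - 2))        # residual std. error
se_slope = s / np.sqrt(np.sum((x - x.mean())**2))

# ~95% confidence interval (1.96 is the normal approximation, fine for n=500)
ci = (slope - 1.96 * se_slope, slope + 1.96 * se_slope)
```

The width of `ci` already reflects how well (or poorly) the points fit the line, which is the sense in which the confidence intervals "account for" the data that don't fit as well.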
After you bin up your data, can you look at a histogram for each bin? To cyclist's point, looking at a scatterplot with a filled blue middle doesn't tell you much about how the dots are actually distributed. If there's a LOT of data, you could be hiding the fact that you have a really dense "core", even if the "corona" is already dense.
But to support cyclist's final recommendation: not being an expert in stats, I was eventually led to understand that regressions come with various kinds of meta-information, and confidence intervals are a rigorous enough account of the scatter.
As another avenue, some of the features in your image lead me to speculate that you might be showing many data sets on this domain of 0 to 1, each of which may have tighter distributions...?
I see. I have one more question about the last thing you mentioned: "The confidence intervals will account for the fact that some of the data don't fit as well. Anything else you do probably makes your data less meaningful, not more." So, anything outside the confidence intervals can be removed? Below is an image showing some values outside the confidence intervals.
You never really answered the question of what you are trying to do. What is the purpose of doing this regression?
I am not sure why you think you need to remove any data. Normally, the purpose of collecting data is to estimate an effect of some kind. Collect the appropriate data, do the regression (which results in an estimate of the parameter), and you are done.
I am very confused about why you are trying to regress the top and bottom. Why are you not just regressing all the data?
You are right. First of all, the axes are labeled as follows: the y-axis has Land Surface Temperature (LST) values in Kelvin, and the x-axis has values of the Normalized Difference Vegetation Index (NDVI). NDVI values range from -1.0 to +1.0. Areas of barren rock, sand, or snow usually show very low NDVI values (for example, 0.1 or less). Sparse vegetation such as shrubs and grasslands or senescing crops may result in moderate NDVI values (approximately 0.2 to 0.5). I am attaching below the article that is the reason for the linear regression I am using.

You see, there is a way to find soil moisture using Land Surface Temperature (LST) in Kelvin and NDVI values. The first step is to extract NDVI and LST values from a multispectral image. Then you merge the LST and NDVI values into a table with 2 columns (column 1: LST, column 2: NDVI) and many rows (each row is a pixel). To find soil moisture, two edges must be computed: the dry edge (upper) and the wet edge (bottom) (see all the above images). I adopted a simple algorithm that searches within a range of the x-axis and finds the maximum LST values for the dry edge and the minimum LST values for the wet edge. The values located at the top and at the bottom represent those edges. Linear regression must then be applied to the upper and bottom edges to get the equation of each edge: the upper edge's equation is LSTmax = a1*NDVI + b1, and the bottom edge's equation is LSTmin = a2*NDVI + b2. Soil moisture is then computed with the formula SMI = (LSTmax - LST) / (LSTmax - LSTmin), where LST is a given matrix of land surface temperature values in Kelvin. My doubt arises because if I apply the linear regression over a range like 0 to 1 with a step of 0.05, the soil moisture index has some values above 1 and below 0 (which is not permitted, as the SMI range should be 0 to 1).

If I reapply the linear regression with a different range, like 0.1 to 0.8 with a step of 0.05, the number of values above 1 and below 0 may be smaller (or larger) than in the previous try. That's why I asked: I am trying to find the best way to keep the values inside the valid range.
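The SMI formula from the comment above can be sketched as follows (Python, with hypothetical slope/intercept pairs for the two edges). Note that pixels above the fitted dry edge or below the fitted wet edge will always produce SMI values outside [0, 1], because regression lines pass through the edge points rather than enveloping them; clipping to the valid range afterwards is one common pragmatic remedy, shown here as an assumption rather than as the article's prescribed method.

```python
import numpy as np

def soil_moisture_index(lst, ndvi, dry, wet):
    """SMI = (LSTmax - LST) / (LSTmax - LSTmin).

    `dry` and `wet` are (slope, intercept) pairs for the dry (upper)
    and wet (lower) edge regression lines. The result is clipped to
    [0, 1], since pixels outside the fitted edges would otherwise
    yield SMI < 0 or SMI > 1.
    """
    lst_max = dry[0] * ndvi + dry[1]   # dry (upper) edge in Kelvin
    lst_min = wet[0] * ndvi + wet[1]   # wet (lower) edge in Kelvin
    smi = (lst_max - lst) / (lst_max - lst_min)
    return np.clip(smi, 0.0, 1.0)
```

A pixel exactly on the dry edge gives SMI = 0, one on the wet edge gives SMI = 1, and anything hotter than the dry edge is clipped to 0 rather than going negative.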


Answers (0)


Asked: on 10 Jan 2023
Edited: on 10 Jan 2023
