- Collect the data points, which are the blue circles
- Define bins of the data along the x-axis, based on equal-width spacing. (For example, data points with 0 < x < 0.05 are in the first bin, those with 0.05 < x < 0.10 are in the second bin, and so on.)
- For each bin, find the data point with the highest y-value. (These are the points labeled with red crosses.)
- Perform a regression on those red-cross data points.
- [Repeat for the lowest points, the yellow crosses.]
I get different regression coefficients, depending on how I do the binning. Which regression should I use?
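For concreteness, here is a minimal Python sketch of the procedure summarized above. It is not the original poster's code: the function names `fit_line` and `bin_extremes`, the half-open bins, and the synthetic data are all assumptions made for illustration. It only shows why changing the search range changes the fitted coefficients: a different range selects a different set of per-bin extreme points.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bin_extremes(xs, ys, x_min, x_max, step, pick=max):
    """Return one (x, y) point per non-empty equal-width bin.

    pick=max keeps the highest-y point in each bin, pick=min the lowest.
    Bins are half-open, [left, right), which is an assumption.
    """
    n_bins = round((x_max - x_min) / step)
    bins = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        if x_min <= x < x_max:
            # Clamp guards against floating-point rounding at the last edge.
            idx = min(int((x - x_min) / step), n_bins - 1)
            bins[idx].append((x, y))
    return [pick(b, key=lambda p: p[1]) for b in bins if b]

# Synthetic data (made up for this sketch, not the poster's table).
random.seed(0)
xs = [random.random() for _ in range(500)]
ys = [2.0 * x + random.gauss(0.0, 0.3) for x in xs]

# Same data, same step, three different search ranges.
for x_max in (1.0, 0.95, 0.85):
    top = bin_extremes(xs, ys, 0.0, x_max, 0.05, pick=max)
    slope, intercept = fit_line([p[0] for p in top], [p[1] for p in top])
    print(f"range 0.0 to {x_max}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Passing `pick=min` gives the lower line in the same way.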
I have some doubts about a linear regression model. I have a table with 2 columns and many rows. Column 1 is plotted on the y-axis and column 2 on the x-axis (see the images below). Using these data, I wrote an algorithm that finds the upper and lower values, as shown in the images below, and also draws the regression lines. The algorithm has a search range: for example, to get the lower values from 0.0 to 1.0 on the x-axis, I set the range to 0.0 to 1.0 with a step of 0.05. It then searches 0.00 to 0.05, 0.05 to 0.10, ..., 0.95 to 1.00 and draws the bottom line (see the images below). My problem is that I don't understand how to choose the right range. Below are images with different ranges, which give different linear regression equations. The first image uses a range of 0.0 to 1.0 with a step of 0.05, the second a range of 0.0 to 0.95 with a step of 0.05, and the third a range of 0.0 to 0.85 with a step of 0.05. Should I pick the one with the highest R-squared and lowest RMSE?
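As a reference for the two metrics mentioned in the question, here is a small Python sketch of how R-squared and RMSE could be computed for a fitted line. The function name `r_squared_and_rmse` and the example points are hypothetical. Note that with a different search range, the metrics are computed on a different set of selected points, so each range is scored against different data.

```python
import math

def r_squared_and_rmse(xs, ys, slope, intercept):
    """Goodness-of-fit metrics for the line y = slope * x + intercept."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    ss_res = sum(r * r for r in residuals)        # residual sum of squares
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # Assumption: RMSE divides by n here; some tools divide by the
    # residual degrees of freedom (n - 2 for a straight line) instead.
    rmse = math.sqrt(ss_res / len(ys))
    return r2, rmse

# Hypothetical points and a least-squares line through them.
pts_x = [0.0, 1.0, 2.0, 3.0]
pts_y = [0.0, 1.0, 1.0, 2.0]
r2, rmse = r_squared_and_rmse(pts_x, pts_y, slope=0.6, intercept=0.1)
print(f"R-squared={r2:.3f}, RMSE={rmse:.3f}")
```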
7 Comments
the cyclist
on 10 Jan 2023
You never really answered the question of what you are trying to do. What is the purpose of doing this regression?
I am not sure why you think you need to remove any data. Normally, the purpose of collecting data is to estimate an effect of some kind. Collect the appropriate data, do the regression (which results in an estimate of the parameter), and you are done.
I am very confused about why you are trying to regress the top and bottom. Why are you not just regressing all the data?
Answers (0)