Fitting data with broken stick regression

Hi.
I have some grape ripening data (sugar versus heat degrees) and want to try to fit a broken stick regression to it, and am wondering if this can be done in MATLAB.
The data is quite simple, e.g.:
x=[1644 1669 1697 1720 1792 1909];
y=[11.9 11.6 12.6 13.4 13.9 14.6];
Basically, grape ripening seems to follow a linear increase until some point and then flattens off on another trajectory. I'm wondering how I might use segmented or broken stick regression in MATLAB.

Answers (3)

Yes, your data is quite simple. In fact, TOO simple.
x = [1644 1669 1697 1720 1792 1909];
y = [11.9 11.6 12.6 13.4 13.9 14.6];
fontSize = 15; % assumed value; fontSize was not defined in the original snippet
plot(x, y, 'bs-', 'LineWidth', 2);
grid on;
title('Sugar versus heat degrees', 'FontSize', fontSize);
ylabel('Sugar', 'FontSize', fontSize);
xlabel('Heat degrees', 'FontSize', fontSize);
I don't see how you can expect to get any kind of meaningful regression out of this very tiny simple set of data.
Anyway, why would you want two lines instead of a formula for a more continuous function? What do two linear formulas get you over one single model formula, except for a more complicated model where you have to check your x value first before plugging it into one of two models? I'm not sure why two formulas are desired or advantageous over one formula. Please explain why.

11 Comments

The number of data points varies but at most would be around 12. We have tried curve fitting a single equation such as the Gompertz equation, but this hasn't worked with most of the data sets. We know that grapes ripen linearly initially and then plateau, and we are trying to establish what the rates of the linear sections are, hence we thought a broken stick approach might work. The data points seem irregular because, for example, a rain event can dilute the grapes and lower the sugar reading until they re-equilibrate. So we are trying to establish a rate of ripening from the initial linear section.
I do not see two linear portions in the data you posted. I see 3. Also, did you mean sugar (y) vs. heat (x) (like you said) or heat (y) vs. sugar (x)?
x axis is heat and y axis is sugar.
This example of data may not fit with 2 sections - I have thousands of these and was wondering if some/any fit. I was hoping for an explanation of how to use segmented regression - even if the previous link just had a few more explanatory notes it might help me. I am very new to MATLAB.
Kind regards, Wendy
Do you think the "kink" or crossing in the linear portions changes with each data set? If so, why would that be? If not, just combine all data sets into one before the fitting. I also do not see how a piecewise linear fit can be said to be better or worse than a Gompertz function when you only have a dozen or so data points to make that determination with.
You have insufficient data to justify a broken stick regression. Just wanting something is not sufficient. Were you to try to wrap confidence limits around the result, they would be large enough that the results would be of no value.
If you absolutely insist on a broken stick regression, here it is:
plot(x,y,'-')
Simple. Fast. It reproduces the data exactly. What more could you ask for? ;-)
Seriously, this data merits nothing more than a simple linear regression, or at best a simple interpolation (as given by plot).
Your preference.
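For reference, the simple linear regression mentioned above might be sketched as follows. This is only a minimal illustration using the first data set from the question and the base functions polyfit and polyval; the variable names (p, yFit, rSquared) are chosen here for illustration and do not come from any poster's script.

```matlab
% First data set from the question (x = heat degrees, y = sugar).
x = [1644 1669 1697 1720 1792 1909];
y = [11.9 11.6 12.6 13.4 13.9 14.6];

% Fit a single straight line: p(1) is the slope, p(2) is the intercept.
p = polyfit(x, y, 1);
yFit = polyval(p, x);

% Coefficient of determination as a rough measure of fit quality.
ssRes = sum((y - yFit).^2);     % residual sum of squares
ssTot = sum((y - mean(y)).^2);  % total sum of squares
rSquared = 1 - ssRes / ssTot;

fprintf('Slope: %.4f sugar units per heat degree, R^2 = %.3f\n', p(1), rSquared);
```

The slope p(1) is then the ripening rate over the whole range, with no break point assumed.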
To answer a couple of questions.
Yes, the kink is likely to be different with each plot, potentially because they are different grape varieties in different climates and rates of ripening seem to be different. I am just trying to ascertain which is the best method - if indeed there is one - of establishing the speed of ripening for each grape variety. In some cases Gompertz fits well, in some linear fits well, and in others it appears that a broken stick might work, so I just want a basic approach for this to try. The data set I provided might not be the best one to trial this, and maybe linear is best for that data set, but it definitely isn't in all cases. Perhaps we could try another data set:
x=[1236.3 1333.3 1429 1497.3 1567.8 1608.3 1643.5 1668.8]
y=[8.6 10.4 11.2 12.6 13.2 13.3 13.6 13.8]
Feedback much appreciated
I don't see how this is better.
I don't see two distinct linear segments. If you were to split this data into two separate parts/halves with a different slope on each side, at which x value would you place the dividing line?
Unfortunately I think if I choose where the x break is, I introduce a bias and the slope of the first linear section will be altered accordingly. At a guess the x break is around 1500 from what we know of the physiology and we can estimate this from field observations but the exact point is not known.
I think it makes intuitive sense and I don't choose (preselect) where the break is - it is computed to be at the point where the left slope is most different from the right slope. If you think about it, this is intuitively where you'd expect the break point to be. Please look at that answer.
Yes, thank you. I'm working on that now with some of my data. It is intuitively correct and my data is simpler, so there are fewer iterations to try.
Thanks
To reiterate my comment, you simply do not have sufficient information to estimate a broken stick regression, and certainly not to estimate where the breaks occur.


Image Analyst on 8 Jun 2018
Edited: Image Analyst on 8 Jun 2018
Here's one way to do it. See attached script for demo with noisy data.
Basically, I fit lines to the left side and right side and kept track of the slope differences as I varied the point (x value, index) at which the sides were separated. The point where the slope difference is greatest is where the lines are most dissimilar and where the crossing occurs.
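The attached script is not reproduced here, but the idea described above might be sketched roughly as follows. This is a minimal illustration and not the original demo: it uses the second data set posted in the comments, and the names minPts, candidates, slopeDiff, and xBreak, as well as the choice of at least 3 points per side, are assumptions made for this sketch.

```matlab
% Second data set posted in this thread (x = heat degrees, y = sugar).
x = [1236.3 1333.3 1429 1497.3 1567.8 1608.3 1643.5 1668.8];
y = [8.6 10.4 11.2 12.6 13.2 13.3 13.6 13.8];

minPts = 3;                         % assumed minimum number of points per side
n = numel(x);
candidates = minPts:(n - minPts);   % index of the last point on the left side
slopeDiff = zeros(size(candidates));
leftFits  = zeros(numel(candidates), 2);
rightFits = zeros(numel(candidates), 2);

for i = 1:numel(candidates)
    k = candidates(i);
    % Fit one line to points 1..k and another to points k+1..n.
    leftFits(i, :)  = polyfit(x(1:k), y(1:k), 1);
    rightFits(i, :) = polyfit(x(k+1:end), y(k+1:end), 1);
    slopeDiff(i) = abs(leftFits(i, 1) - rightFits(i, 1));
end

% Keep the split where the two slopes differ the most.
[~, best] = max(slopeDiff);
pL = leftFits(best, :);
pR = rightFits(best, :);

% x value where the two fitted lines cross (requires unequal slopes).
xBreak = (pR(2) - pL(2)) / (pL(1) - pR(1));

fprintf('Left slope: %.4f, right slope: %.4f, lines cross at x = %.1f\n', ...
    pL(1), pR(1), xBreak);
```

With so few points per side the fitted slopes are very sensitive to individual readings, so the break location from a sketch like this should be treated as a rough indication only.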

2 Comments

Thanks, I just trialled that approach with much simpler data and it seems to work. Also, because this is a biological system, in some weather conditions the sugar increase does not plateau before the fruit is picked so your approach should handle that as well because the lines won't be dissimilar.
Then, is it time to "Accept this answer"? Or is it still not working?


You can have a look at this solution: Broken-Stick Regression

1 Comment

Yes I did see that but I really just can't understand it and can't make it work with my data unfortunately.
Wendy



Asked: on 5 Jun 2018
Commented: on 8 Jun 2018
