133 views (last 30 days)

Show older comments

Hi all,

I try to fit the attached data in the Excel spreadsheet to the following power law expression using the least square method.

I aim to obtain a, m and n.

In the attached Excel datasheet I have y, T and P.

y = a * ( (T/298) ^m ) * ( (P/1.2) ^n )

I have double checked the forum to see any similar questions, but the one I found was for different power law expression, in which I couldn't implement my own equation into it.

I wondered if you could suggest me any approach in MATLAB.

Thank you for your time.

Regards

the cyclist
on 12 Feb 2021

Here is code that will do the fit in the original space, without applying the log transform first:

% Load the data

data = readtable("power_law_using_least_square.xlsx");

% Strip off problematic first data point

data(1,:) = [];

% Define convenient variables

Y = data.Y;

% Put explanatory variables into an array

X = [data.T/298 data.P/1.2];

% Define function that will be used to fit data

% (F is a vector of fitting parameters)

f = @(F,x) F(1) .* x(:,1).^F(2) .* x(:,2).^F(3);

beta0 = [1 1 1];

mdl_original = fitnlm(X,Y,f,beta0)

% Plot actual versus predicted Y in original space

figure

plot(Y,predict(mdl_original,X),'o')

line([0 90],[0 90],'LineStyle','--','LineWidth',1,'Color','k')

axis equal

title('Fit in original space')

xlabel('Y from data')

ylabel('Y from fit')

the cyclist
on 11 Feb 2021

Edited: the cyclist
on 12 Feb 2021

I would solve this by first applying a natural log transform to both sides of the equation, which then gives you a linear equation.

% Original equation

% y = a * ( (T/298) ^m ) * ( (P/1.2) ^n )

% Transform by taking the natural log of both sides

% log(y) = log(a) + m * log(T/298) + n * log(P/1.2);

% Load the data

data = readtable("power_law_using_least_square.xlsx");

% Strip off problematic first data point

data(1,:) = [];

% Define convenient variables

logY = log(data.Y);

t = log(data.T/298);

p = log(data.P/1.2);

% Fit the transformed equation

mdl = fitlm([t p],logY)

% Plot actual versus predicted Y in log space

figure

plot(logY,predict(mdl,[t,p]),'o')

line([0 5],[0 5],'LineStyle','--','LineWidth',1,'Color','k')

axis equal

title('Fit in log space')

xlabel('log Y from data')

ylabel('log Y from fit')

% Plot actual versus predicted Y in linear space

figure

plot(exp(logY),exp(predict(mdl,[t,p])),'o')

line([0 90],[0 90],'LineStyle','--','LineWidth',1,'Color','k')

axis equal

title('Fit in original space')

xlabel('Y from data')

ylabel('Y from fit')

If you examine the resulting plots, you will see that the fit is not very good at all. I have not investigated whether this is because I made some mistake in the code, or if your model is a poor one for your data. But I figured I would post this regardless. Maybe it will help you.

Oh, also. I removed the data point for which Y == 0. Your model definitely can't fit that. I have no advice there.

John D'Errico
on 12 Feb 2021

Edited: John D'Errico
on 12 Feb 2021

Let me look more carefully at your data, as there are some thigns that REALLY bother me about the approach you wish to use, as well as your data.

TYP = [298 0 0.986923009833903

300.101235002186 43.1990254126569 1.01209349427671

301.908520257823 65.0218477665674 1.03411766816416

305.935528457621 50.083956213335 1.08445863704977

310.780047854022 48.5052713557729 1.14738484815678

317.018449674478 47.315470757813 1.23233523315124

324.648707751109 47.4190220529795 1.34245610258851

333.805405104856 49.5689966428311 1.48404007757928

344.874810493167 46.6742563932582 1.66967240034496

356.711851888339 49.793300320307 1.88676782866414

370.55927175896 50.0405809408228 2.16678946809033

385.660757806138 50.7992995854221 2.50659100806819

402.028726805153 51.0012599219203 2.9187576908191

419.372189798347 53.6314606614739 3.40958213745378

438.366194499516 53.6258762011997 4.01682007463642

458.068137769226 49.7585908621274 4.73103257070098

476.97709145931 36.2615978957431 5.5050249673172

491.282838125128 29.0057706899755 6.1531649417194

502.975369275774 17.5628384000534 6.72579346279318

510.206038314783 13.8072020943403 7.10020441887989

515.939729045497 9.79482823125674 7.40854285330424

520.03408770692 7.04103773327845 7.63507721328947

522.987452750413 4.1988379264908 7.80183167272305

524.746678534683 0.199850325276192 7.90251361049426

524.801368151412 0.199744549683599 7.90565992104962];

T = TYP(:,1);

Y = TYP(:,2);

P = TYP(:,3);

plot3(T,P,Y,'o')

view(-6,21)

box on

grid on

xlabel T

ylabel P

zlabel Y

UGH. This is not good data. You need to recognize that there appears to be a t least two points that are wholly inconsistent with the rest. At the very least, that will blow any such regression modeling out of the water.

Look at the plot! If I have said anything most frequently on this forum, it is to plot EVERYTHING! Then if you get bored, plot something else. Keep on plotting until you find somethign valuable. And think about EVERY plot. What does it tell you?

When I look at that figure, I see two points that apprear to be entirely inconsistent. One of them is the first data point, with Y==0. Then there is the third data point in your list, where y is 65. Again, this is entirely inconsistent with the others wher P and T are nearly the same sa at some other data points, yet Y is something completely different from the rest. Do you understand that this will blow ANY regression model out of the water?

So I would just exclude points 1 and 3. Even the second data point seems a bit strange, but leave it in for now.

Keeplist = setdiff(1:25,[1 3]);

plot(T(Keeplist),Y(Keeplist),'o')

plot(P(Keeplist),Y(Keeplist),'o')

Sadly, plotting the marginals, neither plot looks like any sort of power law behavior. And worse, T and P are highly correlated themselves. So my question becomes, why in the nae of god and little green apples do you think this data supports that model? My guess is you have no clue as to what model may be appropriate, so you decided to try a powerlaw fit.

Lets think about the model you chose. It was:

y = a * ( (T/298) ^m ) * ( (P/1.2) ^n )

What happens when T is zero? When P is zero? In both cases, Y must be exactly zero. So my first guess is that you created the first data point arbitrarily, hoping to force the curve through zero. That first point suspiciously has y EXACTLY at zero. But your data does NOT support anything of the sort!!!!!!!!!!

In fact, it looks like your data seems to indicate that as P and T are both small, the model should predict Y is roughly 50, maybe just a little less.

And that suggests your model might have a CHANCE in hell of being something like:

Y = c + a * ( (T/298) ^m ) * ( (P/1.2) ^n )

Here c will be roughly 50 when we estimate it. If we do that, when either P or T approaches zero, the model will have some chance of predicting the data.

So next, what might be reasonable values for n and m? This is a serious problem, because P and T are HIGHLY correlated. And the fact is, this latter model will still not fit well at all.

Again, the marginal plots I showed are simply not anything that looks remotely like any power law. You have a relationship that is virtually CONSTANT for a long way but at a specific point, it seems to drop off almost linearly. And this is completely inconsistent with any powerlaw behavior!

It is my STRONG recommendation that you reconsider what model you want to use here. The model should be based on reality, on physical principles if possible. Lacking that, you need to consider what you want to do with this model. Why are you trying to fit it? What will you do with that model?

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!