# How can I produce an equation from a data set and optimize the equation

Gokhan Calis on 10 Jul 2021
Edited: ANKUR KUMAR on 11 Jul 2021
Dear All,
I have a data set of 5x340 cells. In this set, 4 columns are the ingredients of a material. The fifth column represents the result of a mechanical test. My questions are:
1. How can I produce an equation from this data set?
2. Which algorithm can I use to optimize the test results?

ANKUR KUMAR on 11 Jul 2021
Edited: ANKUR KUMAR on 11 Jul 2021
To produce an equation for the dataset, you can use the fitnlm function. It fits a model whose estimated coefficients give you the equation.
Since I do not have the data, I am using a random dataset.
random_data=randi(1000,50,5); % Generating random data
variables=random_data(:,1:4); % The first 4 columns are the ingredients of a material.
output_value=random_data(:,5); % The fifth column represents the result of the mechanical test
modelfun = @(b,x) b(1)*x(:,1)+ b(2)*x(:,2)+ b(3)*x(:,3)+ b(4)*x(:,4) ;
beta = [-5 -5 -5 -5];
mdl = fitnlm(variables,output_value,modelfun,beta);
To optimize the equation, calculate some statistics, such as the correlation between the observed and estimated values, to check the estimation error. To minimize the error, you can experiment with the beta variable (the initial points). You can put the whole code in a loop so that the beta values keep changing, and store the statistics (such as correlation and RMSE) to see which combination of beta values gives the maximum correlation and minimum RMSE.
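The loop described above could look roughly like the sketch below. It is only an illustration: the fixed seed, the number of restarts (`n_starts`), and the way the starting points are drawn (`10*randn(1,4)`) are assumptions, and the data is still the random placeholder, not real measurements.

```matlab
% Sketch: refit from several random starting points and store fit statistics.
% Requires the Statistics and Machine Learning Toolbox (fitnlm, predict, corr).
rng(0);                                   % fixed seed (assumption, for repeatability)
random_data = randi(1000,50,5);           % placeholder data, not real measurements
variables = random_data(:,1:4);           % ingredient columns
output_value = random_data(:,5);          % mechanical test result
modelfun = @(b,x) b(1)*x(:,1) + b(2)*x(:,2) + b(3)*x(:,3) + b(4)*x(:,4);

n_starts = 10;                            % hypothetical number of restarts
stats = zeros(n_starts,2);                % columns: correlation, RMSE
betas = zeros(n_starts,4);                % fitted coefficients for each restart
for k = 1:n_starts
    beta0 = 10*randn(1,4);                % random initial guess (assumption)
    mdl_k = fitnlm(variables,output_value,modelfun,beta0);
    fitted = predict(mdl_k,variables);    % model estimates on the fitted data
    stats(k,1) = corr(output_value,fitted);
    stats(k,2) = sqrt(mean((output_value-fitted).^2));
    betas(k,:) = mdl_k.Coefficients.Estimate';
end
[~,best] = min(stats(:,2));               % restart with the smallest RMSE
best_beta = betas(best,:)
```

Because this particular model is linear in the coefficients, the restarts should all end up at essentially the same estimates. Varying the starting point is more likely to matter once the model function is genuinely nonlinear in `b`.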
% coefficients of the equation
coefficients=mdl.Coefficients.Estimate
coefficients = 4×1
0.4087 0.0395 0.4047 0.1208
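As a small illustration, the estimates can be written out as an explicit equation and used to evaluate a new mix. The coefficient values are copied from the output above (they come from random data, so they carry no physical meaning), and `new_mix` is a made-up set of ingredient amounts.

```matlab
% Coefficients copied from the displayed output (fitted to random data).
coefficients = [0.4087; 0.0395; 0.4047; 0.1208];

% Write the fitted model as a readable equation string.
equation = sprintf('y = %.4f*x1 + %.4f*x2 + %.4f*x3 + %.4f*x4', coefficients);
disp(equation)

% Evaluate the equation for one hypothetical mix (made-up amounts).
new_mix = [100 200 300 400];
predicted = new_mix*coefficients
```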
coeff_err=mdl.Coefficients.SE;
Variables_covar=mdl.CoefficientCovariance(:,1);
new_value=variables*coefficients;
variance=[var(output_value,'omitnan') var(new_value,'omitnan')];
scatter(output_value,new_value)
xlabel('Observation values')
ylabel('Estimated values')
correlation=corr(output_value,new_value)
correlation = -0.0582
mean_error=mean(output_value-new_value, 'omitnan') % this is not the RMSE, just the mean of the errors
mean_error = 54.4590
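Since the mean error is not the RMSE, here is a minimal sketch of the difference. The two short vectors are made-up numbers chosen so that positive and negative errors cancel. With real data you would use `output_value` and `new_value` instead.

```matlab
% Made-up observed and estimated values (illustration only).
observed  = [10; 20; 30; 40];
estimated = [ 8; 22; 28; 42];

err = observed - estimated;               % errors: 2, -2, 2, -2
mean_error = mean(err,'omitnan')          % 0: positive and negative errors cancel
rmse = sqrt(mean(err.^2,'omitnan'))       % 2: squaring removes the cancellation
```

A mean error near zero therefore does not imply a good fit, which is why the RMSE is the more useful number to track when comparing fits.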