Nonlinear regression of multiple datasets with different functions but wit some shared parameters between datasets

Apologies since I am new to Matlab.
I have a model describing a drug-receptor system with multiple parameters. There are two different observables (binding , activity) for each drug. (I have separate but related equations for predicting the observables). The model has four parameters per drug and four system parameters that are shared for all drugs. I have datasets with a variable number of replicates for each observable (3 binding curves and one response curve per drug). I have data for 12 different drugs. I would appreciate some help setting up global nonlinear regression for all datasets to obtain best estimates of the drug specific and system specific (shared) parameters and their associated 95% confidence intervals. A general example that can handle this type of situation would be helpful to get me started.

17 Comments

I'm going to bring this to the attention of @Arthur Goldsipe. I don't have Simbiology, however it can probably deal with this sort of problem.

Yes, Star Strider is correct that SimBiology has functionality to handle this. But it would require reimplementing your model in SimBiology. It’s specifically designed for modeling drug systems, so you might also get additional benefit from using it. But if you can’t or don’t want to use SimBiology, I can also share some details about how to do this sort of nonlinear regression. Let me know, and I’ll share more tomorrow when I’m back at my desk.

Oh, and confidence interval calculations are also available in SimBiology.

Thank you Star strider and Arthur.
I have looked into Simbiology superficially. My model is an equilibrium model as opposed to a kinetic model.
Although I can implement it as a kinetic model, I would prefer not to use individual rate constants, only equilibrium ones. Is this possible in Simbiology? Also, can simbiology fit equilibrium values as opposed to kinetic data?
Again,my apologies for my lack of familiarity.
SimBiology models can consistent of a combination of differential and algebraic equations. It's not common, but it's certainly possible to create a SimBiology model that only contains algebraic constraints. That's probably what I would do to model equilbrium data in SimBiology.
And fitting (and confidence interval calculations) work for any SimBiology model. Some functionality (mostly local sensitivity analysis) may not be supported for certain kinds of algebraic constraints.
Can you share any more details about your problem? The equations would be most helpful. If you can also share the data (or at least say which of the variables from the equations are in your data sets), that could also help us construct example code to share with you.
Here is a brief description:
The model consists of a receptor that has two separate binding sites. An orthosteric and an allosteric site. I am modeling the ability of an orthosteric agonist to displace a fixed labeled orthosteric antagonist. The receptor can exist in an inactive or active conformation. The system therefore has 12 different states. 6 inactive and 6 active versions of the receptor based on what ligand(s) are bound.
Transitions between states are driven by binding of ligands or isomerization between active and inactive states.
I have derived the algebraic equations that describe the system fully based on the equilibrium binding constants and isomerization constants. There is a total of 12 parameters that I want to estimate based on the data. (4 for each orthosteric ligand (2 orthosteric igands: the agonist in queston and the common labeled antagonist), 2 parameters for the allosteric ligand, and 2 for the receptor and system .
The data is of two types:
a) Labeled ligand displacement curves for various agonists in the absence or presence of various concentrations of the allosteric modulator.
b) Activity data that is a function of the fraction of receptors in the active state.
I have (rather cumbersome) algebraic equations that predict the binding and activity data for each experiment. Some variables (concentration of labeled antagonist and allosteric ligand for example) and parameters are fixed in a given experiment but are different for different experiments.
I would like to fit all the data simultaneously to obtain estimates of the parameters and their 95% confidence intervals
I have already implemented this in excel and can obtain estimates of the parameters using the built-in solver but getting the 95% confidence intervals is not straightforward. (I could implement a bootstrap method in excel, but that would be tedious and error-prone.
I was hoping to leverage Matlab to do this.
Perhaps how to implement a simpler example of ligand binding based on algebraic equations in simbiology would be enough for me to get started.
That is certainly an interesting problem. I understand the physiology, however modeling it would be difficult. I suspect that ligand binding could be modeled as first-order kinetics, however incorporating the alloseric effect of the second ligand and the expression for that interaction on the resulting ligand-receptor activation would be challenging. Since that image is obviously from the published lliterature, it would be helpful to have the mathematical approach those authors used, assuming they attempted to model it.
I do not know if SimBiology could model that, since I have no experience with it.
Thanks!
I generated the model and the figure. It is not published yet, but hopefully soon.
I'm most familiar with kinetic models of ligand-receptor binding. Can you provide the equations for a simpler ligand-receptor system that illustrates the kind of algebraic equations you are solving? Or can you share a reference to a similar sort of model? Then, I should be able to create a more realistic SimBiology project that illustrates how to build the model, fit parameters, and calculate confidence intervals.
Sure:
A very simple idealized example:
The reversible interaction of drug D with receptor R can be described by the equilibrium dissociation constant Kd.
The amount of binding at any concentration of D can be described by:
Given a dataset of X being the concentration of drug D and Y being the corresponding values for binding, find estimates of Kd and Bmax and their 95 % confidence intervals.
Binding Replicate
1 2 3
D (M)
1.00E-10 17 22 19
3.16E-10 50 53 64
1.00E-09 146 146 123
3.16E-09 301 276 328
1.00E-08 329 487 412
3.16E-08 372 416 552
1.00E-07 566 527 493
3.16E-07 414 451 481
The goal is to obtain estimates of Kd and Bmax with 95% confidence intervals.
Try "fitnlm".
But your model function cannot attain a maximum as your data do (in D = 1e-7). So I guess it is not adequate for your process.
It seems that simbiology is not well suited for this type of problem. I will focus on fitnlm and lsqcurvefit.
I appreciate your help.
Anybody knows why p is multiplied by 1/100 in "format long" at the end of the code ? Is this an output issue or a bug ?
M = [1.00E-10 17 22 19
3.16E-10 50 53 64
1.00E-09 146 146 123
3.16E-09 301 276 328
1.00E-08 329 487 412
3.16E-08 372 416 552
1.00E-07 566 527 493
3.16E-07 414 451 481 ];
D = M(:,1);
DD = [];
for i = 1:size(D,1)
DD = [DD;[M(i,1);M(i,1);M(i,1)]];
end
D = DD;
B = M(:,2:4);
BB = [];
for i = 1:size(B,1)
BB = [BB;[B(i,1);B(i,2);B(i,3)]];
end
B = BB;
A = [D,-B];
b = B.*D;
p = A\b;
B_max = p(1)
B_max = 445.9223
K_d = p(2)
K_d = -5.1260e-09
fun = @(p)p(1)*D./(D+p(2))-B;
p = lsqnonlin(fun,p);
Local minimum possible. lsqnonlin stopped because the size of the current step is less than the value of the step size tolerance.
p
p = 2×1
445.9223 -0.0000
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
format long
p
p = 2×1
4.459223081966274 -0.000000000051260
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
It seems that simbiology is not well suited for this type of problem.
As pointed out already: I think the problem is different - the model is not well-suited for your data.
@Torsten the issue with long format looks like it might be specific to the display here in MATLAB Anwers. I'll report this to the appropriate developers.
@Jorge thanks for sharing the sample equation and data. It looks like this kind of probelm should be quite easy to implement in SimBiology. It will take me some time to create an example, due to other work obligations. But I hope to share an example in the next day or so.

Sign in to comment.

 Accepted Answer

You would probably need to use lsqcurvefit. There is nothing about having multiple data sets, following different functions, that changes the approach to its use. The form of their dependence on the unknown variables and how they are shared is also irrelevant. You just provide a model function F(x,xdata) like in the examples which returns a matrix of your predicted values. The only requirement is that the matrix of predicted values that F(x,xdata) generates has the same size as your matrix of measurements ydata..
Confidence interval post-analysis is not something Matlab has any direct tools for when it comes to regression of vector-valued measurements. However, lsqcurvefit will give you the Jacobian at the solution,
so you can implement confidence interval calculations based on that.

More Answers (1)

I've attached a SimBiology project (created in R2024b) that illustrates how to set up this sort of problem in SimBiology. If you're not familiar with MATLAB, then it will probably be easiest to build the model in the SimBiology Model Builder app and then fit the data in the SimBiology Model Analyzer app and then calculate 95% Gaussian confidence intervals on parameter estimates and predictions. Just open this project file in the apps to view the model and the fit program.
But since it's easier to illustrate the capabilities here via the command line, here's how you could do the work programmatically.
If you don't want to use the same parameter estimate for each dose, then you just need to add a variable to your data that says which observations use the same esimate, and then update CategoryVariableName to that variable. For example, setting the value to D would result in a different parameter estimate for each unique value of D.
Construct the model
model = sbiomodel("Binding");
addspecies(model, "D", Constant=true);
addparameter(model, "Bmax", 1);
addparameter(model, "Kd", 1);
addspecies(model, "B");
addrule(model, "B - Bmax*D/(D+Kd)", "algebraic");
Construct the data
The data needs to be reformatted for us in SimBiology:
  • Add a time variable (the value is arbitrary, since there are no dynamics).
  • Add a grouping variable that gives a unique identifier to each distinct condition we need to simulate (in this case, each dose amount).
  • Stack all measurements of the same state (B) into a single column.
rawDataMatrix = [
1.00E-10 17 22 19
3.16E-10 50 53 64
1.00E-09 146 146 123
3.16E-09 301 276 328
1.00E-08 329 487 412
3.16E-08 372 416 552
1.00E-07 566 527 493
3.16E-07 414 451 481 ];
tableData = array2table(rawDataMatrix, VariableNames=["D", "B1", "B2", "B3"]);
tableData.Time(:) = 1;
tableData.ID = (1:height(rawDataMatrix))';
tableData = stack(tableData, ["B1", "B2", "B3"], ...
NewDataVariableName="B", IndexVariableName="ReplicateID")
tableData = 24×5 table
D Time ID ReplicateID B ________ ____ __ ___________ ___ 1e-10 1 1 B1 17 1e-10 1 1 B2 22 1e-10 1 1 B3 19 3.16e-10 1 2 B1 50 3.16e-10 1 2 B2 53 3.16e-10 1 2 B3 64 1e-09 1 3 B1 146 1e-09 1 3 B2 146 1e-09 1 3 B3 123 3.16e-09 1 4 B1 301 3.16e-09 1 4 B2 276 3.16e-09 1 4 B3 328 1e-08 1 5 B1 329 1e-08 1 5 B2 487 1e-08 1 5 B3 412 3.16e-08 1 6 B1 372
Set up and run the fit problem
prob = fitproblem;
prob.Data = tableData;
prob.Model = model;
prob.Estimated = estimatedInfo( ...
["log(Kd)", "log(Bmax)"], ...
Bounds=[1e-20 0.1; 0.1 1e3], ...
CategoryVariableName=["<None>", "<None>"]);
prob.ResponseMap = "B = B";
prob.Variants = createVariants(prob.Data, "D", Model=model);
prob.ProgressPlot = true;
prob.FunctionName = "scattersearch";
results = fit(prob);
plot(results);
results.ParameterEstimates
ans = 16×7 table
Name Estimate StandardError Bounds Group CategoryVariableName CategoryValue ________ __________ _____________ ______________ _____ ____________________ _____________ {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 1 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 1 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 2 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 2 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 3 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 3 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 4 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 4 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 5 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 5 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 6 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 6 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 7 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 7 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 8 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 8 {'<None>'} <None>
Calculate confidence intervals
ciParam = sbioparameterci(results);
plot(ciParam);
ciPred = sbiopredictionci(results);
plot(ciPred);

1 Comment

Fantastic!
Thank you very much! I see now how to proceed.
I will have to learn a bit more about Simbiology plotting otions to visualize the data, but this is what I needed.
Thank you!

Sign in to comment.

Products

Release

R2024b

Asked:

on 11 Jun 2025

Commented:

on 17 Jun 2025

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!