Nonlinear regression of multiple datasets with different functions but wit some shared parameters between datasets

Question

0 votes

Apologies since I am new to Matlab.

I have a model describing a drug-receptor system with multiple parameters. There are two different observables (binding , activity) for each drug. (I have separate but related equations for predicting the observables). The model has four parameters per drug and four system parameters that are shared for all drugs. I have datasets with a variable number of replicates for each observable (3 binding curves and one response curve per drug). I have data for 12 different drugs. I would appreciate some help setting up global nonlinear regression for all datasets to obtain best estimates of the drug specific and system specific (shared) parameters and their associated 95% confidence intervals. A general example that can handle this type of situation would be helpful to get me started.

17 Comments
Show 15 older comments Hide 15 older comments

Jorge on 12 Jun 2025

Here is a brief description:

The model consists of a receptor that has two separate binding sites. An orthosteric and an allosteric site. I am modeling the ability of an orthosteric agonist to displace a fixed labeled orthosteric antagonist. The receptor can exist in an inactive or active conformation. The system therefore has 12 different states. 6 inactive and 6 active versions of the receptor based on what ligand(s) are bound.

Transitions between states are driven by binding of ligands or isomerization between active and inactive states.

I have derived the algebraic equations that describe the system fully based on the equilibrium binding constants and isomerization constants. There is a total of 12 parameters that I want to estimate based on the data. (4 for each orthosteric ligand (2 orthosteric igands: the agonist in queston and the common labeled antagonist), 2 parameters for the allosteric ligand, and 2 for the receptor and system .

The data is of two types:

a) Labeled ligand displacement curves for various agonists in the absence or presence of various concentrations of the allosteric modulator.

b) Activity data that is a function of the fraction of receptors in the active state.

I have (rather cumbersome) algebraic equations that predict the binding and activity data for each experiment. Some variables (concentration of labeled antagonist and allosteric ligand for example) and parameters are fixed in a given experiment but are different for different experiments.

I would like to fit all the data simultaneously to obtain estimates of the parameters and their 95% confidence intervals

I have already implemented this in excel and can obtain estimates of the parameters using the built-in solver but getting the 95% confidence intervals is not straightforward. (I could implement a bootstrap method in excel, but that would be tedious and error-prone.

I was hoping to leverage Matlab to do this.

Torsten on 16 Jun 2025

Edited: Torsten on 16 Jun 2025

Open in MATLAB Online

Anybody knows why p is multiplied by 1/100 in "format long" at the end of the code ? Is this an output issue or a bug ?

M = [1.00E-10     17    22   19
     3.16E-10     50    53   64
     1.00E-09     146  146  123
     3.16E-09     301  276  328
     1.00E-08     329  487  412
     3.16E-08     372  416  552
     1.00E-07     566  527  493
     3.16E-07     414  451  481 ];
D = M(:,1);
DD = [];
for i = 1:size(D,1)
  DD = [DD;[M(i,1);M(i,1);M(i,1)]];
end
D = DD;
B = M(:,2:4);
BB = [];
for i = 1:size(B,1)
  BB = [BB;[B(i,1);B(i,2);B(i,3)]];
end
B = BB;
A = [D,-B];
b = B.*D;
p = A\b;
B_max = p(1)
B_max = 445.9223
K_d = p(2)
K_d = -5.1260e-09
fun = @(p)p(1)*D./(D+p(2))-B;
p = lsqnonlin(fun,p);
Local minimum possible.
lsqnonlin stopped because the size of the current step is less than
the value of the step size tolerance.
p
p = 2×1
  445.9223
   -0.0000
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
format long
p
p = 2×1
   4.459223081966274
  -0.000000000051260
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>

Torsten on 16 Jun 2025

It seems that simbiology is not well suited for this type of problem.

As pointed out already: I think the problem is different - the model is not well-suited for your data.

Arthur Goldsipe on 16 Jun 2025

@Torsten the issue with long format looks like it might be specific to the display here in MATLAB Anwers. I'll report this to the appropriate developers.

@Jorge thanks for sharing the sample equation and data. It looks like this kind of probelm should be quite easy to implement in SimBiology. It will take me some time to create an example, due to other work obligations. But I hope to share an example in the next day or so.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Matt J on 11 Jun 2025

Edited: Matt J on 11 Jun 2025

0 votes

You would probably need to use lsqcurvefit. There is nothing about having multiple data sets, following different functions, that changes the approach to its use. The form of their dependence on the unknown variables and how they are shared is also irrelevant. You just provide a model function F(x,xdata) like in the examples which returns a matrix of your predicted values. The only requirement is that the matrix of predicted values that F(x,xdata) generates has the same size as your matrix of measurements ydata..

Confidence interval post-analysis is not something Matlab has any direct tools for when it comes to regression of vector-valued measurements. However, lsqcurvefit will give you the Jacobian at the solution,

[x,resnorm,residual,exitflag,output,lambda,jacobian] = lsqcurvefit(___)

so you can implement confidence interval calculations based on that.

3 Comments
Show 1 older comment Hide 1 older comment

Matt J on 12 Jun 2025

Yes, I gave a link to the doc examples. Here it is again: examples

Jorge on 12 Jun 2025

@Matt J: Thank you!

Sign in to comment.

Answer 2

Arthur Goldsipe on 16 Jun 2025

Edited: Arthur Goldsipe on 16 Jun 2025

Open in MATLAB Online

3 votes

binding.sbproj

I've attached a SimBiology project (created in R2024b) that illustrates how to set up this sort of problem in SimBiology. If you're not familiar with MATLAB, then it will probably be easiest to build the model in the SimBiology Model Builder app and then fit the data in the SimBiology Model Analyzer app and then calculate 95% Gaussian confidence intervals on parameter estimates and predictions. Just open this project file in the apps to view the model and the fit program.

But since it's easier to illustrate the capabilities here via the command line, here's how you could do the work programmatically.

If you don't want to use the same parameter estimate for each dose, then you just need to add a variable to your data that says which observations use the same esimate, and then update CategoryVariableName to that variable. For example, setting the value to D would result in a different parameter estimate for each unique value of D.

Construct the model

model = sbiomodel("Binding");
addspecies(model, "D", Constant=true);
addparameter(model, "Bmax", 1);
addparameter(model, "Kd", 1);
addspecies(model, "B");
addrule(model, "B - Bmax*D/(D+Kd)", "algebraic");

Construct the data

The data needs to be reformatted for us in SimBiology:

Add a time variable (the value is arbitrary, since there are no dynamics).
Add a grouping variable that gives a unique identifier to each distinct condition we need to simulate (in this case, each dose amount).
Stack all measurements of the same state (B) into a single column.

rawDataMatrix = [
     1.00E-10     17    22   19
     3.16E-10     50    53   64
     1.00E-09     146  146  123
     3.16E-09     301  276  328
     1.00E-08     329  487  412
     3.16E-08     372  416  552
     1.00E-07     566  527  493
     3.16E-07     414  451  481 ];
tableData = array2table(rawDataMatrix, VariableNames=["D", "B1", "B2", "B3"]);
tableData.Time(:) = 1;
tableData.ID = (1:height(rawDataMatrix))';
tableData = stack(tableData, ["B1", "B2", "B3"], ...
    NewDataVariableName="B", IndexVariableName="ReplicateID")
tableData = 24×5 table
       D        Time    ID    ReplicateID     B 
    ________    ____    __    ___________    ___

       1e-10     1      1         B1          17
       1e-10     1      1         B2          22
       1e-10     1      1         B3          19
    3.16e-10     1      2         B1          50
    3.16e-10     1      2         B2          53
    3.16e-10     1      2         B3          64
       1e-09     1      3         B1         146
       1e-09     1      3         B2         146
       1e-09     1      3         B3         123
    3.16e-09     1      4         B1         301
    3.16e-09     1      4         B2         276
    3.16e-09     1      4         B3         328
       1e-08     1      5         B1         329
       1e-08     1      5         B2         487
       1e-08     1      5         B3         412
    3.16e-08     1      6         B1         372

Set up and run the fit problem

prob = fitproblem;

prob.Data = tableData;

prob.Model = model;

prob.Estimated = estimatedInfo( ...

["log(Kd)", "log(Bmax)"], ...

Bounds=[1e-20 0.1; 0.1 1e3], ...

CategoryVariableName=["<None>", "<None>"]);

prob.ResponseMap = "B = B";

prob.Variants = createVariants(prob.Data, "D", Model=model);

prob.ProgressPlot = true;

prob.FunctionName = "scattersearch";

results = fit(prob);

plot(results);

results.ParameterEstimates

ans = 16×7 table

Name Estimate StandardError Bounds Group CategoryVariableName CategoryValue ________ __________ _____________ ______________ _____ ____________________ _____________ {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 1 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 1 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 2 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 2 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 3 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 3 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 4 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 4 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 5 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 5 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 6 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 6 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 7 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 7 {'<None>'} <None> {'Kd' } 2.2211e-09 3.9834e-10 1e-20 0.1 8 {'<None>'} <None> {'Bmax'} 492.58 16.912 0.1 1000 8 {'<None>'} <None>

Calculate confidence intervals

ciParam = sbioparameterci(results);

plot(ciParam);

ciPred = sbiopredictionci(results);

plot(ciPred);

1 Comment
Show -1 older comments Hide -1 older comments

Jorge on 17 Jun 2025

Fantastic!

Thank you very much! I see now how to proceed.

I will have to learn a bit more about Simbiology plotting otions to visualize the data, but this is what I needed.

Thank you!

Sign in to comment.

Nonlinear regression of multiple datasets with different functions but wit some shared parameters between datasets

17 Comments
Show 15 older comments Hide 15 older comments

Accepted Answer

3 Comments
Show 1 older comment Hide 1 older comment

More Answers (1)

1 Comment
Show -1 older comments Hide -1 older comments

Categories

Products

Release

Tags

Community Treasure Hunt

Nonlinear regression of multiple datasets with different functions but wit some shared parameters between datasets

17 Comments Show 15 older comments Hide 15 older comments

Accepted Answer

3 Comments Show 1 older comment Hide 1 older comment

More Answers (1)

1 Comment Show -1 older comments Hide -1 older comments

Categories

Products

Release

Tags

See Also

Community Treasure Hunt

17 Comments
Show 15 older comments Hide 15 older comments

3 Comments
Show 1 older comment Hide 1 older comment

1 Comment
Show -1 older comments Hide -1 older comments