Nonlinear Logistic Regression

Open Live Script

This example shows two ways of fitting a nonlinear logistic regression model. The first method uses maximum likelihood (ML) and the second method uses generalized least squares (GLS) via the function fitnlm from Statistics and Machine Learning Toolbox™.

Problem Description

Logistic regression is a special type of regression in which the goal is to model the probability of something as a function of other variables. Consider a set of predictor vectors $x_{1}, \dots, x_{N}$ where $N$ is the number of observations and $x_{i}$ is a column vector containing the values of the $d$ predictors for the $i$ th observation. The response variable for $x_{i}$ is $Z_{i}$ where $Z_{i}$ represents a Binomial random variable with parameters $n$ , the number of trials, and $μ_{i}$ , the probability of success for trial $i$ . The normalized response variable is $Y_{i} = Z_{i} / n$ - the proportion of successes in $n$ trials for observation $i$ . Assume that responses $Y_{i}$ are independent for $i = 1, \dots, N$ . For each $i$ :

$E (Y_{i}) = μ_{i}$

$V a r (Y_{i}) = \frac{μ_{i} (1 - μ_{i})}{n}$ .

Consider modeling $μ_{i}$ as a function of predictor variables $x_{i}$ .

In linear logistic regression, you can use the function fitglm to model $μ_{i}$ as a function of $x_{i}$ as follows:

$\log (\frac{μ_{i}}{1 - μ_{i}}) = x_{i}^{T} β$

with $β$ representing a set of coefficients multiplying the predictors in $x_{i}$ . However, suppose you need a nonlinear function on the right-hand-side:

$\log (\frac{μ_{i}}{1 - μ_{i}}) = f (x_{i}, β) .$

There are functions in Statistics and Machine Learning Toolbox™ for fitting nonlinear regression models, but not for fitting nonlinear logistic regression models. This example shows how you can use toolbox functions to fit those models.

Direct Maximum Likelihood (ML)

The ML approach maximizes the log likelihood of the observed data. The likelihood is easily computed using the Binomial probability (or density) function as computed by the binopdf function.

Generalized Least Squares (GLS)

You can estimate a nonlinear logistic regression model using the function fitnlm. This might seem surprising at first since fitnlm does not accommodate Binomial distribution or any link functions. However, fitnlm can use Generalized Least Squares (GLS) for model estimation if you specify the mean and variance of the response. If GLS converges, then it solves the same set of nonlinear equations for estimating $β$ as solved by ML. You can also use GLS for quasi-likelihood estimation of generalized linear models. In other words, we should get the same or equivalent solutions from GLS and ML. To implement GLS estimation, provide the nonlinear function to fit, and the variance function for the Binomial distribution.

Mean or model function

The model function describes how $μ_{i}$ changes with $β$ . For fitnlm, the model function is:

$μ_{i} = \frac{1}{1 + exp {- f (x_{i}, β)}}$

Weight function

fitnlm accepts observation weights as a function handle using the 'Weights' name-value pair argument. When using this option, fitnlm assumes the following model:

$E (Y_{i}) = μ_{i}$

$V a r (Y_{i}) = \frac{σ^{2}}{w (μ_{i})}$

where responses $Y_{i}$ are assumed to be independent, and $w$ is a custom function handle that accepts $μ_{i}$ and returns an observation weight. In other words, the weights are inversely proportional to the response variance. For the Binomial distribution used in the logistic regression model, create the weight function as follows:

$w (μ_{i}) = \frac{1}{V a r (y_{i})} = \frac{n}{μ_{i} (1 - μ_{i})}$

fitnlm models the variance of the response $Y_{i}$ as $σ^{2} / w (μ_{i})$ where $σ^{2}$ is an extra parameter that is present in GLS estimation, but absent in the logistic regression model. However, this typically does not affect the estimation of $β$ , and it provides a "dispersion parameter" to check on the assumption that the $Z_{i}$ values have a Binomial distribution.

An advantage of using fitnlm over direct ML is that you can perform hypothesis tests and compute confidence intervals on the model coefficients.

Generate Example Data

To illustrate the differences between ML and GLS fitting, generate some example data. Assume that $x_{i}$ is one dimensional and suppose the true function $f$ in the nonlinear logistic regression model is the Michaelis-Menten model parameterized by a $2 \times 1$ vector $β$ :

$f (x_{i}, β) = \frac{β_{1} x_{i}}{β_{2} + x_{i}} .$

myf = @(beta,x) beta(1)*x./(beta(2) + x);

Create a model function that specifies the relationship between $μ_{i}$ and $β$ .

mymodelfun = @(beta,x) 1./(1 + exp(-myf(beta,x)));

Create a vector of one dimensional predictors and the true coefficient vector $β$ .

rng(300,'twister');
x    = linspace(-1,1,200)';
beta = [10;2];

Compute a vector of $μ_{i}$ values for each predictor.

mu = mymodelfun(beta,x);

Generate responses $z_{i}$ from a Binomial distribution with success probabilities $μ_{i}$ and number of trials $n$ .

n = 50;
z = binornd(n,mu);

Normalize the responses.

y = z./n;

ML Approach

The ML approach defines the negative log likelihood as a function of the $β$ vector, and then minimizes it with an optimization function such as fminsearch. Specify beta0 as the starting value for $β$ .

mynegloglik = @(beta) -sum(log(binopdf(z,n,mymodelfun(beta,x))));
beta0 = [3;3];
opts = optimset('fminsearch');
opts.MaxFunEvals = Inf;
opts.MaxIter = 10000;
betaHatML = fminsearch(mynegloglik,beta0,opts)

betaHatML = 2×1

    9.9259
    1.9720

The estimated coefficients in betaHatML are close to the true values of [10;2].

GLS Approach

The GLS approach creates a weight function for fitnlm previously described.

wfun = @(xx) n./(xx.*(1-xx));

Call fitnlm with custom mean and weight functions. Specify beta0 as the starting value for $β$ .

nlm = fitnlm(x,y,mymodelfun,beta0,'Weights',wfun)

nlm = 
Nonlinear regression model:
    y ~ F(beta,x)

Estimated Coefficients:
          Estimate      SE       tStat       pValue  
          ________    _______    ______    __________

    b1     9.926      0.83135     11.94     4.193e-25
    b2     1.972      0.16438    11.996    2.8182e-25


Number of observations: 200, Error degrees of freedom: 198
Root Mean Squared Error: 1.16
R-Squared: 0.995,  Adjusted R-Squared 0.995
F-statistic vs. zero model: 1.88e+04, p-value = 2.04e-226

Get an estimate of $β$ from the fitted NonLinearModel object nlm.

betaHatGLS = nlm.Coefficients.Estimate

betaHatGLS = 2×1

    9.9260
    1.9720

As in the ML method, the estimated coefficients in betaHatGLS are close to the true values of [10;2]. The small p-values for both $β_{1}$ and $β_{2}$ indicate that both coefficients are significantly different from $0$ .

Compare ML and GLS Approaches

Compare estimates of $β$ .

max(abs(betaHatML - betaHatGLS))

ans = 1.1460e-05

Compare fitted values using ML and GLS

yHatML  = mymodelfun(betaHatML ,x);
yHatGLS = mymodelfun(betaHatGLS,x);
max(abs(yHatML - yHatGLS))

ans = 1.2746e-07

ML and GLS approaches produce similar solutions.

Plot fitted values using ML and GLS

figure
plot(x,y,'g','LineWidth',1)
hold on
plot(x,yHatML ,'b'  ,'LineWidth',1)
plot(x,yHatGLS,'m--','LineWidth',1)
legend('Data','ML','GLS','Location','Best')
xlabel('x')
ylabel('y and fitted values')
title('Data y along with ML and GLS fits.')

ML and GLS produce similar fitted values.

Plot estimated nonlinear function using ML and GLS

Plot true model for $f (x_{i}, β)$ . Add plot for the initial estimate of $f (x_{i}, β)$ using $β = β_{0}$ and plots for ML and GLS based estimates of $f (x_{i}, β)$ .

figure
plot(x,myf(beta,x),'r','LineWidth',1)
hold on
plot(x,myf(beta0,x),'k','LineWidth',1)
plot(x,myf(betaHatML,x),'c--','LineWidth',1)
plot(x,myf(betaHatGLS,x),'b-.','LineWidth',1)
legend('True f','Initial f','Estimated f with ML', ...
    'Estimated f with GLS','Location','Best')
xlabel('x')
ylabel('True and estimated f')
title('Comparison of true f with estimated f using ML and GLS.')

The estimated nonlinear function $f$ using both ML and GLS methods is close to the true nonlinear function $f$ . You can use a similar technique to fit other nonlinear generalized linear models like nonlinear Poisson regression.