fitlmcens

Fit censored linear regression model

Since R2025a

    Description

    mdl = fitlmcens(tbl,ResponseVarName,Censoring=cens) returns a censored linear regression model fit to the input data in tbl, using the response variable specified by ResponseVarName and the censoring information in cens. If the response variable is in the last column of tbl, you do not need to specify ResponseVarName.

    mdl = fitlmcens(tbl,ResponseVarName,modelspec,Censoring=cens) additionally specifies the linear regression model to use for fitting.

    mdl = fitlmcens(tbl,y,Censoring=cens) uses the variables in tbl for the predictors and the vector y for the response.

    mdl = fitlmcens(tbl,y,modelspec,Censoring=cens) additionally specifies the linear regression model to use for fitting.

    mdl = fitlmcens(X,y,Censoring=cens) returns a censored linear regression model of the responses y, fit to the data matrix X, using the censoring information in cens.

    mdl = fitlmcens(X,y,modelspec,Censoring=cens) additionally specifies the linear regression model to use for fitting.

    mdl = fitlmcens(tbl,yint) uses the variables in tbl for the predictors and the censoring information in the two-column matrix yint for the response.

    mdl = fitlmcens(X,yint) uses the columns in X for the predictors.

    mdl = fitlmcens(___,modelspec) additionally specifies the linear regression model to use for fitting, using either of the two preceding syntaxes.

    mdl = fitlmcens(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify categorical variables, exclude observations, and use observation weights.

    Examples

    Load the readmissiontimes sample data.

    load readmissiontimes

    The variables Age, Weight, and ReadmissionTime contain data for patient age, weight, and time of readmission. The Censored variable contains censoring information for ReadmissionTime.

    Save the variables in a table, and fit a censored linear regression model to the data using ReadmissionTime as the response and Censored as the censoring information.

    tbl = table(Age,Weight,ReadmissionTime,Censored);
    mdl = fitlmcens(tbl,"ReadmissionTime",Censoring="Censored")
    mdl = 
    Censored linear regression model
        ReadmissionTime ~ 1 + Age + Weight
    
    Estimated Coefficients:
                       Estimate        SE        tStat        pValue  
                       _________    ________    ________    __________
    
        (Intercept)        28.62      3.5313      8.1047    1.7047e-12
        Age            -0.060686    0.061984    -0.97905       0.33001
        Weight          -0.11977    0.017199     -6.9638    4.1162e-10
    
    Sigma: 4.245
    
    Number of observations: 100, Error degrees of freedom: 96
    25 right-censored observations
    75 uncensored observations
    Likelihood ratio statistic vs. constant model: 39, p-value = 3.47e-09
    

    mdl is a CensoredLinearModel object that contains the results of fitting the model to the data. The small p-value for the Weight term indicates that it has a statistically significant effect on patient readmission time.

    Load the readmissiontimes sample data.

    load readmissiontimes

    The variables Age, Weight, and ReadmissionTime contain data for patient age, weight, and time of readmission. The Censored variable contains censoring information for ReadmissionTime.

    Save Age, Weight, and ReadmissionTime in a table.

    tbl = table(Age,Weight,ReadmissionTime);

    Fit a censored linear regression model using Age and Weight as the predictor variables, ReadmissionTime as the response, and Censored as the censoring information. Because ReadmissionTime is the last column in tbl, you do not need to specify the ResponseVarName argument.

    mdl1 = fitlmcens(tbl,Censoring=Censored)
    mdl1 = 
    Censored linear regression model
        ReadmissionTime ~ 1 + Age + Weight
    
    Estimated Coefficients:
                       Estimate        SE        tStat        pValue  
                       _________    ________    ________    __________
    
        (Intercept)        28.62      3.5313      8.1047    1.7047e-12
        Age            -0.060686    0.061984    -0.97905       0.33001
        Weight          -0.11977    0.017199     -6.9638    4.1162e-10
    
    Sigma: 4.245
    
    Number of observations: 100, Error degrees of freedom: 96
    25 right-censored observations
    75 uncensored observations
    Likelihood ratio statistic vs. constant model: 39, p-value = 3.47e-09
    

    mdl1 is a CensoredLinearModel object that includes the results of fitting a censored linear regression model to the data. The output display includes information about the model, statistics for each model term, and the censored observations. The p-values for the Weight and Age terms indicate that Weight has a statistically significant effect on patient readmission time and Age does not.

    Fit another model to the data, using only the Weight term.

    mdl2 = fitlmcens(tbl,"ReadmissionTime~Weight",Censoring=Censored)
    mdl2 = 
    Censored linear regression model
        ReadmissionTime ~ 1 + Weight
    
    Estimated Coefficients:
                       Estimate      SE        tStat       pValue  
                       ________    _______    _______    __________
    
        (Intercept)      26.398     2.7107     9.7387    4.9168e-16
        Weight         -0.12041    0.01729    -6.9642    3.9554e-10
    
    Sigma: 4.273
    
    Number of observations: 100, Error degrees of freedom: 97
    25 right-censored observations
    75 uncensored observations
    Likelihood ratio statistic vs. constant model: 38, p-value = 7.06e-10
    

    The likelihood ratio statistic vs. constant model has a smaller p-value for mdl2 (7.06e-10) than for mdl1 (3.47e-09), which indicates that mdl2 is a slightly better fit than mdl1.

    Load the censoreddata sample data.

    load censoreddata

    The matrix X contains data for three predictors, and the matrix yint contains censoring information for a response variable. Display yint.

    yint
    yint = 10×2
    
          -Inf   13.9492
          -Inf   -0.1978
          -Inf    6.9939
       64.7670       Inf
        4.2314       Inf
       -1.1874       Inf
        0.2764    2.2764
       36.1247   38.1247
        2.5400    4.5400
       30.4107   32.4107
    
    

    The first three rows of yint specify left-censored observations. The fourth to sixth rows specify right-censored observations. The remaining rows specify interval-censored observations.
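
The row pattern described above can be checked with a few logical tests. The following is a minimal sketch in base MATLAB that retypes the yint values displayed above instead of loading censoreddata. The names lo, hi, isLeft, isRight, isExact, isInterval, and counts are illustrative only and are not part of fitlmcens.

```matlab
% Values copied from the yint display above.
yint = [ -Inf    13.9492
         -Inf    -0.1978
         -Inf     6.9939
         64.7670  Inf
          4.2314  Inf
         -1.1874  Inf
          0.2764  2.2764
         36.1247 38.1247
          2.5400  4.5400
         30.4107 32.4107];

lo = yint(:,1);   % lower bounds
hi = yint(:,2);   % upper bounds

isLeft     = lo == -Inf & isfinite(hi);             % only an upper bound is known
isRight    = isfinite(lo) & hi == Inf;              % only a lower bound is known
isExact    = isfinite(lo) & lo == hi;               % bounds coincide: uncensored
isInterval = isfinite(lo) & isfinite(hi) & lo < hi; % finite interval

% Counts in the order: left, right, interval, uncensored.
counts = [sum(isLeft) sum(isRight) sum(isInterval) sum(isExact)];
```

For this data, counts is [3 3 4 0], which agrees with the censoring counts that fitlmcens reports for the fitted model.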

    Fit a linear regression model to the censored data in X and yint.

    mdl = fitlmcens(X,yint)
    mdl = 
    Censored linear regression model
        y ~ 1 + x1 + x2 + x3
    
    Estimated Coefficients:
                       Estimate      SE       tStat      pValue 
                       ________    ______    ________    _______
    
        (Intercept)     17.317     10.189      1.6996    0.14995
        x1               9.401     8.7053      1.0799     0.3295
        x2             -3.2891     13.057    -0.25191    0.81114
        x3             -10.134      7.947     -1.2751     0.2583
    
    Sigma: 25.7
    
    Number of observations: 10, Error degrees of freedom: 5
    4 interval-censored observations
    3 right-censored observations
    3 left-censored observations
    Likelihood ratio statistic vs. constant model: 2.11, p-value = 0.551
    

    The large p-values indicate that not enough evidence exists to conclude that any model terms have a statistically significant effect on the response variable.

    Load the readmissiontimes sample data.

    load readmissiontimes

    The variables Age, Weight, Smoker, and ReadmissionTime contain data for patient age, weight, smoking status, and time of readmission. The Censored variable contains censoring information for ReadmissionTime.

    Save the Age, Weight, ReadmissionTime, and Censored variables in a table, and create a vector of indices for observations corresponding to smokers.

    tbl = table(Age,Weight,ReadmissionTime,Censored);
    idx = Smoker==1;

    Fit a censored linear regression model to the data for nonsmokers using ReadmissionTime as the response and Censored as the censoring information. Specify an interactions model.

    mdl = fitlmcens(tbl,"ReadmissionTime","interactions",Censoring="Censored",ExcludeObservations=idx)
    mdl = 
    Censored linear regression model
        ReadmissionTime ~ 1 + Age*Weight
    
    Estimated Coefficients:
                       Estimate        SE         tStat      pValue  
                       _________    _________    _______    _________
    
        (Intercept)       49.413       16.878     2.9276    0.0047949
        Age             -0.57333      0.44326    -1.2934      0.20073
        Weight          -0.25837       0.1084    -2.3834     0.020282
        Age:Weight     0.0035564    0.0028401     1.2522      0.21527
    
    Sigma: 4.604
    
    Number of observations: 66, Error degrees of freedom: 61
    16 right-censored observations
    50 uncensored observations
    Likelihood ratio statistic vs. constant model: 24.2, p-value = 2.23e-05
    

    The small p-value for Weight indicates that patient weight has a statistically significant effect on readmission time.

    Input Arguments

    Input data, specified as a table. tbl includes data for the predictor variables, and can also contain data for the response variable and the censoring information. The predictor variables can be numeric, logical, categorical, character, or string. The response variable must be numeric or logical. When tbl contains censoring information, it must be in the integer vector format described in cens.

    When you specify tbl without specifying ResponseVarName or y, fitlmcens uses the variable in the last column of the table as the response variable and the rest as the predictor variables.

    • To use a different column as the response variable, set the ResponseVar name-value argument.

    • To use a subset of the columns as predictors, set the PredictorVars name-value argument.

    • To define a model specification, set the modelspec argument using a formula or terms matrix. The formula or terms matrix specifies which columns to use as the predictor or response variables.

    The variable names in the table do not have to be valid MATLAB® identifiers, but the names must not contain leading or trailing blanks. If the names are not valid, you cannot use a formula when you fit or adjust a model.

    You can verify the variable names in tbl by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.
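
As a rough illustration of the check and conversion described above, the following sketch uses three hypothetical variable names (they do not come from any sample data set). The names isValid, fixedNames, and allValidNow are illustrative.

```matlab
% Hypothetical variable names; the second and third are not valid identifiers.
names = ["Age","Readmission Time","2ndVisit"];

% isvarname tests one name at a time, so apply it elementwise.
isValid = arrayfun(@isvarname,names);

% Convert every name to a valid MATLAB identifier and check again.
fixedNames  = matlab.lang.makeValidName(names);
allValidNow = all(arrayfun(@isvarname,fixedNames));

% For a table tbl, the same conversion could be applied in place:
% tbl.Properties.VariableNames = ...
%     matlab.lang.makeValidName(tbl.Properties.VariableNames);
```

Here isValid is true only for "Age". After the conversion, every name passes isvarname, so the names can be used in a formula.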

    Data Types: table

    Name of the variable to use as the response, specified as a string scalar or character vector. ResponseVarName indicates which variable in tbl contains the response data. When you specify ResponseVarName, you must also specify the tbl input argument.

    Data Types: char | string

    Censoring information for the observations, specified as an integer vector, an interval, or a variable name. You cannot use cens to specify interval censoring. To specify interval censoring, see yint.

    When you specify cens as an integer vector, it must have the same number of elements as the number of observations in the input data. Each element of cens must be -1, 0, or 1 to indicate that the corresponding observation is left-censored, uncensored, or right-censored, respectively.

    When you specify cens as an interval, it must be a two-element numeric vector [L R] where L < R. fitlmcens censors observations according to their response values.

    • Response values less than or equal to L are left-censored at L.

    • Response values inside the interval are uncensored.

    • Response values greater than or equal to R are right-censored at R.
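
The three rules above can be sketched in base MATLAB as follows. This is only an illustration of which observations the rules flag, written with hypothetical response values; it is not how fitlmcens is implemented, and the names y, L, R, and cens are chosen here for the example.

```matlab
y = [-12; -10; -3; 0; 7; 10; 15];   % hypothetical response values
L = -10;                            % lower censoring limit
R = 10;                             % upper censoring limit

cens = zeros(size(y));   % 0: uncensored (strictly inside the interval)
cens(y <= L) = -1;       % -1: left-censored
cens(y >= R) = 1;        %  1: right-censored
```

The result, [-1; -1; 0; 0; 0; 1; 1], uses the same -1, 0, and 1 coding as the integer vector format.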

    When you specify cens as a variable name, you must also specify tbl. tbl must include a variable of the same name that contains censoring information in the integer vector format described above.

    You cannot specify cens when you specify the response as the two-column matrix yint.

    Example: [-10,10]

    Example: [-1*ones(10,1);zeros(10,1);ones(10,1)]

    Example: "censvar"

    Data Types: single | double | string | char

    Model specification, specified as one of the following values.

    • A character vector or string scalar containing the model name.

      Value              Model Description
      _______________    _________________
      "constant"         Model contains only a constant (intercept) term
      "linear"           Model contains an intercept and linear term for each predictor
      "interactions"     Model contains an intercept, linear term for each predictor, and all products of pairs of distinct predictors (no squared terms)
      "purequadratic"    Model contains an intercept term and linear and squared terms for each predictor
      "quadratic"        Model contains an intercept term, linear and squared terms for each predictor, and all products of pairs of distinct predictors
      "polyijk"          Model is a polynomial with all terms up to degree i in the first predictor, degree j in the second predictor, and so on. Specify the maximum degree for each predictor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, "poly13" has an intercept and x1, x2, x2^2, x2^3, x1*x2, and x1*x2^2 terms, where x1 and x2 are the first and second predictors, respectively.

    • A t-by-(p + 1) terms matrix that specifies the terms in the model, where t is the number of terms, p is the number of predictor variables, and +1 accounts for the response variable. A terms matrix is convenient when the number of predictors is large and you want to generate the terms programmatically. For more information, see Terms Matrix.

    • A character vector or string scalar formula in the form

      "y ~ terms",

      where the terms are in Wilkinson Notation. The variable names in the formula must be variable names in tbl or variable names specified by VarNames. Also, the variable names must be valid MATLAB identifiers.

      The software determines the order of terms in a fitted model by using the order of terms in tbl or X. Therefore, the order of terms in the model can be different from the order of terms in the specified formula. For more information, see Formula.
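
To illustrate generating a terms matrix programmatically, the following sketch builds the matrix for an intercept, all linear terms, and all pairwise interactions of p predictors. It assumes the convention used by other Statistics and Machine Learning Toolbox regression functions, in which T(i,j) is the exponent of variable j in term i and the last column (the response) is all zeros. The names p, mainEffects, pairs, interactions, and T are illustrative.

```matlab
p = 3;                                 % number of predictors (hypothetical)

mainEffects = eye(p);                  % one row per linear term
pairs = nchoosek(1:p,2);               % all pairs of distinct predictors
interactions = zeros(size(pairs,1),p);
for k = 1:size(pairs,1)
    interactions(k,pairs(k,:)) = 1;    % exponent 1 for both predictors in the pair
end

% Stack intercept row, linear terms, and interaction terms,
% then append the all-zero column for the response variable.
T = [zeros(1,p); mainEffects; interactions];
T = [T zeros(size(T,1),1)];

% With data X, y, and cens in the workspace, T could then be passed as modelspec:
% mdl = fitlmcens(X,y,T,Censoring=cens);
```

For p = 3, T has 7 rows and 4 columns, and under the stated convention it describes the same terms as the "interactions" model name.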

    When you specify modelspec, you cannot use the PredictorVars name-value argument to specify the predictor variables.

    Example: "quadratic"

    Example: "y ~ x1 + x2^2 + x1:x2"

    Data Types: single | double | char | string

    Response variable, specified as an n-by-1 numeric vector, where n is the number of observations. Each element in y corresponds to the row with the same index in tbl or X.

    Data Types: single | double

    Predictor variables, specified as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables. Each column of X represents one variable, and each row represents one observation.

    By default, the model includes a constant term unless you explicitly remove it, so do not include a column of 1s in X.

    Data Types: single | double

    Response variable, specified as an n-by-2 numeric matrix, where n is the number of observations. Each row in yint corresponds to the same row in tbl or X.

    Each row of yint contains the lower and upper bounds of the interval for the corresponding observation.

    • To specify a left-censored observation, set the lower bound to -Inf.

    • To specify a right-censored observation, set the upper bound to Inf.

    • To specify an uncensored observation, set the upper and lower bounds to the response value of the observation.
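
The rules above also suggest a way to write a response vector and an integer censoring vector in the two-column form. The following is a minimal sketch with hypothetical values; the names y, cens, and yint mirror the argument names but the data is made up, and the source does not state whether fitlmcens produces an identical fit for both encodings.

```matlab
y    = [3.2; 7.5; 5.1; 9.0];   % hypothetical response values
cens = [-1; 0; 1; 0];          % -1 left-censored, 0 uncensored, 1 right-censored

yint = [y y];                  % start with equal bounds (uncensored)
yint(cens == -1,1) = -Inf;     % left-censored: lower bound is -Inf
yint(cens ==  1,2) =  Inf;     % right-censored: upper bound is Inf
```

The resulting rows are [-Inf 3.2], [7.5 7.5], [5.1 Inf], and [9 9].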

    Data Types: single | double

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: fitlmcens(X,y,Censoring=cens,ExcludeObservations=1:5,Intercept=false) fits a linear regression model without an intercept to the censored data in X and y, excluding the first five observations.

    Categorical predictor list, specified as a string array or cell array of character vectors containing categorical predictor names in the table tbl, or a logical or numeric index vector indicating which predictor columns are categorical.

    • If the predictor data is in tbl, then, by default, fitlmcens treats all categorical values, logical values, character arrays, string arrays, and cell arrays of character vectors as categorical predictors.

    • If the predictor data is in a matrix X, then the default value of CategoricalVars is an empty matrix []. That is, no predictor is categorical unless you specify it as categorical.

    For example, you can specify the second and third variables out of six as categorical using either of the following examples.

    Example: CategoricalVars=[2 3]

    Example: CategoricalVars=logical([0 1 1 0 0 0])

    Data Types: single | double | logical | string | cell

    Observations to exclude from the fit, specified as a logical or numeric index vector indicating which observations to exclude.

    For example, you can exclude the second and third observations of six using either of the following examples.

    Example: ExcludeObservations=[2 3]

    Example: ExcludeObservations=logical([0 1 1 0 0 0])
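
The numeric and logical forms select the same observations. As a small base MATLAB sketch (the names n, numericIdx, logicalIdx, and keptRows are illustrative), you can convert one form to the other and list the rows that remain in the fit:

```matlab
n = 6;                           % number of observations (hypothetical)
numericIdx = [2 3];              % numeric index form

logicalIdx = false(n,1);         % equivalent logical index form
logicalIdx(numericIdx) = true;

keptRows = find(~logicalIdx)';   % observations that are not excluded
```

Here keptRows is [1 4 5 6]. A logical vector built from a condition on the data, such as Smoker==1, follows the same pattern.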

    Data Types: single | double | logical

    Indicator for the constant term (intercept) in the fit, specified as a logical 1 (true) to include the term in the model, or 0 (false) to remove the term from the model. By default, the model includes a constant term unless you explicitly remove it.

    Use Intercept only when specifying the model using a character vector or string scalar, not a formula or matrix.

    Example: Intercept=false

    Data Types: logical

    Predictor variables to use in the fit, specified as a string array or cell array of character vectors of the variable names in the table tbl, or a logical or numeric index vector indicating which columns are predictor variables.

    The string values or character vectors must be names in tbl or names you specify using the VarNames name-value argument.

    The default value is all variables in X, or all variables in tbl except ResponseVar.

    When you specify PredictorVars, you cannot use the modelspec input argument to specify a terms matrix.

    For example, you can specify the second and third variables as the predictor variables using either of the following examples.

    Example: PredictorVars=[2 3]

    Example: PredictorVars=logical([0 1 1 0 0 0])

    Data Types: single | double | logical | string | cell

    Names of variables, specified as a string array or cell array of character vectors that includes the names for the columns of X first, and the name for the response variable y last.

    The variable names do not have to be valid MATLAB identifiers, but the names must not contain leading or trailing blanks. If the names are not valid, you cannot use a formula when you fit or adjust a model.

    You can verify the variable names by using the isvarname function. If the variable names are not valid, then you can convert them by using the matlab.lang.makeValidName function.

    You cannot specify VarNames when you specify input data using the tbl input argument.

    Example: VarNames=["Horsepower","Acceleration","Model_Year","MPG"]

    Data Types: string | cell

    Observation weights, specified as an n-by-1 vector of nonnegative scalar values, where n is the number of observations.

    Data Types: single | double

    Output Arguments

    Censored linear model, returned as a CensoredLinearModel object.

    More About

    Algorithms

    fitlmcens fits a censored linear regression model to the data using a modified version of maximum likelihood estimation for linear regression. Maximum likelihood estimation maximizes

    $$L = \prod_{i=1}^{n} P(y_i \mid \beta_0, \beta, \sigma),$$

    where $y_i$ is the response for the ith observation, $\beta_0$ is the intercept term in the model formula, $\beta$ is the vector of model coefficients, and $\sigma$ is the standard deviation of the error term. For a censored observation $y_i^*$, fitlmcens replaces $P(y_i \mid \beta_0, \beta, \sigma)$ in the equation above with one of the following probabilities.

    • If $y_i^*$ is left-censored, fitlmcens uses

      $$P(y_i < y_i^* \mid \beta_0, \beta, \sigma)$$

    • If $y_i^*$ is right-censored, fitlmcens uses

      $$P(y_i > y_i^* \mid \beta_0, \beta, \sigma)$$

    • If $y_i^*$ is interval-censored, fitlmcens uses

      $$P(L_i < y_i < R_i \mid \beta_0, \beta, \sigma),$$

      where $L_i$ and $R_i$ are the left and right bounds for the interval corresponding to $y_i^*$.
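
To make the likelihood terms concrete, the following sketch evaluates the log of the likelihood for a small hypothetical data set and fixed parameter values. It assumes normally distributed errors, which the text above implies but does not state, and it is not the fitlmcens implementation. The anonymous functions Phi and phi and all variable names are introduced here for illustration; the code uses only base MATLAB.

```matlab
% Standard normal CDF and PDF written with base MATLAB functions.
Phi = @(z) 0.5*erfc(-z/sqrt(2));
phi = @(z) exp(-z.^2/2)/sqrt(2*pi);

% Hypothetical data: one predictor, responses in two-column interval form.
X    = [1; 2; 3; 4];
yint = [2.1  2.1     % uncensored
        -Inf 4.5     % left-censored
        6.2  Inf     % right-censored
        7.0  9.0];   % interval-censored

% Hypothetical parameter values at which to evaluate the likelihood.
beta0 = 0;  beta = 2;  sigma = 1;

mu = beta0 + X*beta;            % model mean for each observation
lo = yint(:,1);
hi = yint(:,2);
isExact = lo == hi;

contrib = zeros(size(mu));
% Uncensored observations contribute the density at the observed value.
contrib(isExact) = phi((lo(isExact) - mu(isExact))/sigma)/sigma;
% Censored observations contribute P(lo < y < hi). Infinite bounds reduce
% this to the left-censored or right-censored probability.
c = ~isExact;
contrib(c) = Phi((hi(c) - mu(c))/sigma) - Phi((lo(c) - mu(c))/sigma);

logL = sum(log(contrib));       % log of the product of the contributions
```

Maximizing logL over beta0, beta, and sigma, for example with a general-purpose optimizer, would give maximum likelihood estimates under these assumptions.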

    Version History

    Introduced in R2025a