To fit a linear-mixed effects model, you must store your data in a table or dataset array. In
your table or dataset array, you must have a column for each variable including the
response variable. More specifically, the table or dataset array, say
`tbl`

, must contain the following:

A response variable

`y`

Predictive variables

`X`

which can be continuous or grouping variables_{j}Grouping variables

`g`

,_{1}`g`

, ...,_{2}`g`

,_{R}

where the grouping variables in
`X`

and
_{j}`g`

can be
categorical, logical, a character array, a string array, or a cell array of
character vectors, _{r}*r* = 1, 2, ..., *R*.

You must organize your data so that each row represents an observation. And each row should contain the value of variables and the levels of grouping variables corresponding to that observation. For example, if you have data from an experiment with four treatment options, on five different types of individuals chosen randomly from a population of individuals (blocks), the table or dataset array must look like this.

Block | Treatment | Response |
---|---|---|

1 | 1 | y11 |

1 | 2 | y12 |

1 | 3 | y13 |

1 | 4 | y14 |

... | ... | ... |

5 | 1 | y51 |

5 | 2 | y52 |

5 | 3 | y53 |

5 | 4 | y54 |

Now, consider a split-plot experiment, where the effect of four different types of fertilizers on the yield of tomato plants is studied. The soil where the tomato plants are planted is divided into three blocks based on the soil type: sandy, silty, and loamy. Each block is divided into five plots, where five types of tomato plants, (cherry, heirloom, grape, vine, and plum) are randomly assigned to these plots. Then, the tomato plants in the plots are divided into subplots, where each subplot is treated by one of the four fertilizers. The data from this experiment looks like:

Soil | Tomato | Fertilizer | Yield |
---|---|---|---|

'Sandy' | 'Plum' | 1 | 104 |

'Sandy' | 'Plum' | 2 | 136 |

'Sandy' | 'Plum' | 3 | 158 |

'Sandy' | 'Plum' | 4 | 174 |

'Sandy' | 'Cherry' | 1 | 57 |

'Sandy' | 'Cherry' | 2 | 86 |

... | ... | ... | ... |

'Sandy' | 'Vine' | 3 | 99 |

'Sandy' | 'Vine' | 4 | 117 |

'Silty' | 'Plum' | 1 | 120 |

'Silty' | 'Plum' | 2 | 115 |

... | ... | ... | ... |

'Loamy' | 'Vine' | 3 | 111 |

'Loamy' | 'Vine' | 4 | 105 |

You must specify the model you want to fit using the `formula`

input
argument to `fitlme`

.

In general, a formula for model specification is a character vector or string scalar of the
form `'y ~ terms'`

. For linear mixed-effects models, this formula
is in the form ```
'y ~ fixed + (random1|grouping1) + ... +
(randomR|groupingR)'
```

, where `fixed`

contains the
fixed-effects terms and `random1, ..., randomR`

contain the
random-effects terms. For example, for the previous fertilizer experiment, consider
the following mixed-effects model

$${y}_{imjk}={\beta}_{0}+{\displaystyle \sum _{m=2}^{4}{\beta}_{1m}I{\left[F\right]}_{im}}+{\displaystyle \sum _{j=2}^{5}{\beta}_{2j}I{\left[T\right]}_{ij}}+{b}_{0k}{S}_{k}+{b}_{0jk}{(S*T)}_{jk}+{\epsilon}_{imjk},$$

where *i* = 1, 2, ..., 60, the index *m* corresponds
to the fertilizer types, *j* corresponds to the tomato
types, and *k* = 1, 2, 3 corresponds to the blocks
(soil). *S*_{k} represents
the *k*th soil type, and *I*[*F*]_{im} is
the dummy variable representing level *m* of the
fertilizer. Similarly, *I*[*T*]_{ij} is
the dummy variable representing the level *j* of
the tomato type.

You can fit this model using the formula ```
'Yield ~ 1
+ Fertilizer + Tomato + (1|Soil)+(1|Soil:Tomato)'
```

.

For detailed information on how to specify your model using formula, see Relationship Between Formula and Design Matrices.

If you cannot easily describe your model using a formula, you
can create design matrices to define the fixed and random effects,
and fit the model using `fitlmematrix(X,y,Z,G)`

.
You must create your design matrices as follows.

Fixed-effects and random-effects design matrices `X`

and `Z`

:

Enter a column of 1s for the intercept using

`ones(n,1)`

, where*n*is the total number of observations.If

`X1`

is a continuous variable, then enter`X1`

as it is in a separate column.If

`X1`

is a categorical variable with*m*levels, then there must be*m*– 1 dummy variables for*m*– 1 levels of`X1`

in`X`

.For example, consider an experiment where you want to study the impact of quality of raw materials from four different providers on the productivity of a production line. If you fit a linear mixed-effects model with intercept and provider as the fixed-effects terms, intercept is the random-effects term, and you use reference contrasts coding, then you must construct your fixed- and random-effects design matrices as follows.

`D = dummyvar(provider); % Create dummy variables X = [ones(n,1) D(:,2) D(:,3) D(:,4)]; Z = [ones(n,1)];`

Because reference contrast coding uses the first provider as the reference, and the model has an intercept, you must use the dummy variables for only the last three providers.

If there is an interaction term of predictor variables

`X1`

and`X2`

, then you must enter a column that you form by elementwise product of the vectors`X1`

and`X2`

.For example, if you want to fit a model, where there is an intercept, a continuous treatment factor, a continuous time factor, and their interaction as the fixed-effects in a longitudinal study, and time is the random-effects term, then your fixed- and random-effects design matrices should look like

X = [ones(n,1),treatment,time,treatment.*time]; y = response; Z = [time];

Grouping variables `G`

:

There is one column for each grouping variable and a column of elementwise product of the grouping variables in case of a nesting.

For example, if you want to group plots (`plot`

)
within blocks (`block`

), then you must add a column
of elementwise product of `plot`

by `block`

.
More specifically, if you want to fit a model where there is intercept
and a continuous treatment factor as the fixed-effects in a split-block
experiment, and the intercept and treatment are grouped by the plots
nested within blocks, then the design matrices should look like this.

X = [ones(n,1),treatment]; y = response; Z = [ones(n,1),treatment]; G = [block.*plot];

Suppose in the earlier quality of raw materials example, the raw materials arrive in bulks, and the bulks are nested within providers. If you want to fit a linear mixed-effects model, where intercept is grouped by the bulks within providers, then your design matrices should look like this.

D = dummyvar(provider); X = [ones(n,1) D(:,2) D(:,3) D(:,4)]; y = response; Z = ones(n,1); G = [provider.*bulks];

In the earlier longitudinal study example, if you want to add random effects for intercept and time grouped by subjects that participated in the study, then your design matrices should look like

X = [ones(n,1),treatment,time, treatment.*time]; y = response; Z = [ones(n,1),time]; G = subject;

`fitlme(tbl,formula)`

and `fitlmematrix(X,y,Z,G)`

are
equivalent in functionality, such that

`y`

is the*n*-by-1 response vector.`X`

is an*n*-by-*p*fixed-effects design matrix.`fitlme`

constructs this from the expression`fixed`

in`formula`

.`Z`

is an*R*-by-1 cell array with`Z{r}`

being an*n*-by-*q*(*r*) random-effects design matrix constructed from the*r*th expression in`random`

in`formula`

,*r*= 1, 2, ...,*R*.`G`

is an*R*-by-1 cell array with`G{r}`

being an*n*-by-1 grouping variable,`g`

_{r}, in`formula`

with*M*(*r*) levels or groups.

For example, if `tbl`

is a table or dataset
array containing the response variable `y`

, the continuous
variables `X1`

and `X2`

, and the
grouping variable `g`

, then to fit a linear mixed-effects
model that corresponds to the formula expression ```
'y ~ X1+
X2+ (X1*X2|g)'
```

using `fitlmematrix(X,y,Z,G)`

the
input arguments must correspond to the following:

y = tbl.y X = [ones(n,1), tbl.X1, tbl.X2] Z = [ones(n,1), tbl.X1, tbl.X2, tbl.X1.*tbl.X2] G = tbl.g

`fitlme`

| `fitlmematrix`

| `LinearMixedModel`