3  Mixed Effects Models

Suppose you run an experiment measuring the effect of two types of secondary tasks on the time it takes a person to complete a primary task. This is normally a good candidate for a linear regression, but there is a twist: the participants each generate many data points, and they naturally vary in their multitasking skills. This twist violates the independent observations assumption. What to do? The solution is linear mixed effects models (LMMs).1

We’ll work with a datasets included in the supplemental materials of An Introduction to Linear Mixed-Effects Modeling (Brown 2021).2

rt_dat <-
  readr::read_csv("./input/mixed-effects-rt.csv", col_types = "cdcc") |>
  mutate(modality = factor(modality, levels = c("Audio-only", "Audiovisual")))

acc_dat <-
  readr::read_csv("./input/mixed-effects-acc.csv", col_types = "cdcc") |>
  mutate(
    modality = factor(modality, levels = c("Audio-only", "Audiovisual")),
    acc = factor(acc)
  )

glimpse(rt_dat)
Rows: 21,679
Columns: 4
$ PID      <chr> "301", "301", "301", "301", "301", "301", "301", "301", "301"…
$ RT       <dbl> 1024, 838, 1060, 882, 971, 1064, 1283, 982, 1144, 743, 639, 8…
$ modality <fct> Audio-only, Audio-only, Audio-only, Audio-only, Audio-only, A…
$ stim     <chr> "gown", "might", "fern", "vane", "pup", "rise", "hill", "caug…
glimpse(acc_dat)
Rows: 28,807
Columns: 4
$ PID      <chr> "309", "311", "313", "314", "318", "325", "331", "335", "337"…
$ acc      <fct> 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1…
$ modality <fct> Audio-only, Audio-only, Audio-only, Audio-only, Audio-only, A…
$ stim     <chr> "gong", "gong", "gong", "gong", "gong", "gong", "gong", "gong…

The datasets are from a study about multi-tasking. In rt_dat, the researchers measured participant (PID) response time (RT) on a primary task while completing a secondary task of repeating words presented in one of two modalities (modality): audio-only and audiovisual. The researchers hypothesized that the higher cognitive load associated with audiovisual stimulation would increase RT. Each participant repeated the experiment with various words (stimulus, stim). In acc_dat, the dependent variable is response accuracy (acc \(\in [0,1]\)) rather than response time.

There were 53 participants and 543 words. The summary table suggests audiovisual modality does indeed retard response times over audio-only.

Show the code
gtsummary::tbl_summary(
  rt_dat,
  by = modality,
  include = -c(PID, stim),
  statistic = list(gtsummary::all_continuous() ~ "{mean}, {sd}")
) |>
  gtsummary::as_gt() |>
  gt::tab_header(title = "rt_dat")
rt_dat
Characteristic Audio-only
N = 10,6471
Audiovisual
N = 11,0321
RT 1,041, 302 1,125, 317
1 Mean, SD


Show the code
gtsummary::tbl_summary(
  acc_dat,
  by = modality,
  include = -c(PID, stim)
) |>
  gtsummary::as_gt() |>
  gt::tab_header(title = "acc_dat")
acc_dat
Characteristic Audio-only
N = 14,3911
Audiovisual
N = 14,4161
acc

    0 4,880 (34%) 1,992 (14%)
    1 9,511 (66%) 12,424 (86%)
1 n (%)


PID and stim are important covariates because people differ in their ability to multitask, and words vary in difficulty.

Show the code
bind_rows(
  Participants = summarize(rt_dat, .by = c(modality, PID), M = mean(RT)),
  Stimulus = summarize(rt_dat, .by = c(modality, stim), M = mean(RT)),
  .id = "Factor"
) |>
  mutate(
    Factor = case_when(
      Factor == "Participants" ~ glue("Participants (n = {n_PID})"),
      Factor == "Stimulus" ~ glue("Word Stimulus (n = {n_stim})")
    ),
    grp = coalesce(PID, stim)
  ) %>%
  ggplot(aes(x = modality)) +
  geom_line(aes(y = M, group = grp), alpha = .3, color = "dodgerblue3") +
  facet_wrap(facets = vars(Factor)) +
  labs(
    title = "Modality Effect on Mean Response Time",
    subtitle = "by Participant and Word Stimulas",
    y = "milliseconds (ms)"
  )

3.1 LMM Model

Mixed-effects models are called “mixed” because they simultaneously model fixed and random effects. Fixed effects are population-level effects that should persist across experiments. Fixed effects are the traditional predictors in a regression model. modality is a fixed effect. Random effects are level effects that account for variability at different levels of the data hierarchy. PID and stim are random effects because they are randomly sampled and you expect variability within the populations.

LMM represents the \(i\)-th observation for the \(j\)-th random-effects group as

\[ y_{ij} = \beta_0 + \beta_1x_{ij} + u_j + \epsilon_{ij} \tag{3.1}\]

Where \(\beta_0\) and \(\beta_1\) are the fixed effect intercept and slope, and \(u_j\) is the random effect for the \(j\)-th group. The \(u_j\) terms are the group deviations from the population mean and are assumed to be normally distributed with mean 0 and variance \(\sigma_u^2\). The residual term is also assumed to be normally distributed with mean 0 and variance \(\sigma_\epsilon^2\)

In our study, \(\beta_0\) and \(\beta_1\) are the fixed effects of modality; \(u_j\) is the \(j\) = 53 PID levels; and the model includes an additional term, \(\eta_k\), for the \(k\) = 543 stim levels.

\[ y_{ij} = \beta_0 + \beta_1x_{ij} + u_j + \eta_k + \epsilon_{ijk} \]

3.2 The lmer Engine

Fit an LMM with the lmer package. The pipes (|) in the formula indicate the value to the left varies by the variable to the right. 1 is the intercept (no interactions). You can model the random effects two ways:

  • (1|PID) + (1|stim): level effects by participant and stimulus.
  • (1 + modality|PID) + (1 + modality|stim): level effects plus interaction effects.

Start with the level effect model.

rt_fit_1 <-
  linear_reg() |>
  set_engine("lmer") |>
  fit(
  RT ~ 1 + modality + (1|PID) + (1|stim),
    data = rt_dat
  )

extract_fit_engine(rt_fit_1) |> summary()
Linear mixed model fit by REML ['lmerMod']
Formula: RT ~ 1 + modality + (1 | PID) + (1 | stim)
   Data: data

REML criterion at convergence: 302861.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.3572 -0.6949 -0.0205  0.5814  4.9120 

Random effects:
 Groups   Name        Variance Std.Dev.
 stim     (Intercept)   360.2   18.98  
 PID      (Intercept) 28065.7  167.53  
 Residual             67215.3  259.26  
Number of obs: 21679, groups:  stim, 543; PID, 53

Fixed effects:
                    Estimate Std. Error t value
(Intercept)         1044.449     23.164   45.09
modalityAudiovisual   82.525      3.529   23.38

Correlation of Fixed Effects:
            (Intr)
modltyAdvsl -0.078

Let’s walk through these sections.

Tip

Use broom.mixed::tidy() to pull the coefficient estimates.

(tidy_fit <- broom.mixed::tidy(rt_fit_1))
# A tibble: 5 × 6
  effect   group    term                estimate std.error statistic
  <chr>    <chr>    <chr>                  <dbl>     <dbl>     <dbl>
1 fixed    <NA>     (Intercept)           1044.      23.2       45.1
2 fixed    <NA>     modalityAudiovisual     82.5      3.53      23.4
3 ran_pars stim     sd__(Intercept)         19.0     NA         NA  
4 ran_pars PID      sd__(Intercept)        168.      NA         NA  
5 ran_pars Residual sd__Observation        259.      NA         NA  
  • The Random effects show the variability in RT due to stim and PID. The variability among participants (PID sd = 168 ms) is much greater than the variability among words (stim sd = 19 ms).

  • The Fixed effects show audio-visual modality slows RT by 82.5 ms over audio-only.

  • The Correlation of Fixed Effects shows a weak negative correlation between the intercept and the audio-visual modality effect (modltyAdvsl = -0.078 ms). This means when RT times were high, audiovisual effect was somewhat diminished.

To add interactions between the fixed effect variable(s) and the random effects variable(s), change the random intercept (1) to the fixed effect variable. Then both the intercept and the slope can vary.

rt_fit_2 <-
  linear_reg() |>
  set_engine("lmer") |>
  fit(
    RT ~ 1 + modality + (1 + modality|PID) + (1 + modality|stim),
    data = rt_dat
  )

summary(extract_fit_engine(rt_fit_2))
Linear mixed model fit by REML ['lmerMod']
Formula: RT ~ 1 + modality + (1 + modality | PID) + (1 + modality | stim)
   Data: data

REML criterion at convergence: 302385.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.3646 -0.6964 -0.0141  0.5886  5.0003 

Random effects:
 Groups   Name                Variance Std.Dev. Corr 
 stim     (Intercept)           304.0   17.44        
          modalityAudiovisual   216.9   14.73   0.16 
 PID      (Intercept)         28559.0  168.99        
          modalityAudiovisual  7709.0   87.80   -0.17
 Residual                     65258.8  255.46        
Number of obs: 21679, groups:  stim, 543; PID, 53

Fixed effects:
                    Estimate Std. Error t value
(Intercept)          1044.14      23.36  44.700
modalityAudiovisual    83.18      12.57   6.615

Correlation of Fixed Effects:
            (Intr)
modltyAdvsl -0.178

This formulation includes random slopes for modality within PID and stim, meaning you’re accounting for differences in how each participant and stimulus responds to modality. Response times varied around 1,044 ms by sd = 17 ms for stim and sd = 169 ms for PID. The modality effect on response times varied around 83 ms by sd = 15 ms for stim and sd = 88 ms for PID. That last statistic means a participant whose slope is 1 sd below the mean would have a slope near zero - a modality effect of zero.

The added correlations in the Random effects section show how the intercept and slopes for modality vary together within each group. The 0.16 correlation between the stimuli and fixed-effect intercepts indicates that stimuli with longer response times in the audio modality tended to have more positive slopes (stronger positive audiovisual effect). The -0.17 correlation between the participant and fixed-effect intercepts indicates that participants with longer response times in the audio modality tended to have a less positive slope (weaker positive, or possibly even a negative, audiovisual effect).

The audio-visual modality slows RT by 83 ms over audio-only - same as in the first model. The REML criterion fell, so adding the random slopes for modality improved the fit.

3.3 Assumptions

The linear regression model assumptions still apply: the relationship between the predictors and the response is linear and the residuals are independent random variables normally distributed with mean zero and constant variance. Additionally, the the random effects intercepts and slopes are assumed to follow a normal distribution. Use residual plots to assess linearity, homoscedasticity, and independence. Use Q-Q plots to check the normality of residuals and random effects. Variance inflation factors (VIFs) can help detect multicollinearity among predictors.

resid_panel(extract_fit_engine(rt_fit_2), smoother = TRUE, qqbands = TRUE)

3.4 Evaluation

To determine whether the modality fixed effect actually affected response times, compare the model to a nested model without modality using a likelihood-ratio test. Likelihood ratio tests calculating the likelihoods for the models using maximum likelihood estimation then compares those likelihoods. A significant test results indicates the full model fit the data better. Perform a likelihood ratio test anova().

Show the code
# reduced model exlcludes the `modality` predictor.
rt_reduced <- linear_reg() |>
  set_engine("lmer") |>
  fit(
    RT ~ 1 + (1 + modality|PID) + (1 + modality|stim),
    data = rt_dat
  )

lrt <- anova(
  extract_fit_engine(rt_fit_2),
  extract_fit_engine(rt_reduced)
)

tidy(lrt)
# A tibble: 2 × 9
  term            npar    AIC    BIC  logLik minus2logL statistic    df  p.value
  <chr>          <dbl>  <dbl>  <dbl>   <dbl>      <dbl>     <dbl> <dbl>    <dbl>
1 extract_fit_e…     8 3.02e5 3.03e5 -1.51e5    302433.      NA      NA NA      
2 extract_fit_e…     9 3.02e5 3.02e5 -1.51e5    302401.      32.4     1  1.26e-8

, but there is a better way. afex::mixed(method = "LRT") performs the test for all fixed effects variables in the model. In this case study we have only one.

The likelihood-ratio test indicated that the model including modality provided a better fit for the data than a model without it, \(\chi^2\)(1) = 32.4, p < .001.

3.5 Reporting

av_terms <- broom.mixed::tidy(rt_fit_2) |> filter(term == "modalityAudiovisual")

A likelihood-ratio test indicated that the model including modality provided a better fit for the data than a model without it, \(\chi^2\)(1) = 32.4, p < .001. Examination of the summary output for the full model indicated that response times were on average an estimated 83.2 ms slower in the audiovisual relative to the audio-only condition (\(\hat{\beta}\) = 83.2, SE = 12.6, t = 83.2, SE = 6.6).

gtsummary::tbl_regression(rt_fit_2)
Characteristic Beta 95% CI
modality

    Audio-only
    Audiovisual 83 59, 108
Abbreviation: CI = Confidence Interval

3.6 Logistic Regression

This section fits a logistic LMM to the acc_dat dataset.

Show the code
acc_fit <-
  logistic_reg() |>
  set_engine("glmer") |>
  set_mode("classification") |>
  fit(
    acc ~ 1 + modality + (1 + 1|PID) + (1 + 1|stim),
    data = acc_dat,
    family = "binomial"
  )

Fit a reduced model lacking the modality fixed effect to use in a likelihood ratio test.

Show the code
acc_reduced_fit <-
  logistic_reg() |>
  set_engine("glmer") |>
  set_mode("classification") |>
  fit(
    acc ~ 1 + (1 + 1|PID) + (1 + 1|stim),
    data = acc_dat,
    family = "binomial"
  )

lrt_acc <- anova(
  extract_fit_engine(acc_fit),
  extract_fit_engine(acc_reduced_fit)
)

tidy(lrt_acc)
# A tibble: 2 × 9
  term             npar    AIC    BIC  logLik minus2logL statistic    df p.value
  <chr>           <dbl>  <dbl>  <dbl>   <dbl>      <dbl>     <dbl> <dbl>   <dbl>
1 extract_fit_en…     3 29930. 29955. -14962.     29924.       NA     NA      NA
2 extract_fit_en…     4 28122. 28155. -14057.     28114.     1810.     1       0

3.7 Appendix: Comparison to OLS

You might consider just throwing PID and stim into a simple linear regression as covariates.

Show the code
linear_reg() |>
  fit(RT ~ modality + PID + stim, data = rt_dat) |>
  tidy() |>
  filter(
    !str_detect(term, "^PID"),
    !str_detect(term, "^stim")
  )
# A tibble: 2 × 5
  term                estimate std.error statistic   p.value
  <chr>                  <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)            940.      44.0       21.4 2.84e-100
2 modalityAudiovisual     82.7      3.55      23.3 1.23e-118

Or maybe pre-summarize the data by calculating the mean response time across all words by PID and modality, then run the regression controlling for just PID.

Show the code
x <- summarize(rt_dat, .by = c(modality, PID), MRT = mean(RT))

linear_reg() |>
  fit(MRT ~ modality + PID, data = x) |>
  tidy() |>
  filter(!str_detect(term, "^PID"))
# A tibble: 2 × 5
  term                estimate std.error statistic  p.value
  <chr>                  <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)            973.       46.2     21.1  4.07e-27
2 modalityAudiovisual     83.2      12.6      6.61 2.05e- 8

Or what the heck, don’t even control for pID or stim.

Show the code
linear_reg() |>
  fit(RT ~ modality, data = rt_dat) |>
  tidy()
# A tibble: 2 × 5
  term                estimate std.error statistic  p.value
  <chr>                  <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)           1041.       3.00     347.  0       
2 modalityAudiovisual     83.1      4.21      19.7 5.50e-86

  1. Another option is repeated measures ANOVA. However, repeated measures ANOVA can model either participant or item variability, but not both simultaneously. Also, ANOVA requires continuous outcomes and ccategorical predictors.↩︎

  2. The article is a good tutorial and worth reading.↩︎