I will consider three different models from the
sleepstudy
dataset. First off, we need to load two
packages: library(lme4); library(lmerTest)
.
MODEL 1: Ordinary Least Squares:
Syntax:
lm(Reaction ~ Days, sleepstudy)
MODEL 2: Random effects intercepts for each level of
Subject
as they deviate from a global intercept, and a
global slope:
Syntax:
lmer(Reaction ~ Days + (1|Subject), sleepstudy)
MODEL 3: Random intercepts and slopes with
correlation between spread intercepts and slopes:
The continuous variable Days
is treated as a fixed
effect, and its effect on each level of the categorical variable
Subject
, treated as a random effect, is considered
allowing correlation between the spread of the intercepts across
Subjects
and the Days
effect deviations across
Subjects
levels.
Syntax:
lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
,
which is the same as
lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy)
as
defined in this
entry in Cross Validated.
library(lme4)
library(lmerTest)
fm1 <- lm(Reaction ~ Days, sleepstudy)
summary(fm1)
##
## Call:
## lm(formula = Reaction ~ Days, data = sleepstudy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -110.848 -27.483 1.546 26.142 139.953
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 251.405 6.610 38.033 < 2e-16 ***
## Days 10.467 1.238 8.454 9.89e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 47.71 on 178 degrees of freedom
## Multiple R-squared: 0.2865, Adjusted R-squared: 0.2825
## F-statistic: 71.46 on 1 and 178 DF, p-value: 9.894e-15
coef(fm1)
## (Intercept) Days
## 251.40510 10.46729
anova(fm1)
## Analysis of Variance Table
##
## Response: Reaction
## Df Sum Sq Mean Sq F value Pr(>F)
## Days 1 162703 162703 71.464 9.894e-15 ***
## Residuals 178 405252 2277
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fm2 <- lmer(Reaction ~ Days + (1|Subject), sleepstudy)
summary(fm2)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (1 | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1786.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2257 -0.5529 0.0109 0.5188 4.2506
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 1378.2 37.12
## Residual 960.5 30.99
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.4051 9.7467 22.8102 25.79 <2e-16 ***
## Days 10.4673 0.8042 161.0000 13.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.371
coef(fm2)
## $Subject
## (Intercept) Days
## 308 292.1888 10.46729
## 309 173.5556 10.46729
## 310 188.2965 10.46729
## 330 255.8115 10.46729
## 331 261.6213 10.46729
## 332 259.6263 10.46729
## 333 267.9056 10.46729
## 334 248.4081 10.46729
## 335 206.1230 10.46729
## 337 323.5878 10.46729
## 349 230.2089 10.46729
## 350 265.5165 10.46729
## 351 243.5429 10.46729
## 352 287.7835 10.46729
## 369 258.4415 10.46729
## 370 245.0424 10.46729
## 371 248.1108 10.46729
## 372 269.5209 10.46729
##
## attr(,"class")
## [1] "coef.mer"
fm3 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
summary(fm3)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1743.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9536 -0.4634 0.0231 0.4634 5.1793
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 612.10 24.741
## Days 35.07 5.922 0.07
## Residual 654.94 25.592
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.405 6.825 17.000 36.838 < 2e-16 ***
## Days 10.467 1.546 17.000 6.771 3.26e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
coef(fm3)
## $Subject
## (Intercept) Days
## 308 253.6637 19.6662617
## 309 211.0064 1.8476053
## 310 212.4447 5.0184295
## 330 275.0957 5.6529356
## 331 273.6654 7.3973743
## 332 260.4447 10.1951090
## 333 268.2456 10.2436499
## 334 244.1725 11.5418676
## 335 251.0714 -0.2848792
## 337 286.2956 19.0955511
## 349 226.1949 11.6407181
## 350 238.3351 17.0815038
## 351 255.9830 7.4520239
## 352 272.2688 14.0032871
## 369 254.6806 11.3395008
## 370 225.7921 15.2897709
## 371 252.2122 9.4791297
## 372 263.7197 11.7513080
##
## attr(,"class")
## [1] "coef.mer"
To compare different lmer
models itβs best to avoid
REML
when the fixed effects are different between
models. Even though it is not the case in our models I will redefine
the models to steer clear of this potential issue:
fm2 <- lmer(Reaction ~ Days + (1|Subject), REML = F, sleepstudy)
fm3 <- lmer(Reaction ~ Days + (Days|Subject), REML = F, sleepstudy)
The Akaike Information Criteria is a good criterion of the quality of the model. It tends to penalize adding extra predictors (overfitting). The models with the lowest AIC values are best.
AIC(fm1, fm2, fm3)
## df AIC
## fm1 3 1906.293
## fm2 4 1802.079
## fm3 6 1763.939
It seems as though the last model is best in terms of its lowest AIC.
Alternatively we can run ANOVA tests on the models:
anova(fm2, fm3)
## Data: sleepstudy
## Models:
## fm2: Reaction ~ Days + (1 | Subject)
## fm3: Reaction ~ Days + (Days | Subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## fm2 4 1802.1 1814.8 -897.04 1794.1
## fm3 6 1763.9 1783.1 -875.97 1751.9 42.139 2 7.072e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.