NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Mixed-Effects Models comparisons:


I will consider three different models from the sleepstudy dataset. First off, we need to load two packages: library(lme4); library(lmerTest).



MODEL 1: Ordinary Least Squares:

Syntax:

lm(Reaction ~ Days, sleepstudy)



MODEL 2: Random effects intercepts for each level of Subject as they deviate from a global intercept, and a global slope:

Syntax:

lmer(Reaction ~ Days + (1|Subject), sleepstudy)



MODEL 3: Random intercepts and slopes with correlation between spread intercepts and slopes:

The continuous variable Days is treated as a fixed effect, and its effect on each level of the categorical variable Subject, treated as a random effect, is considered allowing correlation between the spread of the intercepts across Subjects and the Days effect deviations across Subjects levels.

Syntax:

lmer(Reaction ~ Days + (Days|Subject), sleepstudy), which is the same as lmer(Reaction ~ Days + (1 + Days|Subject), sleepstudy) as defined in this entry in Cross Validated.


Retrieving Results:


library(lme4)
library(lmerTest)
fm1 <- lm(Reaction ~ Days, sleepstudy)
summary(fm1)
## 
## Call:
## lm(formula = Reaction ~ Days, data = sleepstudy)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -110.848  -27.483    1.546   26.142  139.953 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  251.405      6.610  38.033  < 2e-16 ***
## Days          10.467      1.238   8.454 9.89e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 47.71 on 178 degrees of freedom
## Multiple R-squared:  0.2865, Adjusted R-squared:  0.2825 
## F-statistic: 71.46 on 1 and 178 DF,  p-value: 9.894e-15
coef(fm1)
## (Intercept)        Days 
##   251.40510    10.46729
anova(fm1)
## Analysis of Variance Table
## 
## Response: Reaction
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Days        1 162703  162703  71.464 9.894e-15 ***
## Residuals 178 405252    2277                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fm2 <- lmer(Reaction ~ Days + (1|Subject), sleepstudy)
summary(fm2)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (1 | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1786.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2257 -0.5529  0.0109  0.5188  4.2506 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 1378.2   37.12   
##  Residual              960.5   30.99   
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept) 251.4051     9.7467  22.8102   25.79   <2e-16 ***
## Days         10.4673     0.8042 161.0000   13.02   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.371
coef(fm2)
## $Subject
##     (Intercept)     Days
## 308    292.1888 10.46729
## 309    173.5556 10.46729
## 310    188.2965 10.46729
## 330    255.8115 10.46729
## 331    261.6213 10.46729
## 332    259.6263 10.46729
## 333    267.9056 10.46729
## 334    248.4081 10.46729
## 335    206.1230 10.46729
## 337    323.5878 10.46729
## 349    230.2089 10.46729
## 350    265.5165 10.46729
## 351    243.5429 10.46729
## 352    287.7835 10.46729
## 369    258.4415 10.46729
## 370    245.0424 10.46729
## 371    248.1108 10.46729
## 372    269.5209 10.46729
## 
## attr(,"class")
## [1] "coef.mer"
fm3 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
summary(fm3)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  251.405      6.825  17.000  36.838  < 2e-16 ***
## Days          10.467      1.546  17.000   6.771 3.26e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138
coef(fm3)
## $Subject
##     (Intercept)       Days
## 308    253.6637 19.6662617
## 309    211.0064  1.8476053
## 310    212.4447  5.0184295
## 330    275.0957  5.6529356
## 331    273.6654  7.3973743
## 332    260.4447 10.1951090
## 333    268.2456 10.2436499
## 334    244.1725 11.5418676
## 335    251.0714 -0.2848792
## 337    286.2956 19.0955511
## 349    226.1949 11.6407181
## 350    238.3351 17.0815038
## 351    255.9830  7.4520239
## 352    272.2688 14.0032871
## 369    254.6806 11.3395008
## 370    225.7921 15.2897709
## 371    252.2122  9.4791297
## 372    263.7197 11.7513080
## 
## attr(,"class")
## [1] "coef.mer"



MODEL SELECTION:



To compare different lmer models it’s best to avoid REML when the fixed effects are different between models. Even though it is not the case in our models I will redefine the models to steer clear of this potential issue:

fm2 <- lmer(Reaction ~ Days + (1|Subject), REML = F, sleepstudy)
fm3 <- lmer(Reaction ~ Days + (Days|Subject), REML = F, sleepstudy)

The Akaike Information Criteria is a good criterion of the quality of the model. It tends to penalize adding extra predictors (overfitting). The models with the lowest AIC values are best.

AIC(fm1, fm2, fm3)
##     df      AIC
## fm1  3 1906.293
## fm2  4 1802.079
## fm3  6 1763.939

It seems as though the last model is best in terms of its lowest AIC.

Alternatively we can run ANOVA tests on the models:

anova(fm2, fm3)
## Data: sleepstudy
## Models:
## fm2: Reaction ~ Days + (1 | Subject)
## fm3: Reaction ~ Days + (Days | Subject)
##     npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## fm2    4 1802.1 1814.8 -897.04   1794.1                         
## fm3    6 1763.9 1783.1 -875.97   1751.9 42.139  2  7.072e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Home Page

NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.