NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Random Effects Simulation:


Regression of [GLUCOSE] ~ [AMINOACID_A] in 30 athletes after 15 races:



All the data is fictitious, and the source code can be found here.


Data set “manufacturing” process:


We’ll make the sd of the slopes between athletes 0.5, and the sd between intercepts of different athletes 0.2. Further we’ll establish a correlation between intercepts and slopes of 0.8.


FOR ATHLETES:


The covariance matrix will be:

i = 0.2 # sd between intercepts
s = 0.5 # sd between slopes
r = 0.8 # correlation between intercepts and slopes
# The higher the intercept of an athlete; the higher the slope - could be the opposite...
(cov.matrix1<-  matrix(c(i^2, r * i * s, r * i * s, s^2), nrow = 2, byrow = T))
##      [,1] [,2]
## [1,] 0.04 0.08
## [2,] 0.08 0.25

Getting a random sample from this bi-variate normal distribution for each athlete:

athletes = 30
require(mvtnorm)
random.effects_athletes <-  rmvnorm(athletes, mean = c(0, 0), sigma = cov.matrix1)

We establish now the mean of these random intercepts (alpha_ath) and slopes (beta_ath):

athletes.df = data.frame(athlete  = c(1:athletes)) # Just a column of numbers from 1 to 30
athletes.df$alpha_athletes = 1 + random.effects_athletes[, 1]
athletes.df$beta_athletes =  2 + random.effects_athletes[, 2]

summary(athletes.df$beta_athletes)  # The mean is the slope in the fixed effects: 2 + random.effects[,2])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.110   1.708   1.889   1.945   2.145   3.163
sd(athletes.df$beta_athletes)       # The sd of the slopes as random effects (we wanted 0.5)
## [1] 0.4513766
summary(athletes.df$alpha_athletes) # The intercept has a mean of 1: (Intercept) fixed eff's
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4948  0.8781  1.0158  1.0042  1.1164  1.3667
sd(athletes.df$alpha_athletes)      # with a SD intercept as random eff (we wanted 0.2)
## [1] 0.2011818
cor(athletes.df$alpha_athletes, athletes.df$beta_athletes) # Set at 0.8.
## [1] 0.7991471


FOR RACES:


The covariance matrix will be:

i =   0.8   # sd between intercepts
s =   0.2   # sd between slopes
r = -0.01   # hardly any relationship between intercepts and slopes
(cov.matrix2 <-  matrix(c(i^2, r * i * s, r * i * s, s^2), nrow = 2, byrow = T))
##         [,1]    [,2]
## [1,]  0.6400 -0.0016
## [2,] -0.0016  0.0400

Getting a random sample from this bi-variate normal distribution for each race:

races = 15
random.effects_races <-  rmvnorm(races, mean = c(0, 0), sigma = cov.matrix2)

We establish now the mean of these random intercepts (alpha_race) and slopes (beta_race):

races.df = data.frame(race  = c(1:races)) # Just a column of numbers from 1 to 15
races.df$alpha_races   =    -1 + random.effects_races[, 1]
races.df$beta_races    =     1 + random.effects_races[, 2]
summary(races.df$beta_races)  # The mean is the slope in the fixed effects: 1 + random.effects[,2])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7140  0.9655  0.9972  1.0164  1.0635  1.2521
sd(races.df$beta_races)       # The sd of the slopes as random effects (we wanted 0.2)
## [1] 0.1282501
summary(races.df$alpha_races) # The intercept has a mean of - 1: (Intercept) fixed eff's
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -2.140662 -1.646019 -0.985087 -1.080361 -0.591727  0.003778
sd(races.df$alpha_races) # with a SD intercept as random eff (we wanted 0.8)
## [1] 0.6525022
cor(races.df$alpha_races, races.df$beta_races) # Set at -0.01.
## [1] 0.1491423


THE COVARIATE AAA - the fictitious aminoacid:


We need now produce the [AAA] with a mean of 50 and a sd of 5 as:

observations <- athletes * races
observations.df <-  data.frame(
                    athlete = sort(rep(c(1:athletes), races)),
                    race = rep(c(1:races), athletes), 
                    AAA = rep(rnorm(athletes * races, 50, 5)))
dat1   <-  merge(athletes.df, observations.df)
dat2   <-  merge(dat1, races.df)
dat3   <-  dat2[with(dat2, order(athlete,race)), ]
rownames(dat3)   <-  1:nrow(dat3)


THE RESPONSE VARIABLE:


It is time now to calculate the [Glucose] values as:

df <-  within(dat3, 
       glucose <-  alpha_athletes + AAA * beta_athletes +
                   alpha_races    + AAA * beta_races    +
                   0.75 * rnorm(n = observations)) # Epsilon is 0.75

D <- df[,c(2,1,5,8)]
head(D)
##   athlete race      AAA  glucose
## 1       1    1 56.49656 169.5516
## 2       1    2 45.63369 135.0500
## 3       1    3 50.04185 150.7323
## 4       1    4 45.59564 139.9108
## 5       1    5 52.98130 159.0521
## 6       1    6 50.59859 154.3979


THE MODELS:


In cursive are lines extracted from this cross-validated post.

The most common fix effect is labelled as V2, and the order of the model is altered, but without any essential change to the post on CV.

M1: V1 ~ V2 + (1|V3)

This model will estimate: P1: A global intercept P2: Random effect intercepts for V3 (i.e. for each level of V3, that level’s intercept’s deviation from the global intercept) P3: A single global estimate for the effect (slope) of V2

m1=lmer(glucose ~ AAA + (1|athlete), data=D)
summary(m1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: glucose ~ AAA + (1 | athlete)
##    Data: D
## 
## REML criterion at convergence: 3192.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9351 -0.4261 -0.0955  0.4137  2.7329 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  athlete  (Intercept) 521.69   22.840  
##  Residual              50.81    7.128  
## Number of obs: 450, groups:  athlete, 30
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)  -4.99227    5.36626  76.27588   -0.93    0.355    
## AAA           3.06203    0.06721 419.22687   45.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr)
## AAA -0.626
coefficients(m1) # 30 intercepts: 1 for every athlete; and one single slope for AAA
## $athlete
##    (Intercept)      AAA
## 1    -2.319430 3.062027
## 2    37.252580 3.062027
## 3   -37.369466 3.062027
## 4   -14.498814 3.062027
## 5    58.412704 3.062027
## 6   -16.321969 3.062027
## 7   -16.930180 3.062027
## 8   -14.312161 3.062027
## 9   -22.562873 3.062027
## 10  -29.490204 3.062027
## 11    5.189050 3.062027
## 12   17.922213 3.062027
## 13    9.632048 3.062027
## 14  -12.402902 3.062027
## 15   -8.973970 3.062027
## 16  -17.162565 3.062027
## 17  -20.555254 3.062027
## 18   30.116141 3.062027
## 19   -6.759428 3.062027
## 20   -1.335303 3.062027
## 21   22.288812 3.062027
## 22  -25.822019 3.062027
## 23  -34.820442 3.062027
## 24   14.937800 3.062027
## 25   -2.613498 3.062027
## 26   -9.760009 3.062027
## 27   -5.968219 3.062027
## 28    3.074827 3.062027
## 29   -2.450880 3.062027
## 30  -46.164701 3.062027
## 
## attr(,"class")
## [1] "coef.mer"

M2: V1 ~ V2 + (1|V3) + (0 + V2|V3)

This model estimates all the parameters from M1, but will additionally estimate: P4: The effect of V2 within each level of V3 (more specifically, the degree to which the V2 effect within a given level deviates from the global effect of V2), while enforcing a zero correlation between the intercept deviations and V2 effect deviations across levels of V3.

m2=lmer(glucose ~ AAA + (1|athlete) + (0 + AAA|athlete), data=D)
## boundary (singular) fit: see help('isSingular')
summary(m2)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: glucose ~ AAA + (1 | athlete) + (0 + AAA | athlete)
##    Data: D
## 
## REML criterion at convergence: 3140.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6693 -0.4137 -0.1135  0.3026  2.0917 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  athlete   (Intercept)  0.0000  0.0000  
##  athlete.1 AAA          0.2051  0.4528  
##  Residual              44.9159  6.7019  
## Number of obs: 450, groups:  athlete, 30
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  -3.2939     3.1745 419.2109  -1.038      0.3    
## AAA           3.0254     0.1041  70.3320  29.073   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr)
## AAA -0.604
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
coefficients(m2) # One single intercept but 30 different slopes: 1 / athlete
## $athlete
##    (Intercept)      AAA
## 1    -3.293895 3.082104
## 2    -3.293895 3.878285
## 3    -3.293895 2.385910
## 4    -3.293895 2.822765
## 5    -3.293895 4.240351
## 6    -3.293895 2.796344
## 7    -3.293895 2.790783
## 8    -3.293895 2.846261
## 9    -3.293895 2.688807
## 10   -3.293895 2.537121
## 11   -3.293895 3.226337
## 12   -3.293895 3.488651
## 13   -3.293895 3.323610
## 14   -3.293895 2.878070
## 15   -3.293895 2.946938
## 16   -3.293895 2.785408
## 17   -3.293895 2.719229
## 18   -3.293895 3.724945
## 19   -3.293895 2.986893
## 20   -3.293895 3.095998
## 21   -3.293895 3.573158
## 22   -3.293895 2.627459
## 23   -3.293895 2.421532
## 24   -3.293895 3.433305
## 25   -3.293895 3.072523
## 26   -3.293895 2.931712
## 27   -3.293895 3.011902
## 28   -3.293895 3.186349
## 29   -3.293895 3.080520
## 30   -3.293895 2.179237
## 
## attr(,"class")
## [1] "coef.mer"

M3: V1 ~ V2 + (1 + V2|V3)

All parameters from M2 are estimated while allowing correlation between the intercept deviations and V2 effect deviations within levels of V3. Thus, in M3, an additional parameter is estimated: P5: The correlation between intercept deviations and V2 deviations across levels of V3 Usually model pairs like M2 and M3 are computed then compared to evaluate the evidence for correlations between fixed effects (including the global intercept).

m3=lmer(glucose ~ AAA + (1 + AAA|athlete), data=D)
## boundary (singular) fit: see help('isSingular')
summary(m3)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: glucose ~ AAA + (1 + AAA | athlete)
##    Data: D
## 
## REML criterion at convergence: 3140.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6693 -0.4137 -0.1135  0.3026  2.0917 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  athlete  (Intercept)  0.0000  0.0000       
##           AAA          0.2051  0.4528    NaN
##  Residual             44.9159  6.7019       
## Number of obs: 450, groups:  athlete, 30
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  -3.2939     3.1745 419.1847  -1.038      0.3    
## AAA           3.0254     0.1041  33.5662  29.073   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr)
## AAA -0.604
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
coefficients(m3) # 30 intecepts and 30 different slopes
## $athlete
##    (Intercept)      AAA
## 1    -3.293893 3.082104
## 2    -3.293893 3.878285
## 3    -3.293893 2.385909
## 4    -3.293893 2.822765
## 5    -3.293893 4.240352
## 6    -3.293893 2.796344
## 7    -3.293893 2.790783
## 8    -3.293893 2.846261
## 9    -3.293893 2.688807
## 10   -3.293893 2.537121
## 11   -3.293893 3.226337
## 12   -3.293893 3.488652
## 13   -3.293893 3.323610
## 14   -3.293893 2.878070
## 15   -3.293893 2.946938
## 16   -3.293893 2.785408
## 17   -3.293893 2.719229
## 18   -3.293893 3.724945
## 19   -3.293893 2.986892
## 20   -3.293893 3.095998
## 21   -3.293893 3.573158
## 22   -3.293893 2.627459
## 23   -3.293893 2.421532
## 24   -3.293893 3.433305
## 25   -3.293893 3.072523
## 26   -3.293893 2.931712
## 27   -3.293893 3.011902
## 28   -3.293893 3.186349
## 29   -3.293893 3.080520
## 30   -3.293893 2.179237
## 
## attr(,"class")
## [1] "coef.mer"

M4: V1 ~ V2 * V4 + (1 + V2 * V4|V3)

This model estimates: P1: A global intercept P2: A single global estimate for the effect of V2 P3: A single global estimate for the effect of V4 P4: A single global estimate for the interaction between V2 and V4 P5: Deviations of the intercept from P1 in each level of V3 P6: Deviations of the V2 effect from P2 in each level of V3 P7: Deviations of the V4 effect from P3 in each level of V3 P8: Deviations of the V2-by-V4 interaction from P4 in each level of V3 P9 Correlation between P5 and P6 across levels of V3 P10 Correlation between P5 and P7 across levels of V3 P11 Correlation between P5 and P8 across levels of V3 P12 Correlation between P6 and P7 across levels of V3 P13 Correlation between P6 and P8 across levels of V3 P14 Correlation between P7 and P8 across levels of V3

m4=lmer(glucose ~ AAA * race + (1 + AAA * race|athlete), data=D) 
## boundary (singular) fit: see help('isSingular')
summary(m4)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: glucose ~ AAA * race + (1 + AAA * race | athlete)
##    Data: D
## 
## REML criterion at convergence: 3195
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.86948 -0.41399 -0.07131  0.36357  2.27924 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr             
##  athlete  (Intercept) 31.23248 5.5886                    
##           AAA          0.22329 0.4725    0.15            
##           race        44.54034 6.6739   -0.01  0.10      
##           AAA:race     0.01745 0.1321    0.01 -0.10 -1.00
##  Residual             41.52999 6.4444                    
## Number of obs: 450, groups:  athlete, 30
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   2.04615    6.58514 150.91763   0.311    0.756    
## AAA           2.91422    0.15497 128.99373  18.805   <2e-16 ***
## race         -0.87312    1.43362 306.80517  -0.609    0.543    
## AAA:race      0.01805    0.02840 245.71275   0.635    0.526    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr) AAA    race  
## AAA      -0.803              
## race     -0.452  0.422       
## AAA:race  0.449 -0.425 -0.999
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
# 30 intercepts (30 inter'pts from athletes; 30 from race|athlete);
# 30 slopes for AAA; 30 for race
# And the interaction slopes of the AAA*race interaction.
coefficients(m4) 
## $athlete
##    (Intercept)      AAA        race      AAA:race
## 1    1.9767682 2.973291 -0.80056675  0.0165666867
## 2    5.5187591 3.709103 -4.72173719  0.0935066171
## 3   -0.1562702 2.307275 -2.64674741  0.0536808021
## 4    0.9132622 2.725112  0.01885581  0.0005705351
## 5    6.1547881 4.060662 -0.52124891  0.0100822873
## 6    0.8796833 2.720062 -1.99049771  0.0403409923
## 7    1.4982774 2.699735  2.22757629 -0.0431317235
## 8    1.0725843 2.748798 -2.01663693  0.0408314074
## 9    0.9142552 2.583552 -4.19176384  0.0840139344
## 10   0.3014324 2.454717 -0.80722907  0.0171485645
## 11   2.8246422 3.106519  1.61844703 -0.0314301160
## 12   4.2185349 3.339082 -1.26262169  0.0253727875
## 13   3.4190398 3.183039 -5.64081017  0.1121596931
## 14   1.6902255 2.772324  1.45639910 -0.0279320479
## 15   1.7952148 2.842018 -0.63158534  0.0133316832
## 16   1.2079677 2.688746 -0.47821382  0.0104312201
## 17   0.8433720 2.634029 -2.19479846  0.0444541179
## 18   4.6163419 3.568123  0.13711634 -0.0025205625
## 19   2.1586932 2.863903  2.20952995 -0.0429195844
## 20   2.7549928 2.978874  1.54267442 -0.0298264632
## 21   4.8848944 3.410694 -0.58867977  0.0119646354
## 22   0.1561311 2.534604 -2.60443689  0.0526544125
## 23  -0.5623365 2.362343 -1.71135899  0.0351327691
## 24   3.4984027 3.296454 -0.28373071  0.0060485778
## 25   2.7633485 2.943849  0.28424764 -0.0048937747
## 26   1.8186279 2.824920  1.40944140 -0.0270475745
## 27   1.8057243 2.907187 -3.63506360  0.0727188089
## 28   1.8521616 3.085593  0.70271668 -0.0132717180
## 29   1.7451423 2.978966 -1.25556399  0.0255711753
## 30  -1.1800810 2.123068  0.18278218 -0.0021482613
## 
## attr(,"class")
## [1] "coef.mer"


References:

Hierarchical linear models and lmer.

R’s lmer cheat-sheet.


Home Page

NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.