Regression of [GLUCOSE] ~ [AMINOACID_A] in 30 athletes after 15 races:



All the data is fictitious, and the source code can be found here.


Data set “manufacturing” process:


We’ll make the sd of the slopes between athletes 0.5, and the sd between intercepts of different athletes 0.2. Further we’ll establish a correlation between intercepts and slopes of 0.8.


FOR ATHLETES:


The covariance matrix will be:

i = 0.2 # sd between intercepts
s = 0.5 # sd between slopes
r = 0.8 # correlation between intercepts and slopes
# The higher the intercept of an athlete; the higher the slope - could be the opposite...
(cov.matrix1<-  matrix(c(i^2, r * i * s, r * i * s, s^2), nrow = 2, byrow = T))
##      [,1] [,2]
## [1,] 0.04 0.08
## [2,] 0.08 0.25

Getting a random sample from this bi-variate normal distribution for each athlete:

athletes = 30
require(mvtnorm)
random.effects_athletes <-  rmvnorm(athletes, mean = c(0, 0), sigma = cov.matrix1)

We establish now the mean of these random intercepts (alpha_ath) and slopes (beta_ath):

athletes.df = data.frame(athlete  = c(1:athletes)) # Just a column of numbers from 1 to 30
athletes.df$alpha_athletes = 1 + random.effects_athletes[, 1]
athletes.df$beta_athletes =  2 + random.effects_athletes[, 2]

summary(athletes.df$beta_athletes)  # The mean is the slope in the fixed effects: 2 + random.effects[,2])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.110   1.708   1.889   1.945   2.145   3.163
sd(athletes.df$beta_athletes)       # The sd of the slopes as random effects (we wanted 0.5)
## [1] 0.4513766
summary(athletes.df$alpha_athletes) # The intercept has a mean of 1: (Intercept) fixed eff's
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4948  0.8781  1.0160  1.0040  1.1160  1.3670
sd(athletes.df$alpha_athletes)      # with a SD intercept as random eff (we wanted 0.2)
## [1] 0.2011818
cor(athletes.df$alpha_athletes, athletes.df$beta_athletes) # Set at 0.8.
## [1] 0.7991471


FOR RACES:


The covariance matrix will be:

i =   0.8   # sd between intercepts
s =   0.2   # sd between slopes
r = -0.01   # hardly any relationship between intercepts and slopes
(cov.matrix2 <-  matrix(c(i^2, r * i * s, r * i * s, s^2), nrow = 2, byrow = T))
##         [,1]    [,2]
## [1,]  0.6400 -0.0016
## [2,] -0.0016  0.0400

Getting a random sample from this bi-variate normal distribution for each race:

races = 15
random.effects_races <-  rmvnorm(races, mean = c(0, 0), sigma = cov.matrix2)

We establish now the mean of these random intercepts (alpha_race) and slopes (beta_race):

races.df = data.frame(race  = c(1:races)) # Just a column of numbers from 1 to 15
races.df$alpha_races   =    -1 + random.effects_races[, 1]
races.df$beta_races    =     1 + random.effects_races[, 2]
summary(races.df$beta_races)  # The mean is the slope in the fixed effects: 1 + random.effects[,2])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7140  0.9655  0.9972  1.0160  1.0630  1.2520
sd(races.df$beta_races)       # The sd of the slopes as random effects (we wanted 0.2)
## [1] 0.1282501
summary(races.df$alpha_races) # The intercept has a mean of - 1: (Intercept) fixed eff's
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -2.141000 -1.646000 -0.985100 -1.080000 -0.591700  0.003778
sd(races.df$alpha_races) # with a SD intercept as random eff (we wanted 0.8)
## [1] 0.6525022
cor(races.df$alpha_races, races.df$beta_races) # Set at -0.01.
## [1] 0.1491423


THE COVARIATE AAA - the fictitious amino acyd:


We need now produce the [AAA] with a mean of 50 and a sd of 5 as:

observations <- athletes * races
observations.df <-  data.frame(
                    athlete = sort(rep(c(1:athletes), races)),
                    race = rep(c(1:races), athletes), 
                    AAA = rep(rnorm(athletes * races, 50, 5)))
dat1   <-  merge(athletes.df, observations.df)
dat2   <-  merge(dat1, races.df)
dat3   <-  dat2[with(dat2, order(athlete,race)), ]
rownames(dat3)   <-  1:nrow(dat3)


THE RESPONSE VARIABLE:


It is time now to calculate the [Glucose] values as:

df <-  within(dat3, 
       glucose <-  alpha_athletes + AAA * beta_athletes +
                   alpha_races    + AAA * beta_races    +
                   0.75 * rnorm(n = observations)) # Epsilon is 0.75

D <- df[,c(2,1,5,8)]
head(D)
##   athlete race      AAA  glucose
## 1       1    1 56.49656 169.5516
## 2       1    2 45.63369 135.0500
## 3       1    3 50.04185 150.7323
## 4       1    4 45.59564 139.9108
## 5       1    5 52.98130 159.0521
## 6       1    6 50.59859 154.3979


THE MODELS:


In quotes are lines extracte from this cross-validated post.

The most common fix effect is labelled as V2, and the order of the model is altered, but without any essential change to the post on CV.

M1: V1 ~ V2 + (1|V3)

This model will estimate: P1: A global intercept P2: Random effect intercepts for V3 (i.e. for each level of V3, that level’s intercept’s deviation from the global intercept) P3: A single global estimate for the effect (slope) of V2

m1=lmer(glucose ~ AAA + (1|athlete), data=D)
summary(m1)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: glucose ~ AAA + (1 | athlete)
##    Data: D
## 
## REML criterion at convergence: 3192.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9351 -0.4261 -0.0955  0.4137  2.7329 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  athlete  (Intercept) 521.69   22.840  
##  Residual              50.81    7.128  
## Number of obs: 450, groups:  athlete, 30
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)  -4.99227    5.36626  76.30000   -0.93    0.355    
## AAA           3.06203    0.06721 419.20000   45.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr)
## AAA -0.626
coefficients(m1) # 30 intercepts: 1 for every athlete; and one single slope for AAA
## $athlete
##    (Intercept)      AAA
## 1    -2.319430 3.062027
## 2    37.252580 3.062027
## 3   -37.369466 3.062027
## 4   -14.498814 3.062027
## 5    58.412704 3.062027
## 6   -16.321969 3.062027
## 7   -16.930180 3.062027
## 8   -14.312161 3.062027
## 9   -22.562873 3.062027
## 10  -29.490204 3.062027
## 11    5.189050 3.062027
## 12   17.922213 3.062027
## 13    9.632048 3.062027
## 14  -12.402902 3.062027
## 15   -8.973970 3.062027
## 16  -17.162565 3.062027
## 17  -20.555254 3.062027
## 18   30.116141 3.062027
## 19   -6.759428 3.062027
## 20   -1.335303 3.062027
## 21   22.288812 3.062027
## 22  -25.822019 3.062027
## 23  -34.820442 3.062027
## 24   14.937800 3.062027
## 25   -2.613498 3.062027
## 26   -9.760009 3.062027
## 27   -5.968219 3.062027
## 28    3.074827 3.062027
## 29   -2.450880 3.062027
## 30  -46.164701 3.062027
## 
## attr(,"class")
## [1] "coef.mer"

M2: V1 ~ V2 + (1|V3) + (0 + V2|V3)

This model estimates all the parameters from M1, but will additionally estimate: P4: The effect of V2 within each level of V3 (more specifically, the degree to which the V2 effect within a given level deviates from the global effect of V2), while enforcing a zero correlation between the intercept deviations and V2 effect deviations across levels of V3.

m2=lmer(glucose ~ AAA + (1|athlete) + (0 + AAA|athlete), data=D)
summary(m2)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: glucose ~ AAA + (1 | athlete) + (0 + AAA | athlete)
##    Data: D
## 
## REML criterion at convergence: 3140.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6693 -0.4137 -0.1135  0.3026  2.0917 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  athlete   (Intercept)  0.0000  0.0000  
##  athlete.1 AAA          0.2051  0.4528  
##  Residual              44.9159  6.7019  
## Number of obs: 450, groups:  athlete, 30
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  -3.2939     3.1745 419.2000  -1.038      0.3    
## AAA           3.0254     0.1041  70.3000  29.073   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr)
## AAA -0.604
coefficients(m2) # One single intercept but 30 different slopes: 1 / athlete
## $athlete
##    (Intercept)      AAA
## 1    -3.293895 3.082104
## 2    -3.293895 3.878285
## 3    -3.293895 2.385910
## 4    -3.293895 2.822765
## 5    -3.293895 4.240351
## 6    -3.293895 2.796344
## 7    -3.293895 2.790783
## 8    -3.293895 2.846261
## 9    -3.293895 2.688807
## 10   -3.293895 2.537121
## 11   -3.293895 3.226337
## 12   -3.293895 3.488651
## 13   -3.293895 3.323610
## 14   -3.293895 2.878070
## 15   -3.293895 2.946938
## 16   -3.293895 2.785408
## 17   -3.293895 2.719229
## 18   -3.293895 3.724945
## 19   -3.293895 2.986893
## 20   -3.293895 3.095998
## 21   -3.293895 3.573158
## 22   -3.293895 2.627459
## 23   -3.293895 2.421532
## 24   -3.293895 3.433305
## 25   -3.293895 3.072523
## 26   -3.293895 2.931712
## 27   -3.293895 3.011902
## 28   -3.293895 3.186349
## 29   -3.293895 3.080520
## 30   -3.293895 2.179237
## 
## attr(,"class")
## [1] "coef.mer"

M3: V1 ~ V2 + (1 + V2|V3)

All parameters from M2 are estimated while allowing correlation between the intercept deviations and V2 effect deviations within levels of V3. Thus, in M3, an additional parameter is estimated: P5: The correlation between intercept deviations and V2 deviations across levels of V3 Usually model pairs like M2 and M3 are computed then compared to evaluate the evidence for correlations between fixed effects (including the global intercept).

m3=lmer(glucose ~ AAA + (1 + AAA|athlete), data=D)
summary(m3)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: glucose ~ AAA + (1 + AAA | athlete)
##    Data: D
## 
## REML criterion at convergence: 3140.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6706 -0.4145 -0.1149  0.3020  2.1021 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  athlete  (Intercept)  0.09227 0.3038       
##           AAA          0.19970 0.4469   1.00
##  Residual             44.91406 6.7018       
## Number of obs: 450, groups:  athlete, 30
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  -3.3141     3.1749 416.5000  -1.044    0.297    
## AAA           3.0259     0.1032  35.8000  29.320   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##     (Intr)
## AAA -0.595
coefficients(m3) # 30 intecepts and 30 different slopes
## $athlete
##    (Intercept)      AAA
## 1    -3.276090 3.081744
## 2    -2.742140 3.867291
## 3    -3.743063 2.394734
## 4    -3.449869 2.826080
## 5    -2.498761 4.225350
## 6    -3.467696 2.799854
## 7    -3.471475 2.794295
## 8    -3.434300 2.848986
## 9    -3.539972 2.693521
## 10   -3.641603 2.544002
## 11   -3.179291 3.224154
## 12   -3.003426 3.482888
## 13   -3.114138 3.320008
## 14   -3.412910 2.880454
## 15   -3.366726 2.948401
## 16   -3.475083 2.788986
## 17   -3.519466 2.723690
## 18   -2.844910 3.716095
## 19   -3.339890 2.987882
## 20   -3.266720 3.095530
## 21   -2.946703 3.566338
## 22   -3.581149 2.632942
## 23   -3.719036 2.430082
## 24   -3.040553 3.428265
## 25   -3.282482 3.072340
## 26   -3.376939 2.933375
## 27   -3.323200 3.012436
## 28   -3.206140 3.184655
## 29   -3.277162 3.080167
## 30   -3.881459 2.191127
## 
## attr(,"class")
## [1] "coef.mer"

M4: V1 ~ V2 * V4 + (1 + V2 * V4|V3)

This model estimates: P1: A global intercept P2: A single global estimate for the effect of V2 P3: A single global estimate for the effect of V4 P4: A single global estimate for the interaction between V2 and V4 P5: Deviations of the intercept from P1 in each level of V3 P6: Deviations of the V2 effect from P2 in each level of V3 P7: Deviations of the V4 effect from P3 in each level of V3 P8: Deviations of the V2-by-V4 interaction from P4 in each level of V3 P9 Correlation between P5 and P6 across levels of V3 P10 Correlation between P5 and P7 across levels of V3 P11 Correlation between P5 and P8 across levels of V3 P12 Correlation between P6 and P7 across levels of V3 P13 Correlation between P6 and P8 across levels of V3 P14 Correlation between P7 and P8 across levels of V3

m4=lmer(glucose ~ AAA * race + (1 + AAA * race|athlete), data=D) 
summary(m4)
## Linear mixed model fit by REML t-tests use Satterthwaite approximations
##   to degrees of freedom [lmerMod]
## Formula: glucose ~ AAA * race + (1 + AAA * race | athlete)
##    Data: D
## 
## REML criterion at convergence: 3150.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.69391 -0.40060 -0.09759  0.30269  2.03892 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev. Corr             
##  athlete  (Intercept) 7.542e+00 2.74625                   
##           AAA         1.977e-01 0.44469   0.17            
##           race        2.879e-01 0.53661   0.59 -0.07      
##           AAA:race    1.174e-04 0.01083  -0.61  0.02 -1.00
##  Residual             4.468e+01 6.68432                   
## Number of obs: 450, groups:  athlete, 30
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   0.67067    6.65121 255.80000   0.101    0.920    
## AAA           2.94351    0.15432  50.79000  19.074   <2e-16 ***
## race         -0.54680    0.75250 267.09000  -0.727    0.468    
## AAA:race      0.01129    0.01493 266.86000   0.756    0.450    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##          (Intr) AAA    race  
## AAA      -0.837              
## race     -0.860  0.731       
## AAA:race  0.854 -0.736 -0.995
## convergence code: 1
## Model failed to converge with max|grad| = 6.2989 (tol = 0.002, component 1)
## Model is nearly unidentifiable: very large eigenvalue
##  - Rescale variables?
## Model is nearly unidentifiable: large eigenvalue ratio
##  - Rescale variables?
# 30 intercepts (30 inter'pts from athletes; 30 from race|athlete);
# 30 slopes for AAA; 30 for race
# And the interaction slopes of the AAA*race interaction.
coefficients(m4) 
## $athlete
##    (Intercept)      AAA       race    AAA:race
## 1   0.67350952 3.001104 -0.5619209 0.011537612
## 2   0.98141793 3.797062 -0.8189417 0.015859777
## 3  -1.03342536 2.329744 -0.7032117 0.015154365
## 4   0.39579922 2.744444 -0.5191579 0.010958395
## 5   2.68762641 4.129626 -0.5414958 0.009859443
## 6  -0.27086360 2.734240 -0.6742182 0.014115919
## 7   1.26781684 2.695392 -0.3087022 0.006739131
## 8  -0.02602580 2.775201 -0.6340597 0.013254880
## 9  -1.27375054 2.638364 -0.8861213 0.018520012
## 10 -0.16741723 2.466264 -0.5535008 0.011963599
## 11  2.19985466 3.116307 -0.2653085 0.005385617
## 12  1.20599623 3.400329 -0.6181537 0.012226720
## 13 -0.23255414 3.260251 -0.8905918 0.017914468
## 14  0.95757435 2.789806 -0.4288098 0.009068588
## 15  0.58425206 2.867968 -0.5384250 0.011207763
## 16  0.36047141 2.707646 -0.5269082 0.011153517
## 17 -0.04207279 2.649734 -0.5936517 0.012573902
## 18  2.04291637 3.622833 -0.4971902 0.009526031
## 19  1.73592175 2.884194 -0.2844141 0.006032478
## 20  1.89837292 2.990543 -0.2878545 0.005983461
## 21  1.89819660 3.472487 -0.4996661 0.009737813
## 22 -0.62707983 2.563712 -0.6876220 0.014579320
## 23 -0.87773990 2.365955 -0.6678531 0.014399420
## 24  1.70594197 3.336144 -0.4612960 0.009118651
## 25  1.41061214 2.974493 -0.4015064 0.008306513
## 26  1.17737311 2.840116 -0.3873493 0.008173247
## 27 -0.21937409 2.946538 -0.7477172 0.015369639
## 28  1.42778995 3.090313 -0.4037013 0.008230887
## 29  0.51674534 3.003444 -0.5923377 0.012153992
## 30 -0.23767489 2.110916 -0.4223239 0.009700692
## 
## attr(,"class")
## [1] "coef.mer"


References:

Hierarchical linear models and lmer.

R’s lmer cheat-sheet.


Home Page