[GLUCOSE] ~ [AMINOACID_A]
in 30 athletes
after 15 races:All the data is fictitious, and the source code can be found here.
We’ll make the sd
of the slopes between
athletes 0.5
, and the sd
between
intercepts of different athletes 0.2
. Further
we’ll establish a correlation between intercepts and slopes of
0.8
.
The covariance matrix will be:
i = 0.2 # sd between intercepts
s = 0.5 # sd between slopes
r = 0.8 # correlation between intercepts and slopes
# The higher the intercept of an athlete; the higher the slope - could be the opposite...
(cov.matrix1<- matrix(c(i^2, r * i * s, r * i * s, s^2), nrow = 2, byrow = T))
## [,1] [,2]
## [1,] 0.04 0.08
## [2,] 0.08 0.25
Getting a random sample from this bi-variate normal distribution for each athlete:
athletes = 30
require(mvtnorm)
random.effects_athletes <- rmvnorm(athletes, mean = c(0, 0), sigma = cov.matrix1)
We establish now the mean of these random intercepts
(alpha_ath
) and slopes (beta_ath
):
athletes.df = data.frame(athlete = c(1:athletes)) # Just a column of numbers from 1 to 30
athletes.df$alpha_athletes = 1 + random.effects_athletes[, 1]
athletes.df$beta_athletes = 2 + random.effects_athletes[, 2]
summary(athletes.df$beta_athletes) # The mean is the slope in the fixed effects: 2 + random.effects[,2])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.110 1.708 1.889 1.945 2.145 3.163
sd(athletes.df$beta_athletes) # The sd of the slopes as random effects (we wanted 0.5)
## [1] 0.4513766
summary(athletes.df$alpha_athletes) # The intercept has a mean of 1: (Intercept) fixed eff's
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4948 0.8781 1.0158 1.0042 1.1164 1.3667
sd(athletes.df$alpha_athletes) # with a SD intercept as random eff (we wanted 0.2)
## [1] 0.2011818
cor(athletes.df$alpha_athletes, athletes.df$beta_athletes) # Set at 0.8.
## [1] 0.7991471
The covariance matrix will be:
i = 0.8 # sd between intercepts
s = 0.2 # sd between slopes
r = -0.01 # hardly any relationship between intercepts and slopes
(cov.matrix2 <- matrix(c(i^2, r * i * s, r * i * s, s^2), nrow = 2, byrow = T))
## [,1] [,2]
## [1,] 0.6400 -0.0016
## [2,] -0.0016 0.0400
Getting a random sample from this bi-variate normal distribution for each race:
races = 15
random.effects_races <- rmvnorm(races, mean = c(0, 0), sigma = cov.matrix2)
We establish now the mean of these random intercepts
(alpha_race
) and slopes (beta_race
):
races.df = data.frame(race = c(1:races)) # Just a column of numbers from 1 to 15
races.df$alpha_races = -1 + random.effects_races[, 1]
races.df$beta_races = 1 + random.effects_races[, 2]
summary(races.df$beta_races) # The mean is the slope in the fixed effects: 1 + random.effects[,2])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.7140 0.9655 0.9972 1.0164 1.0635 1.2521
sd(races.df$beta_races) # The sd of the slopes as random effects (we wanted 0.2)
## [1] 0.1282501
summary(races.df$alpha_races) # The intercept has a mean of - 1: (Intercept) fixed eff's
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.140662 -1.646019 -0.985087 -1.080361 -0.591727 0.003778
sd(races.df$alpha_races) # with a SD intercept as random eff (we wanted 0.8)
## [1] 0.6525022
cor(races.df$alpha_races, races.df$beta_races) # Set at -0.01.
## [1] 0.1491423
AAA
- the fictitious aminoacid:We need now produce the [AAA]
with a mean of
50
and a sd of 5
as:
observations <- athletes * races
observations.df <- data.frame(
athlete = sort(rep(c(1:athletes), races)),
race = rep(c(1:races), athletes),
AAA = rep(rnorm(athletes * races, 50, 5)))
dat1 <- merge(athletes.df, observations.df)
dat2 <- merge(dat1, races.df)
dat3 <- dat2[with(dat2, order(athlete,race)), ]
rownames(dat3) <- 1:nrow(dat3)
It is time now to calculate the [Glucose]
values as:
df <- within(dat3,
glucose <- alpha_athletes + AAA * beta_athletes +
alpha_races + AAA * beta_races +
0.75 * rnorm(n = observations)) # Epsilon is 0.75
D <- df[,c(2,1,5,8)]
head(D)
## athlete race AAA glucose
## 1 1 1 56.49656 169.5516
## 2 1 2 45.63369 135.0500
## 3 1 3 50.04185 150.7323
## 4 1 4 45.59564 139.9108
## 5 1 5 52.98130 159.0521
## 6 1 6 50.59859 154.3979
In cursive are lines extracted from this cross-validated post.
The most common fix effect is labelled as V2, and the order of the model is altered, but without any essential change to the post on CV.
M1: V1 ~ V2 + (1|V3)
This model will estimate: P1: A global intercept P2: Random effect intercepts for V3 (i.e. for each level of V3, that level’s intercept’s deviation from the global intercept) P3: A single global estimate for the effect (slope) of V2
m1=lmer(glucose ~ AAA + (1|athlete), data=D)
summary(m1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: glucose ~ AAA + (1 | athlete)
## Data: D
##
## REML criterion at convergence: 3192.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9351 -0.4261 -0.0955 0.4137 2.7329
##
## Random effects:
## Groups Name Variance Std.Dev.
## athlete (Intercept) 521.69 22.840
## Residual 50.81 7.128
## Number of obs: 450, groups: athlete, 30
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -4.99227 5.36626 76.27588 -0.93 0.355
## AAA 3.06203 0.06721 419.22687 45.56 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## AAA -0.626
coefficients(m1) # 30 intercepts: 1 for every athlete; and one single slope for AAA
## $athlete
## (Intercept) AAA
## 1 -2.319430 3.062027
## 2 37.252580 3.062027
## 3 -37.369466 3.062027
## 4 -14.498814 3.062027
## 5 58.412704 3.062027
## 6 -16.321969 3.062027
## 7 -16.930180 3.062027
## 8 -14.312161 3.062027
## 9 -22.562873 3.062027
## 10 -29.490204 3.062027
## 11 5.189050 3.062027
## 12 17.922213 3.062027
## 13 9.632048 3.062027
## 14 -12.402902 3.062027
## 15 -8.973970 3.062027
## 16 -17.162565 3.062027
## 17 -20.555254 3.062027
## 18 30.116141 3.062027
## 19 -6.759428 3.062027
## 20 -1.335303 3.062027
## 21 22.288812 3.062027
## 22 -25.822019 3.062027
## 23 -34.820442 3.062027
## 24 14.937800 3.062027
## 25 -2.613498 3.062027
## 26 -9.760009 3.062027
## 27 -5.968219 3.062027
## 28 3.074827 3.062027
## 29 -2.450880 3.062027
## 30 -46.164701 3.062027
##
## attr(,"class")
## [1] "coef.mer"
M2: V1 ~ V2 + (1|V3) + (0 + V2|V3)
This model estimates all the parameters from M1, but will additionally estimate: P4: The effect of V2 within each level of V3 (more specifically, the degree to which the V2 effect within a given level deviates from the global effect of V2), while enforcing a zero correlation between the intercept deviations and V2 effect deviations across levels of V3.
m2=lmer(glucose ~ AAA + (1|athlete) + (0 + AAA|athlete), data=D)
## boundary (singular) fit: see help('isSingular')
summary(m2)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: glucose ~ AAA + (1 | athlete) + (0 + AAA | athlete)
## Data: D
##
## REML criterion at convergence: 3140.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6693 -0.4137 -0.1135 0.3026 2.0917
##
## Random effects:
## Groups Name Variance Std.Dev.
## athlete (Intercept) 0.0000 0.0000
## athlete.1 AAA 0.2051 0.4528
## Residual 44.9159 6.7019
## Number of obs: 450, groups: athlete, 30
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -3.2939 3.1745 419.2109 -1.038 0.3
## AAA 3.0254 0.1041 70.3320 29.073 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## AAA -0.604
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
coefficients(m2) # One single intercept but 30 different slopes: 1 / athlete
## $athlete
## (Intercept) AAA
## 1 -3.293895 3.082104
## 2 -3.293895 3.878285
## 3 -3.293895 2.385910
## 4 -3.293895 2.822765
## 5 -3.293895 4.240351
## 6 -3.293895 2.796344
## 7 -3.293895 2.790783
## 8 -3.293895 2.846261
## 9 -3.293895 2.688807
## 10 -3.293895 2.537121
## 11 -3.293895 3.226337
## 12 -3.293895 3.488651
## 13 -3.293895 3.323610
## 14 -3.293895 2.878070
## 15 -3.293895 2.946938
## 16 -3.293895 2.785408
## 17 -3.293895 2.719229
## 18 -3.293895 3.724945
## 19 -3.293895 2.986893
## 20 -3.293895 3.095998
## 21 -3.293895 3.573158
## 22 -3.293895 2.627459
## 23 -3.293895 2.421532
## 24 -3.293895 3.433305
## 25 -3.293895 3.072523
## 26 -3.293895 2.931712
## 27 -3.293895 3.011902
## 28 -3.293895 3.186349
## 29 -3.293895 3.080520
## 30 -3.293895 2.179237
##
## attr(,"class")
## [1] "coef.mer"
M3: V1 ~ V2 + (1 + V2|V3)
All parameters from M2 are estimated while allowing correlation between the intercept deviations and V2 effect deviations within levels of V3. Thus, in M3, an additional parameter is estimated: P5: The correlation between intercept deviations and V2 deviations across levels of V3 Usually model pairs like M2 and M3 are computed then compared to evaluate the evidence for correlations between fixed effects (including the global intercept).
m3=lmer(glucose ~ AAA + (1 + AAA|athlete), data=D)
## boundary (singular) fit: see help('isSingular')
summary(m3)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: glucose ~ AAA + (1 + AAA | athlete)
## Data: D
##
## REML criterion at convergence: 3140.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6693 -0.4137 -0.1135 0.3026 2.0917
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## athlete (Intercept) 0.0000 0.0000
## AAA 0.2051 0.4528 NaN
## Residual 44.9159 6.7019
## Number of obs: 450, groups: athlete, 30
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -3.2939 3.1745 419.1847 -1.038 0.3
## AAA 3.0254 0.1041 33.5662 29.073 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## AAA -0.604
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
coefficients(m3) # 30 intecepts and 30 different slopes
## $athlete
## (Intercept) AAA
## 1 -3.293893 3.082104
## 2 -3.293893 3.878285
## 3 -3.293893 2.385909
## 4 -3.293893 2.822765
## 5 -3.293893 4.240352
## 6 -3.293893 2.796344
## 7 -3.293893 2.790783
## 8 -3.293893 2.846261
## 9 -3.293893 2.688807
## 10 -3.293893 2.537121
## 11 -3.293893 3.226337
## 12 -3.293893 3.488652
## 13 -3.293893 3.323610
## 14 -3.293893 2.878070
## 15 -3.293893 2.946938
## 16 -3.293893 2.785408
## 17 -3.293893 2.719229
## 18 -3.293893 3.724945
## 19 -3.293893 2.986892
## 20 -3.293893 3.095998
## 21 -3.293893 3.573158
## 22 -3.293893 2.627459
## 23 -3.293893 2.421532
## 24 -3.293893 3.433305
## 25 -3.293893 3.072523
## 26 -3.293893 2.931712
## 27 -3.293893 3.011902
## 28 -3.293893 3.186349
## 29 -3.293893 3.080520
## 30 -3.293893 2.179237
##
## attr(,"class")
## [1] "coef.mer"
M4: V1 ~ V2 * V4 + (1 + V2 * V4|V3)
This model estimates: P1: A global intercept P2: A single global estimate for the effect of V2 P3: A single global estimate for the effect of V4 P4: A single global estimate for the interaction between V2 and V4 P5: Deviations of the intercept from P1 in each level of V3 P6: Deviations of the V2 effect from P2 in each level of V3 P7: Deviations of the V4 effect from P3 in each level of V3 P8: Deviations of the V2-by-V4 interaction from P4 in each level of V3 P9 Correlation between P5 and P6 across levels of V3 P10 Correlation between P5 and P7 across levels of V3 P11 Correlation between P5 and P8 across levels of V3 P12 Correlation between P6 and P7 across levels of V3 P13 Correlation between P6 and P8 across levels of V3 P14 Correlation between P7 and P8 across levels of V3
m4=lmer(glucose ~ AAA * race + (1 + AAA * race|athlete), data=D)
## boundary (singular) fit: see help('isSingular')
summary(m4)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: glucose ~ AAA * race + (1 + AAA * race | athlete)
## Data: D
##
## REML criterion at convergence: 3195
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.86948 -0.41399 -0.07131 0.36357 2.27924
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## athlete (Intercept) 31.23248 5.5886
## AAA 0.22329 0.4725 0.15
## race 44.54034 6.6739 -0.01 0.10
## AAA:race 0.01745 0.1321 0.01 -0.10 -1.00
## Residual 41.52999 6.4444
## Number of obs: 450, groups: athlete, 30
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 2.04615 6.58514 150.91763 0.311 0.756
## AAA 2.91422 0.15497 128.99373 18.805 <2e-16 ***
## race -0.87312 1.43362 306.80517 -0.609 0.543
## AAA:race 0.01805 0.02840 245.71275 0.635 0.526
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) AAA race
## AAA -0.803
## race -0.452 0.422
## AAA:race 0.449 -0.425 -0.999
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
# 30 intercepts (30 inter'pts from athletes; 30 from race|athlete);
# 30 slopes for AAA; 30 for race
# And the interaction slopes of the AAA*race interaction.
coefficients(m4)
## $athlete
## (Intercept) AAA race AAA:race
## 1 1.9767682 2.973291 -0.80056675 0.0165666867
## 2 5.5187591 3.709103 -4.72173719 0.0935066171
## 3 -0.1562702 2.307275 -2.64674741 0.0536808021
## 4 0.9132622 2.725112 0.01885581 0.0005705351
## 5 6.1547881 4.060662 -0.52124891 0.0100822873
## 6 0.8796833 2.720062 -1.99049771 0.0403409923
## 7 1.4982774 2.699735 2.22757629 -0.0431317235
## 8 1.0725843 2.748798 -2.01663693 0.0408314074
## 9 0.9142552 2.583552 -4.19176384 0.0840139344
## 10 0.3014324 2.454717 -0.80722907 0.0171485645
## 11 2.8246422 3.106519 1.61844703 -0.0314301160
## 12 4.2185349 3.339082 -1.26262169 0.0253727875
## 13 3.4190398 3.183039 -5.64081017 0.1121596931
## 14 1.6902255 2.772324 1.45639910 -0.0279320479
## 15 1.7952148 2.842018 -0.63158534 0.0133316832
## 16 1.2079677 2.688746 -0.47821382 0.0104312201
## 17 0.8433720 2.634029 -2.19479846 0.0444541179
## 18 4.6163419 3.568123 0.13711634 -0.0025205625
## 19 2.1586932 2.863903 2.20952995 -0.0429195844
## 20 2.7549928 2.978874 1.54267442 -0.0298264632
## 21 4.8848944 3.410694 -0.58867977 0.0119646354
## 22 0.1561311 2.534604 -2.60443689 0.0526544125
## 23 -0.5623365 2.362343 -1.71135899 0.0351327691
## 24 3.4984027 3.296454 -0.28373071 0.0060485778
## 25 2.7633485 2.943849 0.28424764 -0.0048937747
## 26 1.8186279 2.824920 1.40944140 -0.0270475745
## 27 1.8057243 2.907187 -3.63506360 0.0727188089
## 28 1.8521616 3.085593 0.70271668 -0.0132717180
## 29 1.7451423 2.978966 -1.25556399 0.0255711753
## 30 -1.1800810 2.123068 0.18278218 -0.0021482613
##
## attr(,"class")
## [1] "coef.mer"
References:
Hierarchical linear models and lmer.
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.