Five groups of rats: MS
received morphine on 3 occasions followed by saline;
MM
received morphine on 4 occasions; SS
received
only saline; and McM
morphine in 4 separate occasions, but last dose was
administer in a different environment. The latency betwen liking their
paws was measured after the last dose.
## MS MM SS SM McM
## 1 3 2 14 29 24
## 2 5 12 6 20 26
## 3 1 13 12 36 40
## 4 8 6 4 21 32
## 5 1 10 19 25 20
## 6 1 7 3 18 33
## 7 4 11 9 26 27
## 8 9 19 21 17 30
We can run an ANOVA after preparing the data in long format:
require("reshape")
dat = melt(d)
colnames(dat)=c("Rat","Sequence","Latency")
summary(aov(Latency ~ Sequence, dat))
## Df Sum Sq Mean Sq F value Pr(>F)
## Sequence 4 3498 874.4 27.32 2.44e-10 ***
## Residuals 35 1120 32.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The calculations involved can be carried out manually as follows:
(SSBetween = sum(nrow(d) * (colMeans(d) - mean(d))^2))
## [1] 3497.6
(SSWithin = sum(scale(d, center = T, scale = F)^2))
## [1] 1120
(SSTotal = sum((d - mean(d))^2))
## [1] 4617.6
(df_btw = ncol(d) - 1)
## [1] 4
(MeanSqBetween = SSBetween / df_btw)
## [1] 874.4
(df_within = length(d) - ncol(d))
## [1] 35
(MeanSqWithin = SSWithin / df_within)
## [1] 32
(F_value = MeanSqBetween / MeanSqWithin)
## [1] 27.325
1 - pf(F_value, ncol(d)-1, length(d)-ncol(d))
## [1] 2.443479e-10
summary(aov(Latency ~ Sequence, dat)) # For comparison.
## Df Sum Sq Mean Sq F value Pr(>F)
## Sequence 4 3498 874.4 27.32 2.44e-10 ***
## Residuals 35 1120 32.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here is the equivalent calculation with lm()
:
summary(lm(Latency ~ Sequence, dat))
A linear combination of the group means is a contrast if:
k∑j=1ajˉXj=a1ˉX1+a2ˉX2+⋯+akˉXk
and ∑kj=1aj=0
Linear contrasts are typically a priori tests. A priori tests are planned even before the data is collected. A priori tests imply a relatively small set of comparisons. If we make all possible pairwise comparisons among several means, it doesn’t make any difference whether they was planned in advance or not. The idea is to minimize the familywize error rate. Looking at the data before deciding which comparisons will be carried out greatly predisposes to type I errors, because all the other comparisons are implicitely made to pick up the largest differences. That is why for some post hoc tests, the error rate is adjusted as though all possible comparisons had been made.
Contrasts can be set up to compare one mean to another. For example in the case of three means:
L=1×ˉX1−1×ˉX2+0×ˉX3=ˉX1−ˉX2
They can also compare the average of 2 means with a third mean:
L=1/2×ˉX1+1/2×ˉX2−1×ˉX3=ˉX1+ˉX22−ˉX3
To see the relation between the sum of squares of contrasts with the initial sum of squares “between” treatments, we can see that the sum of squares of orthogonal contrasts adds up to the SS “between”, defining the sum of squares between contrasts as:
SScontrast=nL2∑a2j
n corresponds to the number of cases.
For contrasts to be orthogonal their dot products have to be zero, and the number of comparisons identical to the df for treatments. There are many sets of orthogonal contrasts, and which ones to run is up to the experimenter.
For the dataset above, with 5 columns, the orthogonal contrasts (dot product of zero) are:
[4;−1;−1;−1;−1][0;3;−1;−1;−1][0;0;2;−1;−1][0;0;0;1;−1]
M = colMeans(d)
L1 = 4 * M[1] - 1 * M[2] - 1 * M[3] - 1 * M[4] - 1 * M[5]
SSc1 = nrow(d) * L1^2 / sum(16, 4 * (-1)^2)
L2 = 3 * M[2] -1 * M[3] -1 * M[4] -1 * M[5]
ssc2 = nrow(d) * L2^2 / (sum(9, 3 * (-1)^2))
L3 = 2 * M[3] -1 * M[4] -1 * M[5]
ssc3 = nrow(d) * L3^2/ sum(4 + 2 * (-1)^2)
L4 = M[4] - M[5]
ssc4 = nrow(d) * L4^2 / 2
sum(SSc1, ssc2, ssc3, ssc4)
## [1] 3497.6
(SSBetween = sum(nrow(d) * (colMeans(d) - mean(d))^2))
## [1] 3497.6
If we take the second contrast, [a1=0;a2=3;a3=−1;a4=−1;a5=−1], and we change it to [a1=0;a2=1;a3=−1/3;a4=−1/3;a5=−1/3], the sum of squares of the contrast will be:
(L_2 = as.numeric(1 * M[2] - 1/3 * M[3] - 1/3 * M[4] - 1/3 * M[5]))
## [1] -11.33333
(ssc_2 = nrow(d) * L_2^2 / ((1)^2 + 3 * (-1/3)^2))
## [1] 770.6667
# compared to:
(L2 = as.numeric(3 * M[2] -1 * M[3] -1 * M[4] -1 * M[5]))
## [1] -34
(ssc2 = nrow(d) * L2^2 / (sum(9, 3 * (-1)^2)))
## [1] 770.6667
identical!
Contrast have 1 degree of freedom, so that the F value of again the second contrast, calculated as:
F=MScontrastMSerror=nL2∑a2jMSerror
will be:
(F = nrow(d) * L2^2 / 4) / MeanSqWithin
## [1] 72.25
compared to t-test of M[2] + M[3]
versus
M[4] + M[5]
A contrast such as: [a1=1;a2=−1;a3=0;a4=0;a5=0], would have an F value:
(F = (nrow(d) * (sum(M[1] - M[2]))^2/2) / MeanSqWithin)
## [1] 4.5
with 1 df, which corresponds to
a significance level of F0.05(1,35)=4.12, or
qf(0.95,df1=1, df2=35)
. Therefore it is statistically
significant.
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.