\(H_0: \mu_1=\mu_2\) versus \(H_1: \mu_1\neq \mu_2\). We can also test the one-sided null hypotheses \(\mu_1 \geq \mu_2\) and \(\mu_2 \geq \mu_1\).
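In R's t.test() these hypotheses correspond to the alternative argument; a minimal sketch with made-up vectors x and y:

x <- rnorm(20, mean = 0); y <- rnorm(20, mean = 0.5)   # made-up samples
t.test(x, y, alternative = "two.sided")   # H1: mu_1 != mu_2
t.test(x, y, alternative = "greater")     # H1: mu_1 > mu_2  (null: mu_1 <= mu_2)
t.test(x, y, alternative = "less")        # H1: mu_1 < mu_2  (null: mu_1 >= mu_2)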
It is convenient to assume normality and a common variance across groups, and to use a test statistic of the form:
\[\text{test statistic} = \large \frac{\bar X - \bar Y}{S_p\,\sqrt{\frac{1}{n_x} + \frac{1}{n_y}}}.\]
\(S_p\) is the pooled standard deviation across both samples. When the sample sizes are equal, it is the square root of the mean of the two variances: \(S_p=\sqrt{(\mathrm{SD}_x^2+\mathrm{SD}_y^2)/2}\). Under the null hypothesis the test statistic follows a \(t\) distribution with \(n_x + n_y - 2\) degrees of freedom.
However, if the sample sizes are not equal:
In statistics, pooled variance (also known as combined, composite, or overall variance) is a method for estimating variance of several different populations when the mean of each population may be different, but one may assume that the variance of each population is the same. If the populations are indexed \(i = 1,\ldots,k\), then the pooled variance \(s_p^2\) can be estimated by the weighted average of the sample variances \(s^2_i\):
\[ s_p^2=\frac{\sum_{i=1}^k (n_i - 1)s_i^2}{\sum_{i=1}^k(n_i - 1)} = \frac{(n_1 - 1)s_1^2+(n_2 - 1)s_2^2+\cdots+(n_k - 1)s_k^2}{n_1+n_2+\cdots+n_k - k}\]
where \(n_i\) is the sample size of population \(i\) and \(s^2_i\) is its sample variance, \(s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (y_{ij} - \bar y_i)^2\). Use of the \((n_i-1)\) weighting factors instead of \(n_i\) comes from Bessel’s correction.
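As a quick aside, R's var() already applies Bessel's correction; a minimal check with a made-up vector:

x <- c(2.1, 3.4, 1.8, 2.9, 3.7)              # made-up data
var(x)                                        # uses the n - 1 denominator
sum((x - mean(x))^2) / (length(x) - 1)        # same value, computed by hand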
For two samples the \(S_p\) will therefore be:
\[S_p\,=\,\sqrt{\frac{(n_x - 1)\,S_x^2\,+\,(n_y - 1)\,S_y^2}{n_x + n_y -2}}\]
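As a sanity check (with made-up numbers), when the sample sizes are equal this general formula reduces to the square root of the mean of the two variances:

n_x <- 10; n_y <- 10; s_x <- 1.2; s_y <- 0.7       # made-up sizes and SDs
sp_general <- sqrt(((n_x - 1)*s_x^2 + (n_y - 1)*s_y^2) / (n_x + n_y - 2))
sp_equal_n <- sqrt((s_x^2 + s_y^2) / 2)
all.equal(sp_general, sp_equal_n)                   # TRUE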
Example:
We want to look into the effects of Dexmethylphenidate on weight in children with ADHD. We randomized \(n_1 = 15\) children to treatment and \(n_2 = 15\) to placebo, measuring weight before the beginning of treatment and again 2 months later. The treated group dropped an average of \(\Delta_1 = -2.4\) lbs (with \(\mathrm{SD}_1 = 0.4\) lbs), while the placebo group also dropped, but only \(\Delta_2 = -0.9\) lbs (with \(\mathrm{SD}_2 = 0.8\) lbs). Is the difference significant (two-tailed test)?
First off we need to calculate the pooled variance:
\[\small S_p^2=\frac{\sum_{i=1}^k (n_i - 1)s_i^2}{\sum_{i=1}^k(n_i - 1)} = \frac{(n_1 - 1)s_1^2+(n_2 - 1)s_2^2+\cdots+(n_k - 1)s_k^2}{n_1+n_2+\cdots+n_k - k}\]
In our case, since \(n_1 = n_2\):
\[S_p^2 =\frac{(n_1-1)\, \mathrm{SD}_1^2 \,+\, (n_2-1)\, \mathrm{SD}_2^2}{n_1 + n_2 -2} = \frac{1}{2} \left(\mathrm{SD}_1^2+ \mathrm{SD}_2^2\right).\]
The SE will be calculated as
\[\mathrm{SE} = \sqrt{S_p^2} \times \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}.\]
In R:
n1 <- 15; n2 <- 15                 # sample sizes
delta1 <- -2.4; delta2 <- -0.9     # mean weight changes
sd1 <- 0.4; sd2 <- 0.8             # standard deviations
sp <- sqrt((sd1^2 + sd2^2)/2)      # pooled SD (equal sample sizes)
se <- sp * sqrt(1/n1 + 1/n2)       # standard error of the difference
t_stat <- (delta1 - delta2)/se     # test statistic
2 * pt(-abs(t_stat), df = n1 + n2 - 2)   # two-tailed p-value
The result is clearly significant, with p = 4.880225e-07.
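Using the same quantities, we can also get a 95% confidence interval for the difference in mean weight change (the pooled-variance interval described further below):

(delta1 - delta2) + c(-1, 1) * qt(0.975, df = n1 + n2 - 2) * se   # 95% CI of the difference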
If the assumption of common variance is not warranted, the test statistic will be:
\[\text{test statistic} = \large \frac{\bar X - \bar Y}{\sqrt{\frac{\mathrm{SD}_x^2}{n_x}+\frac{\mathrm{SD}_y^2}{n_y}}}.\]
This approximately follows a \(t\) distribution if \(X_i\) and \(Y_i\) are normally distributed (Welch's \(t\)-test).
The degrees of freedom are calculated with the Welch–Satterthwaite equation:
\[\text{degrees of freedom} =\large \frac{(S_x^2/n_x + S_y^2/n_y)^2}{\frac{(S_x^2/n_x)^2}{n_x-1}+\frac{(S_y^2/n_y)^2}{n_y-1}}.\]
Example:
We want to compare the miles per gallon in cars with four versus eight cylinders from the dataset mtcars:
four_cyl <- mtcars$mpg[mtcars$cyl==4]
eight_cyl <- mtcars$mpg[mtcars$cyl==8]
First we may want to look at whether the variance differs between the two groups with an \(F\) test (homoskedasticity):
var.test(four_cyl, eight_cyl)
##
## F test to compare two variances
##
## data: four_cyl and eight_cyl
## F = 3.1033, num df = 10, denom df = 13, p-value = 0.05925
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.9549589 11.1197133
## sample estimates:
## ratio of variances
## 3.103299
And we see that we don’t have enough evidence to conclude that the samples have different variances.
We can also run a Levene test for the same purpose:
library(reshape2)
library(car)
We need to combine the data (note that cbind() recycles the shorter vector here, which produces the warning below):
sample <- as.data.frame(cbind(four_cyl, eight_cyl))
## Warning in cbind(four_cyl, eight_cyl): number of rows of result is not a
## multiple of vector length (arg 1)
dataset <- melt(sample)
## No id variables; using all as measure variables
leveneTest(value ~ variable, dataset)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 2.8025 0.1061
## 26
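Because cbind() recycled the shorter vector (hence the warning above), the melted data contain 14 values per group rather than 11 and 14. A sketch of an alternative that keeps the original observations, using base R's stack():

dataset2 <- stack(list(four_cyl = four_cyl, eight_cyl = eight_cyl))   # long format, no recycling
leveneTest(values ~ ind, data = dataset2)                             # same test on the original 25 values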
Now we can compare the means:
t.test(four_cyl, eight_cyl, var.equal=TRUE, paired=FALSE,
alternative="two.sided")
##
## Two Sample t-test
##
## data: four_cyl and eight_cyl
## t = 8.1024, df = 23, p-value = 3.446e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 8.611261 14.516012
## sample estimates:
## mean of x mean of y
## 26.66364 15.10000
Clearly different means.
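Since the F test above was borderline (p ≈ 0.06), we could also run Welch's test, which does not assume equal variances and is R's default (var.equal = FALSE); the Welch degrees of freedom from the formula above can be checked by hand:

s2x <- var(four_cyl)/length(four_cyl)      # S_x^2 / n_x
s2y <- var(eight_cyl)/length(eight_cyl)    # S_y^2 / n_y
(s2x + s2y)^2 / (s2x^2/(length(four_cyl) - 1) + s2y^2/(length(eight_cyl) - 1))   # Welch df
t.test(four_cyl, eight_cyl)                # Welch two-sample t-test (default)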
MANUAL CALCULATION OF CONFIDENCE INTERVAL:
For small samples (fewer than about 30 observations):
\[(\bar X - \bar Y) \pm \mathrm{qt}(0.975,\, n_x + n_y - 2) \,S_p\, \sqrt{\frac{1}{n_x}+\frac{1}{n_y}}\]
And for larger samples:
\[(\bar X - \bar Y) \pm \mathrm{qnorm}(0.975)\,S_p \,\sqrt{\frac{1}{n_x}+\frac{1}{n_y}}\]
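The difference between the two critical values is noticeable at small sample sizes; for the degrees of freedom of the mtcars example:

qt(0.975, df = 11 + 14 - 2)   # t critical value with 23 df, a bit above 2
qnorm(0.975)                  # normal critical value, about 1.96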
Note that we don’t want two separate confidence intervals (one per group); we want the CI of the difference in means.
(n_x <- length(four_cyl)); (n_y <- length(eight_cyl))
## [1] 11
## [1] 14
(Sp <- sqrt((((n_x - 1)*(sd(four_cyl))^2) + ((n_y - 1)*(sd(eight_cyl))^2))/(n_x + n_y - 2)))
## [1] 3.542202
# Compare to:
sd(four_cyl); sd(eight_cyl)
## [1] 4.509828
## [1] 2.560048
# Finally the CI of the difference:
(CI <- (mean(four_cyl)- mean(eight_cyl)) + c(-1, 1) * qt(0.975,(n_x + n_y-2)) * Sp * sqrt(1/n_x + 1/n_y))
## [1] 8.611261 14.516012
This interval clearly does not include a zero difference: the four-cylinder vehicles have a higher mileage than the eight-cylinder cars.
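As a cross-check, t.test() with var.equal = TRUE reports the same interval:

t.test(four_cyl, eight_cyl, var.equal = TRUE)$conf.int   # matches the manual calculation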
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.