### COMPARING PROPORTIONS BETWEEN SAMPLES:

[What follows was initially posted in CrossValidated].

I will use the following toy tabulated data:

Antacid <- matrix(c(64, 178 - 64, 92, 190 - 92), nrow = 2)
dimnames(Antacid) = list(Symptoms = c("Heartburn", "Normal"),
Medication = c("Drug A", "Drug B"))
addmargins(Antacid)
##            Medication
## Symptoms    Drug A Drug B Sum
##   Heartburn     64     92 156
##   Normal       114     98 212
##   Sum          178    190 368

So we have 368patients: 178 on Drug A, and 190 on Drug B and we try to see if there are differences in the proportion of heartburn symptoms between drug A and B, i.e. $$p1 = 64/178$$ vs $$p2 = 92/190$$.

1. FISHER EXACT TEST: There is a discussion on Wikipedia about “Controversies”. Based on the hypergeometric distribution, it is probably most adequate when the expected values in any of the cells of a contingency table are below 5 - 10. The story of the RA Fisher and the tea lady is great, and can be reproduced in [R] by simply grabbing the code here. [R] seems to tolerate without a pause the large numbers in our data (no problem with factorials):

 Antacid <- matrix(c(64, 178 - 64, 92, 190 - 92), nrow = 2)
fisher.test(Antacid, alternative = "two.sided")
Fisher's Exact Test for Count Data
data:  Antacid
p-value = 0.02011
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.3850709 0.9277156
sample estimates:
odds ratio
0.5988478 

2. CHI-SQUARE TEST OF HOMOGENEITY: For larger samples (> 5 expected frequency count in each cell) the $$\chi^2$$ provides an approximation of the significance value. The test is based on calculating the expected frequency counts obtained by cross-multiplying the marginals (assuming normal distribution of the marginals, it makes sense that we end up with a $$\chi^2$$ distributed test statistic, since if $$X\sim N(\mu,\sigma^2)$$, then $$X^2\sim \chi^2(1))$$:

                    Medication
Symptoms       Drug A                   Drug B
Heartburn     156 * 178 / 368 = 75      156 * 190 / 368 = 81
Normal        212 * 178 / 368 = 103     212 * 190 / 368 = 109

This can be more easily calculated as:

(addmargins(expect <- chisq.test(Antacid)$expected)) ## Medication ## Symptoms Drug A Drug B Sum ## Heartburn 75.45652 80.54348 156 ## Normal 102.54348 109.45652 212 ## Sum 178.00000 190.00000 368 The degrees of freedom will be calculated as the {number of populations (Heartburn sufferers and Normals, i.e. 2) minus 1 } * {number of levels in the categorical variable (Drug A and Drug B, i.e. 2) minus 1}. Therefore, in a 2x2 table we are dealing with 1 d.f. And crucially, a $$\chi^2$$ of $$1\,df$$ is exactly a squared $$N \sim (0,1)$$ (proof here), which explains the sentence “a chi-square test for equality of two proportions is exactly the same thing as a z-test. The chi-squared distribution with one degree of freedom is just that of a normal deviate, squared. You’re basically just repeating the chi-squared test on a subset of the contingency table” in this post. The Test Statistic is calculated as: $$\chi^2=\frac{(64-75)^2}{75} + \frac{(92-81)^2}{81} +\frac{(114-103)^2}{103} + \frac{(98-109)^2}{109} = 5.39$$, although this is an approximation excluding decimals. The precise calculation of these values is the sum of the cells in: (residuals <- chisq.test(Antacid)$residuals^2)
##            Medication
## Symptoms      Drug A   Drug B
##   Heartburn 1.739437 1.629578
##   Normal    1.279963 1.199124
sum(residuals)
##  5.848102

This is calculated in R with the function prop.test() or chisq.test(), which should yield the same result, as indicated here:

    prop.test(Antacid, correct = F)
##
##  2-sample test for equality of proportions without continuity
##  correction
##
## data:  Antacid
## X-squared = 5.8481, df = 1, p-value = 0.01559
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.22976374 -0.02519514
## sample estimates:
##    prop 1    prop 2
## 0.4102564 0.5377358

The proportions are calculated as: $$64/156 = 0.4102564$$ and $$114/212 = 0.5377358$$. The confidence interval makes reference to the difference in proportions: $$0.4102564 - 0.5377358 = -0.1274794.$$

We don’t need to feed a matrix. A vector of “successes” (in this case heartburn: x <- c(64, 114)) with the total number of cases (n <- c(156, 212)) will result in the same output:

prop.test(x = c(64, 114), n = c(156, 212), correct = F)
##
##  2-sample test for equality of proportions without continuity
##  correction
##
## data:  c(64, 114) out of c(156, 212)
## X-squared = 5.8481, df = 1, p-value = 0.01559
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.22976374 -0.02519514
## sample estimates:
##    prop 1    prop 2
## 0.4102564 0.5377358

or..

    chisq.test(Antacid, correct = F)
##
##  Pearson's Chi-squared test
##
## data:  Antacid
## X-squared = 5.8481, df = 1, p-value = 0.01559

3. G-TEST: The Pearson’s chi-test statistic is the second order Taylor expansion around 1 of the G test; hence they tend to converge. In R:

library(DescTools)
GTest(Antacid, correct = 'none')
Log likelihood ratio (G-test) test of
independence without correction

data:  Antacid
G = 5.8703, X-squared df = 1, p-value = 0.0154

4. Z-TEST OF PROPORTIONS: The normal distribution is a good approximation for a binomial when $$np>5$$ and $$n(1-p)>5$$. When the occurrences of successes are small in comparison with the total amount of observations, it is the actual number of expected observations that will determine if a normal approximation of a poisson process can be considered ($$\lambda \geq 5$$).

Although the post hyperlinked is old, I haven’t found in CV an R function for it. This may be due to the fact explained above re: $$\chi^2_{(df=1)}\sim \, N_{(0,1)}^2$$.

The Test Statistic (TS) is:

$$\displaystyle Z =\frac{\frac{x_1}{n_1}-\frac{x_2}{n_2}}{\sqrt{p\,(1-p)(1/n_1+1/n_2)}}$$ with $$\displaystyle p = \frac{x_1\,+\,x_2}{n_1\,+\,n_2}$$, where $$x_1$$ and $$x_2$$ are the number of “successes” (in our case, sadly, heartburn), over the number of subjects in that each one of the levels of the categorical variable (Drug A and Drug B), i.e. $$n_1$$ and $$n_2$$.

For a double-tailed test the $$p$$-value will be calcuated as the $$p(|Z|\geq TS)$$, which in [R] corresponds to 2 * pnorm(ts,lower.tail = F) with ts = test statistic.

In the linked page there is an ad hoc formula. I have been toying with a spin-off with a lot of loose ends. It defaults to a two-tailed alpha value of 0.05, but can be changed, as much as it can be turned into a one tailed t = 1:

zprop = function(x1, x2, n1, n2, alpha=0.05, t = 2){
nume = (x1/n1) - (x2/n2)
p = (x1 + x2) / (n1 + n2)
deno = sqrt(p * (1 - p) * (1/n1 + 1/n2))
z = nume / deno
print(c("Z value:",abs(round(z,4))))
print(c("Cut-off quantile:",
abs(round(qnorm(alpha/t),2))))
print(c("pvalue:", pnorm(-abs(z))))
}

In our case:

    zprop(64, 92 , 178, 190)
 Z value:          2.4183
 Cut-off quantile: 1.96
 pvalue:           0.0077

Giving the same z value as the function in the R-Bloggers: z.prop(64, 178 , 92, 190)  -5.44273

An alternate test statistic is the Wald test:

test statistic = $$\Large \frac{\hat p_1 - \hat p_2}{\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1}+\frac{\hat p_2(1-(\hat p_2))}{n_2}}}$$

which is useful to create confidence intervals for the difference:

$$\large \hat p_1 - \hat p_2 \pm Z_{1-\alpha/2} \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1}+\frac{\hat p_2(1-(\hat p_2))}{n_2}}$$

Original post here