### COMPARING PROPORTIONS BETWEEN SAMPLES:

[What follows was initially posted in CrossValidated].

I will use the following toy tabulated data:

Antacid <- matrix(c(64, 178 - 64, 92, 190 - 92), nrow = 2)
dimnames(Antacid) = list(Symptoms = c("Heartburn", "Normal"),
Medication = c("Drug A", "Drug B"))
addmargins(Antacid)
##            Medication
## Symptoms    Drug A Drug B Sum
##   Heartburn     64     92 156
##   Normal       114     98 212
##   Sum          178    190 368

So we have 368patients: 178 on Drug A, and 190 on Drug B and we try to see if there are differences in the proportion of heartburn symptoms between drug A and B, i.e. $$p1 = 64/178$$ vs $$p2 = 92/190$$.

1. FISHER EXACT TEST: There is a discussion on Wikipedia about “Controversies”. Based on the hypergeometric distribution, it is probably most adequate when the expected values in any of the cells of a contingency table are below 5 - 10. The story of the RA Fisher and the tea lady is great, and can be reproduced in [R] by simply grabbing the code here. [R] seems to tolerate without a pause the large numbers in our data (no problem with factorials):

 Antacid <- matrix(c(64, 178 - 64, 92, 190 - 92), nrow = 2)
fisher.test(Antacid, alternative = "two.sided")
Fisher's Exact Test for Count Data
data:  Antacid
p-value = 0.02011
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.3850709 0.9277156
sample estimates:
odds ratio
0.5988478 

2. CHI-SQUARE TEST OF HOMOGENEITY: For larger samples (> 5 expected frequency count in each cell) the $$\chi^2$$ provides an approximation of the significance value. The test is based on calculating the expected frequency counts obtained by cross-multiplying the marginals (assuming normal distribution of the marginals, it makes sense that we end up with a $$\chi^2$$ distributed test statistic, since if $$X\sim N(\mu,\sigma^2)$$, then $$X^2\sim \chi^2(1))$$:

                    Medication
Symptoms       Drug A                   Drug B
Heartburn     156 * 178 / 368 = 75      156 * 190 / 368 = 81
Normal        212 * 178 / 368 = 103     212 * 190 / 368 = 109

This can be more easily calculated as:

(addmargins(expect <- chisq.test(Antacid)$expected)) ## Medication ## Symptoms Drug A Drug B Sum ## Heartburn 75.45652 80.54348 156 ## Normal 102.54348 109.45652 212 ## Sum 178.00000 190.00000 368 The degrees of freedom will be calculated as the {number of populations (Heartburn sufferers and Normals, i.e. 2) minus 1 } * {number of levels in the categorical variable (Drug A and Drug B, i.e. 2) minus 1}. Therefore, in a 2x2 table we are dealing with 1 d.f. And crucially, a $$\chi^2$$ of $$1\,df$$ is exactly a squared $$N \sim (0,1)$$ (proof here), which explains the sentence “a chi-square test for equality of two proportions is exactly the same thing as a z-test. The chi-squared distribution with one degree of freedom is just that of a normal deviate, squared. You’re basically just repeating the chi-squared test on a subset of the contingency table” in this post. The Test Statistic is calculated as: $$\chi^2=\frac{(64-75)^2}{75} + \frac{(92-81)^2}{81} +\frac{(114-103)^2}{103} + \frac{(98-109)^2}{109} = 5.39$$, although this is an approximation excluding decimals. The precise calculation of these values is the sum of the cells in: (residuals <- chisq.test(Antacid)$residuals^2)
##            Medication
## Symptoms      Drug A   Drug B
##   Heartburn 1.739437 1.629578
##   Normal    1.279963 1.199124
sum(residuals)
## [1] 5.848102

This is calculated in R with the function prop.test() or chisq.test(), which should yield the same result, as indicated here:

    prop.test(Antacid, correct = F)
##
##  2-sample test for equality of proportions without continuity
##  correction
##
## data:  Antacid
## X-squared = 5.8481, df = 1, p-value = 0.01559
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.22976374 -0.02519514
## sample estimates:
##    prop 1    prop 2
## 0.4102564 0.5377358

The proportions are calculated as: $$64/156 = 0.4102564$$ and $$114/212 = 0.5377358$$. The confidence interval makes reference to the difference in proportions: $$0.4102564 - 0.5377358 = -0.1274794.$$

We don’t need to feed a matrix. A vector of “successes” (in this case heartburn: x <- c(64, 114)) with the total number of cases (n <- c(156, 212)) will result in the same output:

prop.test(x = c(64, 114), n = c(156, 212), correct = F)
##
##  2-sample test for equality of proportions without continuity
##  correction
##
## data:  c(64, 114) out of c(156, 212)
## X-squared = 5.8481, df = 1, p-value = 0.01559
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.22976374 -0.02519514
## sample estimates:
##    prop 1    prop 2
## 0.4102564 0.5377358

or..

    chisq.test(Antacid, correct = F)
##
##  Pearson's Chi-squared test
##
## data:  Antacid
## X-squared = 5.8481, df = 1, p-value = 0.01559

3. G-TEST: The Pearson’s chi-test statistic is the second order Taylor expansion around 1 of the G test; hence they tend to converge. In R:

library(DescTools)
GTest(Antacid, correct = 'none')
Log likelihood ratio (G-test) test of
independence without correction

data:  Antacid
G = 5.8703, X-squared df = 1, p-value = 0.0154

The normal distribution is a good approximation for a binomial when $$np>5$$ and $$n(1-p)>5$$. When the occurrences of successes are small in comparison with the total amount of observations, it is the actual number of expected observations that will determine if a normal approximation of a Poisson process can be considered ($$\lambda \geq 5$$).

Although this r-bloggers post is old, I haven’t found in CV an R function for it. This is also reflected in this R-Bloggers post. This may be due to the fact explained above re: $$\chi^2_{(df=1)}\sim \, N_{(0,1)}^2$$.

The Test Statistic (TS) is:

$Z=\frac{\frac{x_1}{n_1}-\frac{x_2}{n_s}}{\sqrt{\widehat{p}(1-\widehat{p})(\frac{1}{n_1}+\frac{1}{n_2})}}$

with $$\displaystyle p = \frac{x_1\,+\,x_2}{n_1\,+\,n_2}$$, where $$x_1$$ and $$x_2$$ are the number of “successes” (in our case, sadly, heartburn), over the number of subjects in that each one of the levels of the categorical variable (Drug A and Drug B), i.e. $$n_1$$ and $$n_2$$.

For a double-tailed test the $$p$$-value will be calcuated as the $$p(|Z|\geq TS)$$, which in [R] corresponds to 2 * pnorm(ts,lower.tail = F) with ts = test statistic.

In the linked page there is an ad hoc formula with a toy example:

The owner of a betting company wants to verify whether a customer is cheating or not. To do this want to compare the number of successes of one player with the number of successes of one of his employees, of which he is certain that he is not cheating. In a month’s time, the player performs $$n_1 =74$$ bets and wins $$x_1 = 30;$$ the player in the same period of time making $$n_2 = 103$$ bets, wins $$x_2 v= 65.$$ Your client is a cheat or not?

I have been toying with a spin-off with a lot of loose ends. It defaults to a two-tailed alpha value of 0.05, but can be changed, as much as it can be turned into a one tailed t = 1:

    zprop = function(x1, x2, n1, n2, alpha=0.05, t = 2){
nume = (x1/n1) - (x2/n2)
p = (x1 + x2) / (n1 + n2)
deno = sqrt(p * (1 - p) * (1/n1 + 1/n2))
z = nume / deno
print(c("Z value:", abs(round(z,4))))
print(c("Cut-off quantile:",
abs(round(qnorm(alpha/t),2))))
print(c("pvalue:", pnorm(-abs(z))))
}

In our of antacid example: Of the patients on Drug A, i.e. $$n_1= 178,$$ $$x_1=64$$ experienced heartburn (“success”), whereas of the $$n_2=190$$ patients on Drug B, $$x_2=92,$$ had heartburn:

zprop(64, 92, 178, 190)
## [1] "Z value:" "2.4183"
## [1] "Cut-off quantile:" "1.96"
## [1] "pvalue:"             "0.00779692912707934"

Using the example above from the r-loggers post:

zprop(30,65,74,103) 
## [1] "Z value:" "2.9697"
## [1] "Cut-off quantile:" "1.96"
## [1] "pvalue:"             "0.00149047672261619"

the output is identical, although I have eliminated the possibility of negative Z values with abs(round(z,4)) because the order in the difference is not providing information that is not already apparent. Instead, the z.prop() function in the linked post:

z.prop = function(x1,x2,n1,n2){
numerator = (x1/n1) - (x2/n2)
p.common = (x1+x2) / (n1+n2)
denominator = sqrt(p.common * (1-p.common) * (1/n1 + 1/n2))
z.prop.ris = numerator / denominator
return(z.prop.ris)
}

allows for positive and negative Z-test values.

An alternate test statistic is the Wald test:

test statistic = $$\Large \frac{\hat p_1 - \hat p_2}{\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1}+\frac{\hat p_2(1-(\hat p_2))}{n_2}}}$$

which is useful to create confidence intervals for the difference:

$$\large \hat p_1 - \hat p_2 \pm Z_{1-\alpha/2} \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1}+\frac{\hat p_2(1-(\hat p_2))}{n_2}}$$

Original post here