[What follows was initially posted in CrossValidated].
I will use the following toy tabulated data:
Antacid <- matrix(c(64, 178 - 64, 92, 190 - 92), nrow = 2)
dimnames(Antacid) = list(Symptoms = c("Heartburn", "Normal"),
Medication = c("Drug A", "Drug B"))
addmargins(Antacid)
## Medication
## Symptoms Drug A Drug B Sum
## Heartburn 64 92 156
## Normal 114 98 212
## Sum 178 190 368
So we have 368
patients: 178
on
Drug A
, and 190
on Drug B
and we
try to see if there are differences in the proportion of heartburn
symptoms between drug A and B, i.e. \(p1 =
64/178\) vs \(p2 = 92/190\).
1. FISHER EXACT TEST: There is a discussion on Wikipedia about “Controversies”. Based on the hypergeometric distribution, it is probably most adequate when the expected values in any of the cells of a contingency table are below 5 - 10. The story of the RA Fisher and the tea lady is great, and can be reproduced in [R] by simply grabbing the code here. [R] seems to tolerate without a pause the large numbers in our data (no problem with factorials):
Antacid <- matrix(c(64, 178 - 64, 92, 190 - 92), nrow = 2)
fisher.test(Antacid, alternative = "two.sided")
Fisher's Exact Test for Count Data
data: Antacid
p-value = 0.02011
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.3850709 0.9277156
sample estimates:
odds ratio
0.5988478
2. CHI-SQUARE TEST OF HOMOGENEITY: For larger samples (> 5 expected frequency count in each cell) the \(\chi^2\) provides an approximation of the significance value. The test is based on calculating the expected frequency counts obtained by cross-multiplying the marginals (assuming normal distribution of the marginals, it makes sense that we end up with a \(\chi^2\) distributed test statistic, since if \(X\sim N(\mu,\sigma^2)\), then \(X^2\sim \chi^2(1))\):
Medication
Symptoms Drug A Drug B
Heartburn 156 * 178 / 368 = 75 156 * 190 / 368 = 81
Normal 212 * 178 / 368 = 103 212 * 190 / 368 = 109
This can be more easily calculated as:
(addmargins(expect <- chisq.test(Antacid)$expected))
## Medication
## Symptoms Drug A Drug B Sum
## Heartburn 75.45652 80.54348 156
## Normal 102.54348 109.45652 212
## Sum 178.00000 190.00000 368
The degrees of freedom will be calculated
as the {number of populations
(Heartburn
sufferers and Normals
, i.e. 2
)
minus 1
} * {number of levels in the
categorical variable
(Drug A
and
Drug B
, i.e. 2
)
minus 1
}. Therefore, in a 2x2
table
we are dealing with 1 d.f
. And crucially, a \(\chi^2\) of \(1\,df\) is exactly a squared
\(N \sim (0,1)\) (proof here),
which explains the sentence “a chi-square test for equality of two
proportions is exactly the same thing as a z-test. The chi-squared
distribution with one degree of freedom is just that of a normal
deviate, squared. You’re basically just repeating the chi-squared test
on a subset of the contingency table” in this post.
The Test Statistic is calculated as:
\(\chi^2=\frac{(64-75)^2}{75} + \frac{(92-81)^2}{81} +\frac{(114-103)^2}{103} + \frac{(98-109)^2}{109} = 5.39\), although this is an approximation excluding decimals. The precise calculation of these values is the sum of the cells in:
(residuals <- chisq.test(Antacid)$residuals^2)
## Medication
## Symptoms Drug A Drug B
## Heartburn 1.739437 1.629578
## Normal 1.279963 1.199124
sum(residuals)
## [1] 5.848102
This is calculated in R
with the function
prop.test()
or chisq.test()
, which should
yield the same result, as indicated here:
prop.test(Antacid, correct = F)
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: Antacid
## X-squared = 5.8481, df = 1, p-value = 0.01559
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.22976374 -0.02519514
## sample estimates:
## prop 1 prop 2
## 0.4102564 0.5377358
The proportions are calculated as: \(64/156 = 0.4102564\) and \(114/212 = 0.5377358\). The confidence interval makes reference to the difference in proportions: \(0.4102564 - 0.5377358 = -0.1274794.\)
We don’t need to feed a matrix. A vector of “successes” (in this case
heartburn: x <- c(64, 114)
) with the total number of
cases (n <- c(156, 212)
) will result in the same
output:
prop.test(x = c(64, 114), n = c(156, 212), correct = F)
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: c(64, 114) out of c(156, 212)
## X-squared = 5.8481, df = 1, p-value = 0.01559
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.22976374 -0.02519514
## sample estimates:
## prop 1 prop 2
## 0.4102564 0.5377358
or..
chisq.test(Antacid, correct = F)
##
## Pearson's Chi-squared test
##
## data: Antacid
## X-squared = 5.8481, df = 1, p-value = 0.01559
3. G-TEST: The Pearson’s
chi-test statistic is the second order Taylor expansion around 1 of the
G test; hence they tend to converge. In R
:
library(DescTools)
GTest(Antacid, correct = 'none')
Log likelihood ratio (G-test) test of
independence without correction
data: Antacid
G = 5.8703, X-squared df = 1, p-value = 0.0154
The normal distribution is a good approximation for a binomial when \(np>5\) and \(n(1-p)>5\). When the occurrences of successes are small in comparison with the total amount of observations, it is the actual number of expected observations that will determine if a normal approximation of a Poisson process can be considered (\(\lambda \geq 5\)).
Although this r-bloggers
post is old, I haven’t found in CV an R
function for
it. This is also reflected in this
R-Bloggers post. This may be due to the fact explained above re:
\(\chi^2_{(df=1)}\sim \,
N_{(0,1)}^2\).
The Test Statistic (TS) is:
\[Z=\frac{\frac{x_1}{n_1}-\frac{x_2}{n_s}}{\sqrt{\widehat{p}(1-\widehat{p})(\frac{1}{n_1}+\frac{1}{n_2})}}\]
with \(\displaystyle p =
\frac{x_1\,+\,x_2}{n_1\,+\,n_2}\), where \(x_1\) and \(x_2\) are the number of “successes” (in our
case, sadly, heartburn), over the number of subjects in that each one of
the levels of the categorical variable (Drug A
and
Drug B
), i.e. \(n_1\) and
\(n_2\).
For a double-tailed test the \(p\)-value will be calcuated as the \(p(|Z|\geq TS)\), which in [R] corresponds
to 2 * pnorm(ts,lower.tail = F)
with
ts = test statistic
.
In the linked page there is an ad hoc formula with a toy example:
The owner of a betting company wants to verify whether a customer is cheating or not. To do this want to compare the number of successes of one player with the number of successes of one of his employees, of which he is certain that he is not cheating. In a month’s time, the player performs \(n_1 =74\) bets and wins \(x_1 = 30;\) the player in the same period of time making \(n_2 = 103\) bets, wins \(x_2 v= 65.\) Your client is a cheat or not?
I have been toying with a spin-off with a lot of loose ends. It
defaults to a two-tailed alpha value of 0.05
, but can be
changed, as much as it can be turned into a one tailed
t = 1
:
zprop = function(x1, x2, n1, n2, alpha=0.05, t = 2){
nume = (x1/n1) - (x2/n2)
p = (x1 + x2) / (n1 + n2)
deno = sqrt(p * (1 - p) * (1/n1 + 1/n2))
z = nume / deno
print(c("Z value:", abs(round(z,4))))
print(c("Cut-off quantile:",
abs(round(qnorm(alpha/t),2))))
print(c("pvalue:", pnorm(-abs(z))))
}
In our of antacid example: Of the patients on Drug A, i.e. \(n_1= 178,\) \(x_1=64\) experienced heartburn (“success”), whereas of the \(n_2=190\) patients on Drug B, \(x_2=92,\) had heartburn:
zprop(64, 92, 178, 190)
## [1] "Z value:" "2.4183"
## [1] "Cut-off quantile:" "1.96"
## [1] "pvalue:" "0.00779692912707934"
Using the example above from the r-loggers post:
zprop(30,65,74,103)
## [1] "Z value:" "2.9697"
## [1] "Cut-off quantile:" "1.96"
## [1] "pvalue:" "0.00149047672261619"
the output is identical, although I have eliminated the possibility
of negative Z values with abs(round(z,4))
because the order
in the difference is not providing information that is not already
apparent. Instead, the z.prop()
function in the linked
post:
z.prop = function(x1,x2,n1,n2){
numerator = (x1/n1) - (x2/n2)
p.common = (x1+x2) / (n1+n2)
denominator = sqrt(p.common * (1-p.common) * (1/n1 + 1/n2))
z.prop.ris = numerator / denominator
return(z.prop.ris)
}
allows for positive and negative Z-test values.
An alternate test statistic is the Wald test:
test statistic = \(\Large \frac{\hat p_1 - \hat p_2}{\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1}+\frac{\hat p_2(1-(\hat p_2))}{n_2}}}\)
which is useful to create confidence intervals for the difference:
\(\large \hat p_1 - \hat p_2 \pm Z_{1-\alpha/2} \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1}+\frac{\hat p_2(1-(\hat p_2))}{n_2}}\)
Original post here
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.