[What follows was initially a post on CrossValidated.]

I’ve looked into it using [R], and I am a bit surprised to see no packaged formula readily accessible for a test that is ubiquitous in the medical sciences. So it takes some minimal tweaking. First off, the link in my comment to the OP is excellent, providing a makeshift formula; however, the following is an example using well-known formulas in [R]:

1. LARGER SAMPLES (> 5 expected counts in each cell):

I’ll work with a toy example that I made up for a different post in CV, summarized into a contingency table comparing how many patients suffered heartburn after being treated with two kinds of antacids. For your question, I have extended the data to a third antacid as follows:

Antacid <- matrix(c(64, 178 - 64, 92, 190 - 92, 52, 188 - 52), nrow = 2, byrow = T)
dimnames(Antacid) = list(Symptoms = c("Heartburn", "Normal"),
                        Medication = c("Drug A", "Drug B", "Drug C"))

Symptoms       Drug A   Drug B   Drug C
  Heartburn     64       92       52
  Normal       114       98      136

First off, we can run an omnibus test (Pearson’s \(\chi^2\) “goodness-of-fit”) on the data by simply calling prop.test, but there is just one minor problem: our data is in the form of a \(2\) x \(3\) matrix. Yet, the essential input into the function is a a two-dimensional table (or matrix) with 2 columns, giving the counts of successes and failures, i.e. an \(n\) x \(2\) matrix. Luckily the fix is easy: do a transpose of the matrix as follows:

Medication    Heartburn    Normal
    Drug A        64        114
    Drug B        92         98
    Drug C        52        136

Now we are ready:

    3-sample test for equality of proportions without continuity correction

data:  t(Antacid)
X-squared = 17.6325, df = 2, p-value = 0.0001483
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3 
0.3595506 0.4842105 0.2765957 

Actually the alternative function chisq.test() doesn’t seem te require the transpose matrix. Let’s check it out:

  chisq.test(Antacid, correct=F)
  Pearson's Chi-squared test
data:  Antacid
X-squared = 17.6325, df = 2, p-value = 0.0001483

We can even feed the probabilities expected for each cell to [R] instead of the expected counts like this:

Benford’s law states that the leading digit \(d = 1,2,3,4,5,6,7,8,9\) occurs with probability \(\log10(d + 1) − \log 10(d)\). The following table was obtained for the leading digits from addresses randomly sampled from the phone book:

Digits <- as.matrix(c(275, 183, 133, 111, 76, 66, 66, 44, 46))
rownames(Digits) <- c(1:9)
colnames(Digits) <- c("Counts")
##          1   2   3   4  5  6  7  8  9
## Counts 275 183 133 111 76 66 66 44 46

Do the digits appear to follow Benford’s law? What is the relevant P value?

Expected_Probabilities <- as.matrix(round(log10((1:9) + 1) - log10(1:9),3))
rownames(Expected_Probabilities) <- c(1:9)
colnames(Expected_Probabilities) <- c("Probabilities")
##                   1     2     3     4     5     6     7     8     9
## Probabilities 0.301 0.176 0.125 0.097 0.079 0.067 0.058 0.051 0.046
p <- log10((1:9) + 1) ‐ log10(1:9)
chisq.test(Digits, p = p, correct = FALSE)
##  Chi-squared test for given probabilities
## data:  Digits
## X-squared = 7.339, df = 8, p-value = 0.5005

Therefore, with a p value of \(50\%\) we can’t rule out that the digits follow Benford’s law.

We reject when the test statistic is too large, and being that \(\chi^2\) on a \(2\times2\) contingency table (i.e. with 1 df) is the same as a z test squared, we will reject at around qnorm(0.975)^2 = 3.841459. The alternative is always two sided because of the squared makes it non-directinal (we simply test if the proportions are different), and when setting up the alpha risk we don’t divide by two.

So we know that we are not dealing with drugs of equal efficiency to treat heartburn, but we want to know more specifically what the pairwise comparisons have to say. Again, we have to stick with our transposed matrix t():

pairwise.prop.test(t(Antacid), p.adjust.method ="bonferroni")

Notice that to keep the probability of a type I error under check, I selected the Bonferroni method to adjust the p-values. The result is:

    Pairwise comparisons using Pairwise comparison of proportions 

data:  t(Antacid) 

       Drug A  Drug B 
Drug B 0.06221 -      
Drug C 0.33388 0.00015

P value adjustment method: bonferroni 

Interestingly, we can get this info all at once graphically simply with the following commands:

mosaic(Antacid, shade=TRUE, legend=TRUE)

enter image description here

There you have it: when you look at the heartburn row, the square in pink, corresponding to Drug C is much smaller than Drug B (in blue), and the significance of the residuals (i.e. differences between expected and observed values for each entry, squared) are plotted to the right in terms of color hue: the darker the color (blue or pink) the larger the residuals. The residuals (distance from the expected value) can be exactly calculated to fully understand the plot by first summoning the table with marginal counts:

Antacid <- rbind(Antacid, margin.table(Antacid,2))
Antacid <- cbind(Antacid, margin.table(Antacid,1))
dimnames(Antacid) = list(Symptoms = c("Heartburn", "Normal","Totals"),
               Medication = c("Drug A", "Drug B", "Drug C", "Totals"))

Symptoms    Drug A Drug B Drug C Totals
  Heartburn     64     92     52    208
  Normal       114     98    136    348
  Totals       178    190    188    556

Now we can see that the departure for Drug C number of heartburn sufferers from the number expected is in the negative territory:

Exp_burn <- Antacid[1,4] * Antacid[3,3] / Antacid[3,4]
(Antacid[1,3] - Exp_burn)/Exp_burn
[1] -0.2606383

Whereas for Drug B is a excess of sufferers to those predicted:

Exp_burn <- Antacid[1,4] * Antacid[3,2] / Antacid[3,4]
[1] 0.294332

And these results explain the color coding in terms of the hue of pink and blue.

The column to the right also includes the exact same p-value we just got for the pairwise \(\chi^2\) of Drug B versus Drug C with Bonferroni adjustment.


We use the Fisher exact test, which is based on the hypergeometric distribution, and it is probably most adequate when the expected values in any of the cells of a contingency table are below 5 - 10. Although originally conceived for \(2\) x \(2\) contingency tables, only the quickly mounting number of permutation tables in the extension of the Fisher test to \(m\) x \(n\) tables (so-called Freeman Halton test) gets in the way of its direct applications beyond the more rudimentary contingency table. This is discussed in CV, and elaborated further in this Wolfram post. The original article on the Freeman-Halton test can be found in Biometrika (1951) 38 (1-2): 141-149. doi: 10.1093/biomet/38.1-2.141.

Let’s throw out subjects from our Antacid table, and reduce it in size so that the counts are low in each cell:

Antacid <- matrix(c(7, 11 - 7, 1, 15 - 4, 6, 9-6), nrow = 2)
dimnames(Antacid) = list(Symptoms = c("Heartburn", "Normal"),
                    Medication = c("Drug A", "Drug B", "Drug C"))

Symptoms       Drug A  Drug B  Drug C
  Heartburn      7      1      6
  Normal         4     11      3

addmargins(Antacid) # Thanks for the tip @gung

Symptoms       Drug A   Drug B   Drug C    Sum
  Heartburn      7         1      6        14
  Normal         4        11      3        18
  Sum           11        12      9        32

Clearly we would opt to take Drug B if given a choice. Let’s look at the numbers, first globally:

        Fisher's Exact Test for Count Data

data:  Antacid
p-value = 0.008444
alternative hypothesis: two.sided

At this point we would reject the idea of these three drugs being equal.

Notice that as the table gets bigger computational issues may arise, explaining the command:

fisher.test(Antacid, simulate.p.value=TRUE)
Fisher's Exact Test for Count Data with simulated p-value (based on 2000 replicates)
    data:  Antacid 
    p-value = 0.009495 
    alternative hypothesis: two.sided

In this case the p-value is calculated through a Monte Carlo simulation.

Parenthetically, this is the code for a Monte Carlo calculation of the Fisher exact test p-value based on a permutation test:


# The actual data:

Treatment <- c("treat", "treat", "treat", "treat", "treat", "control", "control", "control", "control", "control")
Tumor <- c("tumor", "tumor", "tumor", "tumor", "healthy", "tumor", "tumor", "healthy", "healthy", "healthy")
dat <- table(Treatment, Tumor)
##          Tumor
## Treatment healthy tumor
##   control       3     2
##   treat         1     4
# The Monte Carlo:

Tumor_Treated <- 0
 for (i in 1: 1e5){
  samp <- sample(Treatment, replace = FALSE)
  tab <- table(samp, Tumor)
  Tumor_Treated[i] <- tab[2,2]

(p_value <- length(Tumor_Treated[Tumor_Treated>=4])/length(Tumor_Treated))
## [1] 0.26042
# The builtin function:

fisher.test(dat, alternative = "greater")
##  Fisher's Exact Test for Count Data
## data:  dat
## p-value = 0.2619
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  0.3152217       Inf
## sample estimates:
## odds ratio 
##   4.918388

Now for pairwise comparisons we can proceed as follows (notice how we need to transpose the matrix as explained above):

pairwise.fisher.test(t(Antacid), p.adjust.method = "bonferroni")

Pairwise comparisons using Pairwise comparison of proportions (Fisher) 

data:  t(Antacid) 

       Drug A Drug B
Drug B 0.028  -     
Drug C 1.000  0.047 

P value adjustment method: bonferroni 

So it is reasonable to conclude (assuming a risk \(\alpha\) of 5%) that Drug B is different from both Drug A and Drug C.

Original post here.

Home Page