Categorical Data: Multinomial Exact Test

NOTES ON STATISTICS, PROBABILITY and MATHEMATICS

Exact Multinomial Test (Goodness-of-Fit):

It is the exact counterpart of the chi square goodness-of-fit (GoF) test, and extends the binomial test to multiple categories. So the binomial test would be a form of exact goodness-of-fit testing for one single proportion, while the multinomial test would deal with a categorical variable with multiple values or levels.

To calculate the multinomial probability of a given distribution of counts given certain expected probabilities, the math is straightforward:

Suppose one does an experiment of extracting \(n\) balls of k different colours from a bag, replacing the extracted ball after each draw. Balls from the same colour are equivalent. Denote the variable which is the number of extracted balls of colour \(i\) (\(i = 1,\cdots, k\)) as \(X_i\), and denote as \(p_i\) the probability that a given extraction will be in color \(i\).

\[f(x_1,\cdots\,x_k;\,n;\, p_1,\cdots,p_k) = \frac{n!}{x_1!\cdots x_k!}p_1^{x_1}\cdots p_k^{x_k}\].

The computational problem comes about when trying to figure out all the possible permutations in the values of each level that would be even less probable than the values in the sample.

The test is in the {ENT} package, and the function is:

multinomial.test(c(observed counts), c(expected probabilities)

The number of permutation is called events and is calculated as \({n+k-1}\choose{k-1}\) where \(k\) is the number of groups, and \(n\) is the number of observations.

Looking into the function, if we want to check the probability that we observe proportions A = 5, B = 2, C = 2 (low digits to make the rest of the demonstration easy) coming from a uniform distribution we’ll call:

library(EMT)
observed =c(A = 5, B = 2, C = 2)
prob =rep(1/length(observed),3)
out <- multinomial.test(observed, prob, useChisq = FALSE, MonteCarlo = FALSE)

## 
##  The model includes 55 different events.
## 
## 
##  Exact Multinomial Test
## 
##     Events    pObs    p.value
##         55  0.0384     0.5306

This function calls on another function called ExactMultinomialTest(observed, prob, size, groups, numEvents) with the size calculated as sum(observed), the groups as length(observed) - i.e. how many proportions -, and the numEvents as \({\text{size+groups-1}}\choose{\text{groups-1}}\):

choose(9 + 3 - 1, 3 - 1)

## [1] 55

What this function does is:

Calculates the multinomial probability (pObs) for the observed counts and probabilities: dmultinom(observed, size, prob):

(pObs = dmultinom(observed, size=9, prob))

## [1] 0.03840878

Unfolds the permutations possible into a matrix through the operation…

head(findVectors(groups=3, size=sum(observed)))

##      [,1] [,2] [,3]
## [1,]    0    0    9
## [2,]    0    1    8
## [3,]    1    0    8
## [4,]    0    2    7
## [5,]    1    1    7
## [6,]    2    0    7

#Notice that each row equals the size, in this case, 9:
apply(head(findVectors(groups=3, size=sum(observed))),1, function(x) sum(x))

## [1] 9 9 9 9 9 9

Then it calculates the exact multinomial probability for each one of these rows and selects the probabilities that are less than pObs. the sum of more unlikely events is the p.value:

eventMat = findVectors(groups=3, size=sum(observed))
nrow(eventMat)

## [1] 55

eventProb <- apply(eventMat, 1, function(x) dmultinom(x, 
        size = 9, prob = prob))
(p.value = sum(eventProb[eventProb <= pObs]))

## [1] 0.5305594

The same calculation can be done using asymptotics as a chi-square GoF test:

chisq.test(observed, p = prob, correct = F)

## Warning in chisq.test(observed, p = prob, correct = F): Chi-squared
## approximation may be incorrect

## 
##  Chi-squared test for given probabilities
## 
## data:  observed
## X-squared = 2, df = 2, p-value = 0.3679

Not even close in this small sample…

Post-hoc test:

This is a quote extracted from here:

“If you perform the exact multinomial test (with more than two categories) and get a significant result, you may want to follow up by testing whether each category deviates significantly from the expected number. It’s a little odd to talk about just one category deviating significantly from expected; if there are more observations than expected in one category, there have to be fewer than expected in at least one other category. But looking at each category might help you understand better what’s going on.

For example, let’s say you do a genetic cross in which you expect a \(9:3:3:1\) ratio of purple, red, blue, and white flowers, and your observed numbers are \(72\) purple, \(38\) red, \(20\) blue, and \(18\) white. You do the exact test and get a P value of \(0.0016\)(?), so you reject the null hypothesis. There are fewer purple and blue and more red and white than expected, but is there an individual color that deviates significantly from expected?

ratio <- c(9,3,3,1)
p <- ratio/sum(ratio)
chisq.test(c(72,38,20,18), p = p, correct = F)

## 
##  Chi-squared test for given probabilities
## 
## data:  c(72, 38, 20, 18)
## X-squared = 15.748, df = 3, p-value = 0.001277

#or
multinomial.test(c(72,38,20,18), p, useChisq = FALSE, MonteCarlo = FALSE)

## 
##  The model includes 562475 different events.
## 
##  The chosen number of trials is rather low, should be at least 10 times the numver of events.
## 
## 
##  Exact Multinomial Test
## 
##     Events    pObs    p.value
##     562475       0     0.0023

To answer this, do an exact binomial test for each category vs. the sum of all the other categories. For purple, compare the \(72\) purple and \(76\) non-purple to the expected \(9:7\) ratio:

binom.test(72, 72 + 76, 9/(9+7) ,alternative = "two.sided")

## 
##  Exact binomial test
## 
## data:  72 and 72 + 76
## number of successes = 72, number of trials = 148, p-value = 0.06822
## alternative hypothesis: true probability of success is not equal to 0.5625
## 95 percent confidence interval:
##  0.4035796 0.5699452
## sample estimates:
## probability of success 
##              0.4864865

The P value is \(0.07\), so you can’t say there are significantly fewer purple flowers than expected. There are \(38\) red and \(110\) non-red flowers; when compared to the expected \(3:13\) ratio, the P value is \(0.035\).

binom.test(38, 38 + 110, 3/(3+13) ,alternative="two.sided")

## 
##  Exact binomial test
## 
## data:  38 and 38 + 110
## number of successes = 38, number of trials = 148, p-value = 0.03504
## alternative hypothesis: true probability of success is not equal to 0.1875
## 95 percent confidence interval:
##  0.1885441 0.3349714
## sample estimates:
## probability of success 
##              0.2567568

This is below the significance level of \(0.05\), but because you’re doing four tests at the same time, you need to correct for the multiple comparisons. Applying the Bonferroni correction, you divide the significance level (\(0.05\)) by the number of comparisons (\(4\)) and get a new significance level of \(0.0125\); since \(0.035\) is greater than this, you can’t say there are significantly more red flowers than expected. Comparing the \(18\) white and \(130\) non-white to the expected ratio of \(1:15\), the P value is \(0.006\),

binom.test(18, 18+130, 1/(1+15) ,alternative="two.sided")

## 
##  Exact binomial test
## 
## data:  18 and 18 + 130
## number of successes = 18, number of trials = 148, p-value = 0.006057
## alternative hypothesis: true probability of success is not equal to 0.0625
## 95 percent confidence interval:
##  0.07369845 0.18539037
## sample estimates:
## probability of success 
##              0.1216216

so you can say that there are significantly more white flowers than expected. It is possible that an overall significant P value could result from moderate-sized deviations in all of the categories, and none of the post-hoc tests will be significant. This would be frustrating; you’d know that something interesting was going on, but you couldn’t say with statistical confidence exactly what it was.

Home Page

NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.