It is the exact counterpart of the chi square goodness-of-fit (GoF) test, and extends the binomial test to multiple categories. So the binomial test would be a form of exact goodness-of-fit testing for one single proportion, while the multinomial test would deal with a categorical variable with multiple values or levels.
To calculate the multinomial probability of a given distribution of counts given certain expected probabilities, the math is straightforward:
Suppose one does an experiment of extracting \(n\) balls of k different colours from a bag, replacing the extracted ball after each draw. Balls from the same colour are equivalent. Denote the variable which is the number of extracted balls of colour \(i\) (\(i = 1,\cdots, k\)) as \(X_i\), and denote as \(p_i\) the probability that a given extraction will be in color \(i\).
\[f(x_1,\cdots\,x_k;\,n;\, p_1,\cdots,p_k) = \frac{n!}{x_1!\cdots x_k!}p_1^{x_1}\cdots p_k^{x_k}\].
The computational problem comes about when trying to figure out all the possible permutations in the values of each level that would be even less probable than the values in the sample.
The test is in the {ENT}
package, and the function
is:
multinomial.test(c(observed counts), c(expected probabilities)
The number of permutation is called events
and is
calculated as \({n+k-1}\choose{k-1}\)
where \(k\) is the number of groups,
and \(n\) is the number of
observations.
Looking into the function, if we want to check the probability that
we observe proportions A = 5, B = 2, C = 2
(low digits to
make the rest of the demonstration easy) coming from a uniform
distribution we’ll call:
library(EMT)
observed =c(A = 5, B = 2, C = 2)
prob =rep(1/length(observed),3)
out <- multinomial.test(observed, prob, useChisq = FALSE, MonteCarlo = FALSE)
##
## The model includes 55 different events.
##
##
## Exact Multinomial Test
##
## Events pObs p.value
## 55 0.0384 0.5306
This function calls on another function called
ExactMultinomialTest(observed, prob, size, groups, numEvents)
with the size
calculated as sum(observed)
, the
groups
as length(observed)
- i.e. how many
proportions -, and the numEvents
as \({\text{size+groups-1}}\choose{\text{groups-1}}\):
choose(9 + 3 - 1, 3 - 1)
## [1] 55
What this function does is:
pObs
) for the
observed counts and probabilities:
dmultinom(observed, size, prob)
:(pObs = dmultinom(observed, size=9, prob))
## [1] 0.03840878
head(findVectors(groups=3, size=sum(observed)))
## [,1] [,2] [,3]
## [1,] 0 0 9
## [2,] 0 1 8
## [3,] 1 0 8
## [4,] 0 2 7
## [5,] 1 1 7
## [6,] 2 0 7
#Notice that each row equals the size, in this case, 9:
apply(head(findVectors(groups=3, size=sum(observed))),1, function(x) sum(x))
## [1] 9 9 9 9 9 9
pObs
. the sum of more unlikely events is the
p.value
:eventMat = findVectors(groups=3, size=sum(observed))
nrow(eventMat)
## [1] 55
eventProb <- apply(eventMat, 1, function(x) dmultinom(x,
size = 9, prob = prob))
(p.value = sum(eventProb[eventProb <= pObs]))
## [1] 0.5305594
The same calculation can be done using asymptotics as a chi-square GoF test:
chisq.test(observed, p = prob, correct = F)
## Warning in chisq.test(observed, p = prob, correct = F): Chi-squared
## approximation may be incorrect
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 2, df = 2, p-value = 0.3679
Not even close in this small sample…
This is a quote extracted from here:
“If you perform the exact multinomial test (with more than two categories) and get a significant result, you may want to follow up by testing whether each category deviates significantly from the expected number. It’s a little odd to talk about just one category deviating significantly from expected; if there are more observations than expected in one category, there have to be fewer than expected in at least one other category. But looking at each category might help you understand better what’s going on.
For example, let’s say you do a genetic cross in which you expect a \(9:3:3:1\) ratio of purple, red, blue, and white flowers, and your observed numbers are \(72\) purple, \(38\) red, \(20\) blue, and \(18\) white. You do the exact test and get a P value of \(0.0016\)(?), so you reject the null hypothesis. There are fewer purple and blue and more red and white than expected, but is there an individual color that deviates significantly from expected?
ratio <- c(9,3,3,1)
p <- ratio/sum(ratio)
chisq.test(c(72,38,20,18), p = p, correct = F)
##
## Chi-squared test for given probabilities
##
## data: c(72, 38, 20, 18)
## X-squared = 15.748, df = 3, p-value = 0.001277
#or
multinomial.test(c(72,38,20,18), p, useChisq = FALSE, MonteCarlo = FALSE)
##
## The model includes 562475 different events.
##
## The chosen number of trials is rather low, should be at least 10 times the numver of events.
##
##
## Exact Multinomial Test
##
## Events pObs p.value
## 562475 0 0.0023
To answer this, do an exact binomial test for each category vs. the sum of all the other categories. For purple, compare the \(72\) purple and \(76\) non-purple to the expected \(9:7\) ratio:
binom.test(72, 72 + 76, 9/(9+7) ,alternative = "two.sided")
##
## Exact binomial test
##
## data: 72 and 72 + 76
## number of successes = 72, number of trials = 148, p-value = 0.06822
## alternative hypothesis: true probability of success is not equal to 0.5625
## 95 percent confidence interval:
## 0.4035796 0.5699452
## sample estimates:
## probability of success
## 0.4864865
The P value is \(0.07\), so you can’t say there are significantly fewer purple flowers than expected. There are \(38\) red and \(110\) non-red flowers; when compared to the expected \(3:13\) ratio, the P value is \(0.035\).
binom.test(38, 38 + 110, 3/(3+13) ,alternative="two.sided")
##
## Exact binomial test
##
## data: 38 and 38 + 110
## number of successes = 38, number of trials = 148, p-value = 0.03504
## alternative hypothesis: true probability of success is not equal to 0.1875
## 95 percent confidence interval:
## 0.1885441 0.3349714
## sample estimates:
## probability of success
## 0.2567568
This is below the significance level of \(0.05\), but because you’re doing four tests at the same time, you need to correct for the multiple comparisons. Applying the Bonferroni correction, you divide the significance level (\(0.05\)) by the number of comparisons (\(4\)) and get a new significance level of \(0.0125\); since \(0.035\) is greater than this, you can’t say there are significantly more red flowers than expected. Comparing the \(18\) white and \(130\) non-white to the expected ratio of \(1:15\), the P value is \(0.006\),
binom.test(18, 18+130, 1/(1+15) ,alternative="two.sided")
##
## Exact binomial test
##
## data: 18 and 18 + 130
## number of successes = 18, number of trials = 148, p-value = 0.006057
## alternative hypothesis: true probability of success is not equal to 0.0625
## 95 percent confidence interval:
## 0.07369845 0.18539037
## sample estimates:
## probability of success
## 0.1216216
so you can say that there are significantly more white flowers than expected. It is possible that an overall significant P value could result from moderate-sized deviations in all of the categories, and none of the post-hoc tests will be significant. This would be frustrating; you’d know that something interesting was going on, but you couldn’t say with statistical confidence exactly what it was.
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.