From this post.

So we have a \(3 \times 2\) matrix with margins as follows:

```
m = matrix(c(4,5,23,20,104,496), nrow=3)
dimnames(m) = list(c("A", "B", "C"), c("Yes","No"))
addmargins(m)
Yes No Sum
A 4 20 24
B 5 104 109
C 23 496 519
Sum 32 620 652
```

The key lines in the `chisq.test()`

function dealing with the Monte Carlo option are:

```
if (simulate.p.value) {
nx <- length(x)
sm <- matrix(sample.int(nx, B * n, TRUE, prob = p),
nrow = n)
ss <- apply(sm, 2L, function(x, E, k) {
sum((table(factor(x, levels = 1L:k)) - E)^2/E)
}, E = E, k = nx)
PARAMETER <- NA
PVAL <- (1 + sum(ss >= almost.1 * STATISTIC))/(B +
1)
}
```

The function takes in as input at the beginning of the call `x`

in

```
chisq.test(x, y = NULL, correct = TRUE,
p = rep(1/length(x), length(x)), rescale.p = FALSE,
simulate.p.value = FALSE, B = 2000)
```

Therefore `x = m`

. Now, `nx`

is the `length(x)`

, which is the number of cells (\(6\) cells in this example). `n`

is the total number of counts, which can be observed as the grand total in the printed out table above \((652)\), and is generally calculated as `n = sum(x)`

.

*Under the null hypothesis* of independence, the probability of each cell in the matrix is uniformly `p = rep(1/length(x), length(x)) # 0.1666667`

, which implies that the expected value is `E = n * p # 108.6667.`

With this the actual Monte Carlo simulation is carried out with the line:

`sm = matrix(sample.int(nx, B * n, TRUE, prob = p), nrow = n)`

which spells the following command: Sample from the integers `nx`

(i.e. \(1\) to \(6\)) so as to randomly choose one of the six cells in the matrix, `B * n`

times, where `B`

is the number of simulations, and `n`

is the total number of counts in the original matrix \((652)\). `TRUE`

stands for `replace = TRUE`

because we want to be able to select each cell more than once. And `p`

gives directions to use uniform probability across cells in this case. So we get an insane amount of picks from \(1\) to \(6\), which the `matrix( , nrow = n)`

organizes in columns of in this case `n`

entries, i.e. \(652\) (the total counts above). Each column is therefore a random table with the same totals as in the original matrix. Now it is just a matter of counting:

`ss = apply(sm, 2L, function(x, E, k) {sum((table(factor(x, levels = 1L:k)) - E)^2/E)}, E = E, k = nx)`

is commanding to do the following operations on `sm`

, columnwise (`2L`

), i.e. for each simulation:

Tabulate the number of times each cell \(1: 6\) was picked (

`table(x, levels = 1L:K)`

. Here ’k = nx = c`.Calculate the chi square statistic as \(\chi^2=\displaystyle \sum_i \frac{(O_i-E_i)^2}{E_i}\) for each column (simulation).

The last step is simply to calculate in what proportion of cases this chi square statistic is bigger than the chi square statistic calculated on the original matrix. This is the `PVAL.`