NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Power and Sample Size Calculation:



From Brian Caffo’s Coursera biostatistics video:

STATISTICAL POWER CALCULATION:

Consider a one-sided test of \(H_0: \mu = \mu_0\) against \(H_a: \mu > \mu_0\) at level \(\alpha\). If \(\sigma\) is known and \(n\) is large, and \(\beta\) denotes the type II error rate, the power is \(1-\beta\):

\[\begin{align} 1-\beta &= \Pr\left(\frac{\bar X -\mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha} \mid \mu = \mu_a \right)\\[3ex] &= \Pr\left(\frac{\bar X-\mu_a +\mu_a -\mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha} \mid \mu = \mu_a \right)\\[3ex] &=\Pr\left(\frac{\bar X -\mu_a}{\sigma/\sqrt{n}} > z_{1-\alpha} - \frac{\mu_a-\mu_0}{\sigma/\sqrt{n}} \mid \mu = \mu_a \right)\\[3ex] &= \Pr\left(Z > z_{1-\alpha} - \frac{\mu_a-\mu_0}{\sigma/\sqrt{n}} \mid \mu = \mu_a \right) \end{align}\]

Suppose that we wanted to detect an increase in the mean RDI (respiratory disturbance index) in the context of sleep apnea of at least \(2\small \text{ events/hour}\) above \(30\). Assume normality and a standard deviation of \(4\), treated as known. What would be the power with a sample of size \(16\)?

With \(\alpha = 0.05\), the critical value is \[z_{1-\alpha}=z_{0.95}=1.645\] or…

qnorm(0.95)
## [1] 1.644854

and with \(\mu_a\) being the true mean under the alternative hypothesis (i.e. sleep apnea raises the mean RDI to \(32\)):

\[\frac{\mu_a - 30}{\sigma/\sqrt{n}}=\frac{2}{4/\sqrt{16}}=2\]

Therefore,

\[\Pr(Z>1.645-2)=\Pr(Z>-0.355)\approx 64\%\]

or…

1 - pnorm(qnorm(0.95) - 2/(4 / sqrt(16)))
## [1] 0.63876
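
The calculation above can be wrapped in a small helper and checked by simulation. This is only a minimal sketch, not part of the original lecture: the function name `power_z` and its arguments are illustrative choices, and it assumes the same setting as above (one-sided test, known \(\sigma\), normal data).

```r
# Power of a one-sided z-test of H0: mu = mu0 vs Ha: mu > mu0 (sigma known).
# `power_z` and its argument names are hypothetical, chosen for illustration.
power_z <- function(mu0, mua, sigma, n, alpha = 0.05) {
  shift <- (mua - mu0) / (sigma / sqrt(n))
  1 - pnorm(qnorm(1 - alpha) - shift)
}

analytic <- power_z(mu0 = 30, mua = 32, sigma = 4, n = 16)

# Monte Carlo check: draw samples under the alternative and count rejections.
set.seed(1)
reject <- replicate(10000, {
  x <- rnorm(16, mean = 32, sd = 4)
  (mean(x) - 30) / (4 / sqrt(16)) > qnorm(0.95)
})
simulated <- mean(reject)

c(analytic = analytic, simulated = simulated)
```

The analytic value reproduces the 0.63876 computed above, and the simulated rejection rate should land close to it, up to Monte Carlo error.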

STATISTICAL SAMPLE SIZE CALCULATION:

What \(n\) sample size would be required to get a power of \(80\,\%\) (a common benchmark in the sciences)?

For a one-sided test (\(H_a: \mu_a > \mu_0\)):

\[0.8=\Pr\left(Z> \, z_{1-\alpha} -\frac{\mu_a -\mu_0}{\sigma/\sqrt{n}} \mid \mu=\mu_a\right)\]

Since \(\Pr(Z > z_{0.20}) = 0.8\), this implies that

\[z_{1-\alpha} - \frac{\mu_a -\mu_0}{\sigma/\sqrt{n}} = z_{0.20}\]

Solving for \(n\), for any value of \(\mu_a\):

\[n=\left( \sigma \frac{z_{1-\alpha} - z_{0.20}}{\mu_a -\mu_0} \right)^2\]
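
The rearrangement behind this expression, spelled out (pure algebra, no new assumptions):

\[\begin{align} \frac{\mu_a-\mu_0}{\sigma/\sqrt{n}} &= z_{1-\alpha} - z_{0.20}\\[3ex] \sqrt{n} &= \sigma\,\frac{z_{1-\alpha} - z_{0.20}}{\mu_a-\mu_0} \end{align}\]

Squaring both sides gives the expression for \(n\). Because \(z_{0.20} = -z_{0.80}\), the numerator can equivalently be written as \(z_{1-\alpha} + z_{0.80}\), that is, \(z_{1-\alpha} + z_{1-\beta}\) for a general power \(1-\beta\).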

We pick \(\mu_a\) as the smallest effect that we would reasonably like to detect.

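For the two-sided alternative \(H_a:\mu \neq \mu_0\), the same formula applies approximately with \(z_{1-\alpha/2}\) in place of \(z_{1-\alpha}\); this ignores the negligible probability of rejecting in the wrong tail.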

For the example above:

(n <- (4*(qnorm(0.95)-qnorm(0.2))/2)^2)
## [1] 24.73023

In practice \(n\) is rounded up to \(25\). The unrounded value indeed gives \(80\%\) power:

1 - pnorm(qnorm(0.95) - 2/(4 / sqrt(n)))
## [1] 0.8
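
Because a sample size must be a whole number, the result is rounded up in practice. The sketch below does that and also covers the two-sided approximation mentioned earlier. `sample_size_z` is a hypothetical helper name, not a function from the lecture or from base R, and the same assumptions apply (known \(\sigma\), normal data).

```r
# Smallest n giving at least the target power for a z-test with known sigma.
# `sample_size_z` is a hypothetical helper; delta = mua - mu0 (> 0).
sample_size_z <- function(delta, sigma, alpha = 0.05, power = 0.80,
                          two_sided = FALSE) {
  # Two-sided tests use alpha/2 in one tail (ignores the far-tail rejections).
  a <- if (two_sided) alpha / 2 else alpha
  n_exact <- (sigma * (qnorm(1 - a) - qnorm(1 - power)) / delta)^2
  ceiling(n_exact)
}

n_one <- sample_size_z(delta = 2, sigma = 4)                    # one-sided
n_two <- sample_size_z(delta = 2, sigma = 4, two_sided = TRUE)  # two-sided

# Power actually achieved by the rounded one-sided sample size.
power_one <- 1 - pnorm(qnorm(0.95) - 2 / (4 / sqrt(n_one)))

c(n_one = n_one, n_two = n_two, power_one = power_one)
```

With the numbers from the example, the one-sided size rounds up from 24.73 to 25, so the achieved power ends up slightly above \(80\%\); the two-sided test needs a larger sample for the same power.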


NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.