If we want to test whether a *one* sample proportion \(\hat p\) is consistent with a population parameter \(p\) the **score** test statistic is:

\[\text{test statistic} = \large \frac{\hat p - p}{\sqrt{\frac{p\,(1\,-\,p)}{n}}}\]

This is equivalent to the z test statistic for sample means, and does follow a Z distribution in large samples. Compare to the z test of proportions to compare two samples here.

Notice that since we are testing under the \(H_o\) how likely it is to get the \(\hat p\) we obtained we use the population proportion \(p\) to calculate the standard error, as opposed to the sample proportion we would use to build the confidence interval with a Wald interval:

\[\large \mathrm{CI}=\hat p \pm Z_{1-\alpha/2} \times \sqrt{\frac{\hat p\,(1\,-\,\hat p)}{n}}\]

In [R]:

\[\large \hat p + \text{c}(-1, 1) \,\times\, \small \text{pnorm}(0.975) \,\times\, \large \sqrt{\frac{\hat p\,(1\,-\,\hat p)}{n}}\]

Example: We want to test if the proportion of side effects is greater than \(0.1\) for a new drug (\(p=0.1\)). In our sample of \(20\) cases, \(11\) had complications for a \(\hat p = 0.55\):

\[\text{test statistic}=\large \frac{0.55 - 0.1}{\sqrt{\frac{0.1 \times 0.9}{20}}} =6.7,\] which is clearly significant:

`qnorm(0.95) # One-sided test`

`## [1] 1.644854`

Alternatively, we can just run an exact binomial test, which doesn’t need to rely on the normal approximation:

\[\Pr(\text{counts} \geq 11)= \displaystyle \sum_{11}^{20} {20\choose X} \,0.1^X\, 0.9^{20-X}\]

With [R],

`pbinom(10, 20, .1, lower.tail = F) # Lower.tail false implies that it will count 11 and above.`

`## [1] 7.088606e-07`

Alternatively (same result),

`binom.test(11, 20, 0.1, alternative = "greater")`

```
##
## Exact binomial test
##
## data: 11 and 20
## number of successes = 11, number of trials = 20, p-value =
## 7.089e-07
## alternative hypothesis: true probability of success is greater than 0.1
## 95 percent confidence interval:
## 0.3469314 1.0000000
## sample estimates:
## probability of success
## 0.55
```

For two-sided tests calculate both one-sided tests and double the smallest p-value.

**QUESTION:**

This question appeared here.

A report says that 82% of British Columbians over the age of \(25\) are high school graduates. A survey of randomly selected residents of a certain city included \(1290\) who were over the age of \(25\), and \(1012\) of them were high school graduates. Is the city’s result of \(1012\) unusually high, low, or neither?

How should I approach this question? Any help and hints would be appreciated.

**ANSWER:**

First off let me get the algebraic nomenclature out of the way - I find this extremely slippery and often implied:

\(\pi_0\) is a reference value assumed to be true. It is not necessarily the population proportion, but rather a

*fixed*fraction or proportion to which we compare the sample to. For instance, the problem reads something along the lines: “Is our sample consistent with a population proportion of \(\small \pi_0 = 0.7\)?”\(\pi\) (or \(p\)) stands for the actual population proportion, but it’s too bad that we usually don’t know it and have to use instead the…

\(\hat \pi\) (or \(\hat p\)), which stands for the sample proportion. To make things more “friendly” sometimes \(p\) denotes the sample proportion…

\(n\) is the number of trials in the binomial experiment (or the number of sampled subjects in a poll).

\(\small Y\) number of “successes” (“success” interpreted sometimes like the word “positive” in Medicine - you don’t necessarily want it for yourself).

We are done! Well almost…

In our case we have \(\small \pi = 0.82\) and \(\small \hat \pi = 1012/1290 = 0.78\). And \(\small n = 1290\).

The MLE of \(\pi\) is the sample proportion, \(\hat \pi = \small successes/trials\), and the expectation for the number of \(\small successes\) is \(n\pi\). The sample proportion is an unbiased estimator of the population: \(\small E(\hat \pi)=\pi\) (and \(\small E(Y)=n\pi\)) and the standard error behaves very similarly to that of sampling distributions of sample means: \(\small SE\,(\hat \pi) = \sqrt{\frac{\pi(1-\pi)}{n}}\), remembering that \(var(\hat \pi)= \pi(1-\pi)\) (and \(var(Y)=n\pi(1-\pi)\)).

The test here as Glen indicates is a two-sided one-sample proportion test: \(H_0: \pi = \pi_0\) versus \(H_A: \pi \neq \pi_0\). Typically, a normal approximation with mean \(\pi\) and \(var = \pi(1-\pi)/n\) under the following conditions: \(n\hat\pi>5\) and \(n(1-\hat\pi)>5\). In our case this is clearly met (\(\small 1290 * 0.78 = 1006\)).

The \(z\) test statistic is \(\large z=\Large\frac{\hat\pi-\pi_0}{\sqrt{\frac{\pi_0\,(1-\pi_0)}{n}}}\). In our case, \(z =\large \frac{0.78 - 0.82}{\sqrt{\frac{0.82(1-0.82)}{1290}}}=\large \frac{-0.04}{\sqrt{\frac{0.15}{1290}}}= \small-3.32\), which is clearly significant since `c(-1,1) * qnorm(1 - 0.05/2) = [1] -1.96 1.96`

, and `pnorm(-3.32) =0.00045`

. This latter expression corresponding to the [R] code for the two-tailed cut-off quantile values fixing the alpha significance level at 5%: \(z_{(1 - \alpha/2)}\) where \(\small \alpha = 0.05\).

As for the Wald confidence intervals, the calculation is:

\(\large \hat \pi \pm z_{(1-\alpha/2)}\,\sqrt{\frac{\hat\pi(1-\hat\pi)}{n}}\). Coded in [R]:

`0.78 + c(-1,1) * qnorm(1 - 0.05/2) * sqrt((0.78 * (1 - 0.78)) / 1290) = 0.7573946 to 0.8026054`

, which does not include \(\small \pi_0 = 0.82\).

Since we know the population proportion \(\pi\) in this case as being \(0.82\) we can use it to construct the confidence interval as: \(\large \pi \pm z_{(1-\alpha/2)}\,\sqrt{\frac{\pi(1-\pi)}{n}}\), or: `0.82 + c(-1,1) * qnorm(1 - 0.05/2) * sqrt((0.82 * (1 - 0.82)) / 1290) = 0.7990349 to 0.8409651`

. This excludes the sample value \(0.78\).

Reference:

*Categorical Data Analysis*, Second Edition by Alan Agresti (p. 14)