NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Confidence Interval & Margin of Error:


The standard error (SE) of the sampling distribution a proportion \(p\) is defined as:

\(\text{SE}_p=\sqrt{\frac{p\,(1-p)}{n}}\). This can be contrasted to the standard deviation (SD) of the sampling distribution of a proportion \(\pi\): \(\sigma_p=\sqrt{\frac{\pi\,(1-\pi)}{n}}\).


Confidence Interval:


The confidence interval estimates the population parameter \(\pi\) based on the sampling distribution and the central limit theorem (CLT) that allows a normal approximation. Hence, given a SE, and a proportion, \(95\%\) the confidence interval will be calculated as:

\[p\,\pm\,Z_{\alpha/2}\,\text{SE}\]

Given that \(Z_{\alpha/2}=Z_{0.975}=1.959964\sim1.96\), the CI will be:

\[p\,\pm\,1.96\,\sqrt{\frac{p\,(1-p)}{n}}\].

This raises a question regarding the utilization of the normal distribution even if we really don’t know the population SD - when estimating confidence intervals for means, if the SE is used in lieu of the SD, the \(t\) distribution is typically felt to be a better choice due to its fatter tails. However, in the case of a proportion, there is only one parameter, \(p\), being estimated, since the formula for the Bernouilli variance is entirely dependent on \(p\) as \(p\,(1-p)\). This is very nicely explained here:

“Informal answer: the t distribution comes into play when there are two unknown population parameters, the population mean and the population variance. The heavier tails of the t-distribution (versus the normal distribution) capture the extra uncertainty we incur from estimating the population variance. If an oracle told us the population variance, we could forget the t distribution and just work with the normal.

In the case of Bernoulli data (0s and 1s), there is really just one parameter to estimate. That is because in the special case of a population of 0’s and 1’s the variance is determined by the mean: the mean is the fraction \(p\) of 1’s, the variance is \(p(1−p)\). (This is easy to see if you know the variance is the averaged squared value minus the squared average value). The sample proportion gives us estimates of both the mean and variance, so we do not add uncertainty in the same way as if we had to estimate the variance separately. That is why the normal approximation works better in this case than the t-distribution.”


Margin of Error:


The margin of error is simply the “radius” (or half the width) of a confidence interval for a particular statistic, in this case the sample proportion:

\(\text{ME}_{\text{@ 95% CI}}=1.96\,\sqrt{\frac{p\,(1-p)}{n}}\).

Graphically,

enter image description here


Home Page

NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.