The variance of the random variable \(X\) is defined as:
\[\text{Var}(X)= \mathbb E\left[ \left(X - \mathbb E(X) \right)^2\right]\tag{1}\]
In many practical situations the true variance of a population is not known a priori and must be estimated. When the population is very large it is not feasible to measure every member, so the estimate has to be computed from a sample.
An unbiased estimator of the population variance is obtained with the formula:
\[s^2 = \frac{1}{n-1}\displaystyle \sum_{i=1}^n\left(x_i - \bar x\right)^2\]
where \(\bar x = \frac{1}{n}\sum_{i=1}^n x_i\) is the sample mean.
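As a quick sanity check in R (a sketch; the sample is arbitrary), this estimator is exactly what the built-in var() computes:

x = rnorm(10)                          # an arbitrary sample
sum((x - mean(x))^2) / (length(x) - 1) # the unbiased estimator s^2
var(x)                                 # built-in equivalent; the two agree exactly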
In terms of the raw moments \(\mathbb E(X)\) and \(\mathbb E(X^2)\), the variance of a random variable \(X\) can be calculated using the König-Huygens formula:
\[\text{Var}(X) = \mathbb E\left(X^2 \right) - \left[\mathbb E\left(X\right)\right]^2\]
Proof:
\[ \begin{align} \text{Var}(X) &= \mathbb E\left[ \left(X - \mathbb E(X) \right)^2\right]\\[2ex] &=\mathbb E\left[X^2 - 2\,X\,\mathbb E(X) +\left[\mathbb E(X) \right]^2 \right]\\[2ex] &=\mathbb E (X^2) -\mathbb E \left[2\,X\,\mathbb E(X) \right]+\left[\mathbb E(X) \right]^2 \\[2ex] &=\mathbb E (X^2) -2\,\mathbb E(X)\,\mathbb E(X) +\left[\mathbb E(X) \right]^2 \\[2ex] &=\mathbb E (X^2) -2\,\left[\mathbb E(X)\right]^2 +\left[\mathbb E(X) \right]^2 \\[2ex] &=\mathbb E\left(X^2 \right) - \left[\mathbb E\left(X\right)\right]^2 \end{align} \]
Replacing each expectation with its sample average (denominator \(n\)) gives the plug-in estimate:
\[\frac{\displaystyle\sum_{i=1}^n x_i^2}{n} - \left[ \frac{\displaystyle\sum_{i=1}^n x_i}{n}\right]^2\]
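A quick numerical check in R (the sample below is arbitrary) confirms that this plug-in form agrees exactly with the denominator-\(n\) variance:

x = rnorm(100)
mean((x - mean(x))^2)  # plug-in variance with denominator n
mean(x^2) - mean(x)^2  # König-Huygens form; identical by the algebra above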
The same identity holds for the covariance matrix. In the following proof, \(X\) is a \(p\times 1\) random vector:
\[ \begin{align} \text{Var}(X)&=\mathbb E\left[\left(X- \mathbb E[X] \right) \left(X- \mathbb E[X] \right)^\top \right]\\[2ex] &= \mathbb E\left[XX^\top - X\,\mathbb E[X]^\top - \mathbb E[X] \,X^\top + \mathbb E[X]\, \mathbb E[X]^\top \right]\\[2ex] &= \mathbb E\left[XX^\top\right] - \mathbb E [X]\,\mathbb E[X]^\top - \mathbb E[X] \,\mathbb E [X]^\top + \mathbb E[X]\, \mathbb E[X]^\top \\[2ex] &= \mathbb E[XX^\top] \; -\; \mathbb E[X]\, \mathbb E[X]^\top \end{align} \]
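A Monte Carlo sketch of the vector identity in R (the two-dimensional vector \(X = (Z_1,\; Z_1 + Z_2)^\top\) is an arbitrary choice; its true covariance matrix is \(\begin{bmatrix}1 & 1\\ 1 & 2\end{bmatrix}\)):

set.seed(1)
n = 1e5
Z1 = rnorm(n); Z2 = rnorm(n)
X = rbind(Z1, Z1 + Z2)   # 2 x n matrix: each column is one draw of the random vector
EXXt = (X %*% t(X)) / n  # Monte Carlo estimate of E[X X^T]
EX = rowMeans(X)         # Monte Carlo estimate of E[X]
EXXt - EX %*% t(EX)      # approximates Var(X): 1s off the diagonal, 1 and 2 on it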
When estimating from a sample, however, we have to keep in mind that this formula is derived with an implicit denominator of \(n\), as in Eq. (1). So if we want to make the denominator \(n-1\), we have to multiply by \(\frac{n}{n-1}\) (Bessel's correction), and we have to do the same in the alternate formula:
\[\begin{align}S^2 &= \frac{n}{n-1}\,\left( \frac{\sum_{i=1}^n x_i^2}{n} -\left[ \frac{\sum_{i=1}^n x_i}{n}\right]^2\right)\\[2ex] &= \frac{n\displaystyle \sum_{i=1}^n x_i^2 -\left[ \displaystyle \sum_{i=1}^n x_i\right]^2}{(n-1)\,n} \end{align}\]
We can apply the same concept to the covariance matrix. In what follows, \(X\) is a \(p \times n\) data matrix (each row holds the \(n\) observations of one variable):
\[\begin{align} \sigma^2(X)&=\frac{XX^\top}{n-1} \; -\; \frac{n}{n-1}\; \begin{bmatrix}\bar X_1\\ \bar X_2\\ \vdots \\ \bar X_p \end{bmatrix}\, \; \begin{bmatrix}\bar X_1 & \bar X_2 & \cdots & \bar X_p \end{bmatrix} \end{align} \]
because multiplying by \(\frac{n}{n-1}\) replaces the biased denominator \(n\) with \(n-1\). Note that everything after the minus sign vanishes if the data are centered (mean zero).
In R:
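# Draw a small sample of 5 standard normal values (no seed is set, so numbers vary per run):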
X = rnorm(5)
var(X)
## [1] 1.488821
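# Manual computation: the alternate (raw-moment) formula with Bessel's correction: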
( (t(X) %*% X) / (length(X) - 1) ) - (length(X)/(length(X)-1)) * mean(X)^2
## [,1]
## [1,] 1.488821
The same applies to the covariance matrix estimated from several sampled variables:
set.seed(0)
# Sampling three random variables: X1, X2 and X3:
X1 = 1:5
X2 = rnorm(5, 5, 1)
X3 = runif(5)
# Forming a matrix in which each row holds the sample of one variable:
(X = (rbind(X1,X2,X3)))
## [,1] [,2] [,3] [,4] [,5]
## X1 1.00000000 2.0000000 3.0000000 4.0000000 5.0000000
## X2 6.26295428 4.6737666 6.3297993 6.2724293 5.4146414
## X3 0.06178627 0.2059746 0.1765568 0.6870228 0.3841037
# Estimating the expectation of each variable (the row means, bar X_i):
mu = rowMeans(X)
# Calculating the covariance matrix manually with the alternate formula:
(man.cov = ((X %*% t(X)) / (ncol(X) - 1)) - (ncol(X)/(ncol(X) - 1)) * (mu %*% t(mu)))
## X1 X2 X3
## X1 2.50000000 -0.02449075 0.28142079
## X2 -0.02449075 0.53366886 0.02019664
## X3 0.28142079 0.02019664 0.05940930
# Comparing to the built-in formula:
cov(t(X))
## X1 X2 X3
## X1 2.50000000 -0.02449075 0.28142079
## X2 -0.02449075 0.53366886 0.02019664
## X3 0.28142079 0.02019664 0.05940930
# are the same...
all.equal(man.cov, cov(t(X)))
## [1] TRUE
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.