VARIANCE IN TERMS OF RAW MOMENTS (KÖNIG-HUYGENS FORMULA):

The variance of the random variable $$X$$ is defined as:

$\text{Var}(X)= \mathbb E\left[ \left(\,X - \mathbb E(X)\, \right)^2\right]\tag{1}$

In many practical situations, the true variance of a population is not known a priori and must be estimated from data. When the population is extremely large, it is not feasible to measure every member, so the estimate has to be computed from a sample of the population.

An unbiased estimator of the population variance is obtained with the formula:

$s^2 = \frac{1}{n-1}\displaystyle \sum_{i=1}^n\,\left(x_i - \bar x\right)^2$
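
As a quick sketch (the sample values and variable names below are made up purely for illustration), this estimator can be checked against R's built-in var(), which also uses the $$n-1$$ denominator:

# Unbiased sample variance computed from the definition, checked against var();
# stopifnot() raises an error if the two values differ.
x = c(2.1, 4.3, 0.5, 3.7, 1.9)   # arbitrary illustrative sample
n = length(x)
s2 = sum((x - mean(x))^2) / (n - 1)
stopifnot(isTRUE(all.equal(s2, var(x))))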

In terms of the raw moments, $$\mathbb E(X)$$ and $$\mathbb E(X^2)$$, the variance of a random variable $$X$$ can be calculated using the König-Huygens formula:

$\text{Var}(X) = \mathbb E\left(X^2 \right)\quad-\quad \left[\;\mathbb E\left(X\right)\;\right]^2$

Proof:

\begin{align} \text{Var}(X) &= \mathbb E\left[ \left(\,X - \mathbb E(X)\, \right)^2\right]\\[2ex] &=\mathbb E\left[X^2 - 2\,X\,\mathbb E(X)\, +\left[\,\mathbb E(X)\, \right]^2 \right]\\[2ex] &=\mathbb E (X^2) -\mathbb E \left[\,2\,X\,\mathbb E(X)\, \right]+\left[\,\mathbb E(X)\, \right]^2 \\[2ex] &=\mathbb E (X^2) -2\,\mathbb E[X]\;\mathbb E(X)\, +\left[\,\mathbb E(X)\, \right]^2 \\[2ex] &=\mathbb E (X^2) -2\,\left[\mathbb E(X)\right]^2\, +\left[\,\mathbb E(X)\, \right]^2 \\[2ex] &=\mathbb E\left(X^2 \right)\quad-\quad \left[\;\mathbb E\left(X\right)\;\right]^2\\[2ex] &=\frac{\displaystyle\sum_{i=1}^n x_i^2}{n} \quad-\quad \left[ \frac{\displaystyle\sum_{i=1}^n x_i}{n}\right]^2 \end{align}

In the last line the expectations have been replaced by their sample counterparts, i.e. plain averages over the $$n$$ observations, which gives the (biased) plug-in estimate.
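
As a quick illustration of the population identity itself (a fair six-sided die, used here purely as a check and not part of the derivation above): $$\mathbb E(X) = 3.5$$ and $$\mathbb E(X^2) = \tfrac{91}{6}$$, so

$\text{Var}(X) = \mathbb E(X^2) - \left[\mathbb E(X)\right]^2 = \frac{91}{6} - (3.5)^2 = \frac{35}{12} \approx 2.92,$

which agrees with the direct computation $$\frac{1}{6}\sum_{k=1}^{6}(k - 3.5)^2 = \frac{35}{12}$$.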

The same applies to the covariance matrix. In the following proof $$X$$ is a $$p\times 1$$ random vector:

\begin{align} \text{Var}(X)&=\mathbb E\left[\,\left(X- \mathbb E[X] \right) \, \left(X- \mathbb E[X] \right)^\top \right]\\[2ex] &= \mathbb E\left[\,XX^\top - X\,\mathbb E[X]^\top - \mathbb E[X] \,X^\top + \mathbb E[X] \mathbb E\, [X]^\top \right]\\[2ex] &= \mathbb E\left[\,XX^\top\right] - \mathbb E [X]\,\mathbb E[X]^\top - \mathbb E[X] \,\mathbb E [X]^\top + \mathbb E[X] \mathbb E\, [X]^\top \\[2ex] &= \mathbb E[XX^\top] \; -\; \mathbb E[X]\, \mathbb E[X]^\top \end{align}
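
A minimal R sketch of this matrix identity, treating a small made-up set of points as the entire population so that the expectations reduce to plain averages with denominator $$n$$ (all names and values below are illustrative only):

# Four points in R^2 treated as the whole population (each with weight 1/n)
P = cbind(c(1, 2), c(3, 5), c(2, 2), c(4, 7))   # p x n matrix, here 2 x 4
n = ncol(P)
mu = rowMeans(P)                                 # E[X]
C = P - mu                                       # centered columns
var_def = (C %*% t(C)) / n                       # E[(X - E[X])(X - E[X])^T]
var_kh  = (P %*% t(P)) / n - mu %*% t(mu)        # E[X X^T] - E[X] E[X]^T
stopifnot(isTRUE(all.equal(var_def, var_kh)))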

When estimating from a sample, though, we have to keep in mind that this plug-in form has $$n$$ in the denominator, as in the last line of the first proof: the expectations in Eq. $$(1)$$ become plain averages over the $$n$$ observations. So if we want the denominator to be $$n-1$$, we have to multiply by $$\frac{n}{n-1}$$ (Bessel's correction), and the same correction applies to the alternate formula:

\begin{align}S^2 &= \frac{n}{n-1}\,\left(\, \frac{\sum_{i=1}^n x_i^2}{n} -\left[ \frac{\sum_{i=1}^n x_i}{n}\right]^2\right)\\[2ex] &= \,\frac{n\displaystyle \sum_{i=1}^n x_i^2 -\left[ \displaystyle \sum_{i=1}^n x_i\right]^2}{(n-1)\,n} \end{align}
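
Again a brief sketch (with invented sample values) confirming that this rearranged form reproduces R's var():

# One-pass form of S^2 with Bessel's correction, compared to var()
x = c(5.2, 3.1, 4.8, 6.0, 2.7)   # arbitrary illustrative sample
n = length(x)
S2 = (n * sum(x^2) - sum(x)^2) / ((n - 1) * n)
stopifnot(isTRUE(all.equal(S2, var(x))))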

We can apply the same concept to the covariance matrix. In what follows $$X$$ is a $$p \times n$$ data matrix, with each of the $$n$$ columns holding one observation of the $$p$$ variables:

\begin{align} \sigma^2(X)&=\frac{XX^\top}{n-1} \; -\; \frac{n}{n-1}\; \begin{bmatrix}\bar X_1\\ \bar X_2\\ \vdots \\ \bar X_p \end{bmatrix}\, \; \begin{bmatrix}\bar X_1 & \bar X_2 & \cdots & \bar X_p \end{bmatrix} \end{align}

because multiplying by $$\frac{n}{n-1}$$ replaces the biased denominator $$n$$ with $$n-1$$. Note that the term after the minus sign vanishes when the variables have zero mean, i.e. when the data are centered.

In R

# Built-in (unbiased) sample variance:
X = rnorm(5)
var(X)
## [1] 0.4059979
# Alternate (König-Huygens) form with Bessel's correction:
( (t(X) %*% X) / (length(X) - 1) ) - (length(X)/(length(X)-1)) * mean(X)^2
##           [,1]
## [1,] 0.4059979

The same applies to the covariance matrix of several random variables:

set.seed(0)
# Sampling three random variables: X1, X2 and X3:
X1 = 1:5
X2 = rnorm(5, 5, 1)
X3 = runif(5)

# Forming a matrix with each row holding the sample of one variable:
(X = (rbind(X1,X2,X3)))
##          [,1]      [,2]      [,3]      [,4]      [,5]
## X1 1.00000000 2.0000000 3.0000000 4.0000000 5.0000000
## X2 6.26295428 4.6737666 6.3297993 6.2724293 5.4146414
## X3 0.06178627 0.2059746 0.1765568 0.6870228 0.3841037
# Estimating the expectation of each variable with its sample mean, bar X_i:
mu = rowMeans(X)

# Calculating the covariance matrix manually with the alternate formula:
(man.cov = ((X %*% t(X)) / (ncol(X) - 1)) - (ncol(X)/(ncol(X) - 1)) * (mu %*% t(mu)))
##             X1          X2         X3
## X1  2.50000000 -0.02449075 0.28142079
## X2 -0.02449075  0.53366886 0.02019664
## X3  0.28142079  0.02019664 0.05940930
# Comparing to the built-in formula:
cov(t(X))
##             X1          X2         X3
## X1  2.50000000 -0.02449075 0.28142079
## X2 -0.02449075  0.53366886 0.02019664
## X3  0.28142079  0.02019664 0.05940930
# They are the same:
all.equal(man.cov, cov(t(X)))
## [1] TRUE