
NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Multivariate Gaussian:


The univariate Gaussian ($X \sim N(\mu,\sigma^2)$) has density:

$$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right),\quad x\in\mathbb{R}.$$

The degenerate Gaussian has variance equal to $0$ and hence $X(\omega)=\mu$ for all $\omega\in\Omega$.

The multivariate Gaussian is defined for $X\in\mathbb{R}^n$ by requiring that every linear combination of its components,

$$a^T X=\sum_{i=1}^{n}a_i X_i \quad\text{for } a\in\mathbb{R}^n,$$

is univariate Gaussian.

We will express it as $X\sim N(\mu,\Sigma)$, where $\mu$ is a vector in $\mathbb{R}^n$ with $E(X_i)=\mu_i$, and the covariance matrix $\Sigma$ is an $n\times n$ positive semidefinite matrix with $\Sigma_{ij}=\mathrm{Cov}(X_i,X_j)$.

A multivariate Gaussian is degenerate if $\det(\Sigma)=0$.

If the Gaussian distributions (components) are independent, the covariance matrix is diagonal; in two dimensions:

$$\Sigma=\begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}$$

If the variance is 1 for both Gaussians, the density surface is a round bell; if the variances are different, say 1 and 4, the bell stretches along the axis with the larger variance. (Surface plots of both cases appeared here.)

This is the Octave/Matlab code for the second case:

pkg load statistics    % needed in Octave only
mu = [0, 0];           % mean vector
sigma = [1 0; 0 4];    % covariance matrix (variances 1 and 4)
x = -5:.2:5;           % x axis
y = -5:.2:5;           % y axis

[X, Y] = meshgrid(x, y);             % all combinations of x, y
Z = mvnpdf([X(:) Y(:)], mu, sigma);  % compute Gaussian pdf
Z = reshape(Z, size(X));             % put into same shape as X, Y
colormap(jet)
surf(X, Y, Z)                        % 3D surface plot

The density of an $M$-dimensional multivariate Gaussian is:

$$f(X\mid\mu,\Sigma)=\frac{1}{(2\pi)^{M/2}\det(\Sigma)^{1/2}}\exp\left[-\frac{1}{2}(X-\mu)^T\Sigma^{-1}(X-\mu)\right]$$


The important part of the formula is the exponent, $(X-\mu)^T\Sigma^{-1}(X-\mu)$, which is a positive definite quadratic form (when $\Sigma$ is invertible). The factor in front is just a normalizing constant.
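As a quick numerical sanity check, the density formula can be evaluated directly and compared with a library implementation. This is a minimal Python/NumPy sketch (the notes use Octave; `scipy.stats.multivariate_normal` is used here purely for comparison), with the same $\mu$ and $\Sigma$ as the surface-plot example:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """M-dimensional Gaussian density, written straight from the formula."""
    M = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (M / 2) * np.linalg.det(Sigma) ** 0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.0], [0.0, 4.0]])  # same data as the Octave example
x = np.array([0.5, -1.0])

print(gaussian_pdf(x, mu, Sigma))           # manual formula
print(multivariate_normal(mu, Sigma).pdf(x))  # library answer, should agree
```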

A quadratic form in linear algebra has the shape $x^T A x$; setting it equal to a constant gives the formula for ellipsoids in higher dimensions. $\Sigma$ can thus be visualized as an “error ellipsoid” around the mean.


Ellipsoids are of the form $x^2/a^2+y^2/b^2+z^2/c^2=1$:


This is Dr. Strang’s example of a $3\times 3$ positive definite matrix, what he calls “the good matrix”:

$$A=\begin{bmatrix}2 & -1 & 0\\ -1 & 2 & -1\\ 0 & -1 & 2\end{bmatrix}$$

Proving that it is positive definite through the $x^T A x$ rule:

$$x^T A x = 2x_1^2+2x_2^2+2x_3^2-2x_1x_2-2x_2x_3>0 \quad\text{for } x\neq 0;
$$

we can complete the square to prove that the inequality is true:

$$2x_1^2+2x_2^2+2x_3^2-2x_1x_2-2x_2x_3 = x_1^2+(x_1-x_2)^2+(x_2-x_3)^2+x_3^2,$$

a sum of squares that vanishes only at $x=0$.
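Both the positivity and the completed-square identity are easy to check numerically. A small NumPy sketch (not part of the original notes):

```python
import numpy as np

# Strang's "good matrix"
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# Positive definite <=> all eigenvalues are strictly positive.
eigvals = np.linalg.eigvalsh(A)
print(eigvals)  # they are 2 - sqrt(2), 2, 2 + sqrt(2)

# x^T A x should equal the completed-square form for any x.
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
quad = x @ A @ x
squares = x[0]**2 + (x[0] - x[1])**2 + (x[1] - x[2])**2 + x[2]**2
print(quad, squares)
```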

The graph of $x^T A x$ lives in four dimensions, since it is a function of three variables. But if we cut through this thing at height one, we get an ellipsoid (a lopsided football) whose axes are determined by the factorization $Q\Lambda Q^T$, where $Q$ is the matrix of eigenvectors (the axis directions) and $\Lambda$ is the diagonal matrix of eigenvalues $\lambda_i\geq 0$ (the axis lengths scale as $1/\sqrt{\lambda_i}$). Hence,

$$Q\Lambda Q^T=Q\Lambda^{1/2}\Lambda^{1/2}Q^T=\left(Q\Lambda^{1/2}\right)\left(Q\Lambda^{1/2}\right)^T=AA^T,\quad\text{with } A=Q\Lambda^{1/2}. \tag{1}$$
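This factorization can be verified in a few lines. A NumPy sketch (using the same positive definite matrix as above):

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# Eigendecomposition of a symmetric matrix: A = Q diag(lam) Q^T
lam, Q = np.linalg.eigh(A)

# B = Q Lambda^{1/2}; then B B^T should reproduce A exactly.
B = Q @ np.diag(np.sqrt(lam))
print(np.allclose(B @ B.T, A))
```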


Affine Property of the Gaussian:

Any affine transformation $f(X)=AX+b$ of a Gaussian is a Gaussian: if $X\sim N(\mu,\Sigma)$, then $AX+b\sim N(A\mu+b,\,A\Sigma A^T)$.

So if $X_1,X_2,\dots,X_n\sim N(0,1)$ iid, placing them in a vector we get $X\sim N(0,I)$, and $AX+\mu\sim N(\mu,\Sigma)$, where $\Sigma=AA^T$. This is a way of generating multivariate Gaussians.
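A minimal NumPy sketch of this generation recipe (Python rather than the notes' Octave; here the Cholesky factor is used as one convenient $A$ with $AA^T=\Sigma$, and the target $\mu$ and $\Sigma$ are made-up example values):

```python
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Any A with A A^T = Sigma works; the Cholesky factor is a standard choice.
A = np.linalg.cholesky(Sigma)

# Stack iid standard normals into X ~ N(0, I), then map X -> A X + mu.
Z = rng.standard_normal((2, 100_000))
Y = (A @ Z).T + mu

print(Y.mean(axis=0))  # should be close to mu
print(np.cov(Y.T))     # should be close to Sigma
```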

Sphering:

“Sphering” turns a Gaussian into a “sphere” (a standard multivariate normal) through an affine transformation; it converts Gaussians back to the standard multivariate normal.

$$Y\sim N(\mu,\Sigma)\ \Longrightarrow\ A^{-1}(Y-\mu)\sim N(0,I),\quad\text{where }\Sigma=AA^T.$$


Using equation (1) we can express the $A$ in $Y=AX+\mu\sim N(\mu,\Sigma)$ as $A=Q\Lambda^{1/2}$, giving $Y=AX+\mu=Q\Lambda^{1/2}X+\mu$. Applying the affine property to $\Lambda^{1/2}X$ renders $\Lambda^{1/2}X\sim N(\Lambda^{1/2}\cdot 0,\ \Lambda^{1/2}I\Lambda^{1/2})=N(0,\Lambda)$. Geometrically, $\Lambda$ is the degree of stretching of the distribution (the variances along the principal axes). When we then multiply by $Q$ (an orthogonal matrix), we end up with $Q\Lambda^{1/2}X\sim N(0,Q\Lambda Q^T)=N(0,\Sigma)$. An orthogonal matrix gives a reflection or rotation.

So by applying an affine transformation to $X\sim N(0,I)$ we end up stretching and rotating; adding $\mu$ centers (shifts) the multivariate Gaussian.
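Sphering is the same recipe run in reverse. A NumPy sketch (example $\mu$ and $\Sigma$ are invented for illustration): draw correlated samples, undo the shift and the stretch/rotation, and check that what remains looks like $N(0,I)$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([3.0, -1.0])
Sigma = np.array([[4.0, 1.5],
                  [1.5, 2.0]])

A = np.linalg.cholesky(Sigma)   # Sigma = A A^T
Y = rng.multivariate_normal(mu, Sigma, size=100_000)

# Sphering: X = A^{-1}(Y - mu) should be ~ N(0, I).
X = np.linalg.solve(A, (Y - mu).T).T
print(X.mean(axis=0))  # close to the zero vector
print(np.cov(X.T))     # close to the identity matrix
```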


MARGINAL DISTRIBUTION:

If we have a two-dimensional multivariate Gaussian $X=[X_1,X_2]^T\in\mathbb{R}^2$, the coordinates are also Gaussian ($X_1$ and $X_2$ are each Gaussian).

Proof:

In general, we can decompose an $n$-dimensional multivariate Gaussian $X\sim N(\mu,\Sigma)$ by taking the first $k$ components, indexed by $a=1,\dots,k$. We’ll try to show that these first $k$ components are Gaussian. The remaining components are indexed by $b=k+1,\dots,n$. Now $X$ can be expressed as a block vector:

$$X=\begin{bmatrix}X_a\\ X_b\end{bmatrix}\quad\text{with}\quad X_a=\begin{bmatrix}X_1\\ \vdots\\ X_k\end{bmatrix}\ \text{and}\ X_b=\begin{bmatrix}X_{k+1}\\ \vdots\\ X_n\end{bmatrix}.$$

We can decompose $\mu=\begin{bmatrix}\mu_a\\ \mu_b\end{bmatrix}$

and $\Sigma$ into the block matrix $\Sigma=\begin{bmatrix}\Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb}\end{bmatrix}$. This is a block matrix because each entry is itself a matrix, e.g.

$$\Sigma_{aa}=\begin{bmatrix}\sigma^2(X_1) & \cdots & \mathrm{cov}(X_1,X_k)\\ \vdots & \ddots & \vdots\\ \mathrm{cov}(X_k,X_1) & \cdots & \sigma^2(X_k)\end{bmatrix}$$

The marginalization property states that $X_a\sim N(\mu_a,\Sigma_{aa})$ is multivariate normal. We can prove it using the affine property with a projection matrix $A$ (in the two-dimensional case it has the form $[1\ 0]$ or $[0\ 1]$):


$$A=\begin{bmatrix}I_{k\times k} & 0_{k\times(n-k)}\end{bmatrix}$$

which is $(k\times n)$: an identity block next to a block of zeros.

And by construction, $AX=X_a$. By the affine property, given that $A\mu=\mu_a$ (the projection of the means) and $A\Sigma A^T=\Sigma_{aa}$,

$$AX=X_a\sim N(\mu_a,\Sigma_{aa}).$$

The same can be done with Xb.
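The two facts the proof leans on, $A\mu=\mu_a$ and $A\Sigma A^T=\Sigma_{aa}$, are easy to confirm numerically. A NumPy sketch with an arbitrary positive definite $\Sigma$ (the specific numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 2
mu = np.array([1.0, 2.0, 3.0, 4.0])
M = rng.standard_normal((n, n))
Sigma = M @ M.T                 # a random positive definite covariance

# Projection onto the first k coordinates: A = [I_k  0], shape (k, n).
A = np.hstack([np.eye(k), np.zeros((k, n - k))])

mu_a = A @ mu                   # projection of the means
Sigma_aa = A @ Sigma @ A.T      # top-left block of Sigma
print(np.allclose(mu_a, mu[:k]))
print(np.allclose(Sigma_aa, Sigma[:k, :k]))
```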


CONDITIONAL DISTRIBUTION:

If $X=(X_1,X_2)^T\in\mathbb{R}^2$, then $(X_1\mid X_2=x_2)$ is Gaussian.

Using the same block matrices as above:

$$(X_a\mid X_b=x_b)\sim N(m,D)\quad\text{where}\quad m=\mu_a+\Sigma_{ab}\Sigma_{bb}^{-1}(x_b-\mu_b)\ \ \text{and}\ \ D=\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}.$$

For a full derivation see here and here.
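In the bivariate case the block formulas reduce to the familiar closed form $m=\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2)$ and $D=\sigma_1^2(1-\rho^2)$, which gives a simple cross-check. A NumPy sketch (the particular $\sigma_1$, $\sigma_2$, $\rho$, and conditioning value are made-up examples):

```python
import numpy as np

mu = np.array([0.0, 0.0])
s1, s2, rho = 1.0, 2.0, 0.5
Sigma = np.array([[s1**2,         rho * s1 * s2],
                  [rho * s1 * s2, s2**2        ]])
x_b = 1.0   # condition on X2 = 1

# Block formulas (here each block is a 1x1 scalar):
S_aa, S_ab = Sigma[0, 0], Sigma[0, 1]
S_ba, S_bb = Sigma[1, 0], Sigma[1, 1]
m = mu[0] + S_ab / S_bb * (x_b - mu[1])
D = S_aa - S_ab / S_bb * S_ba

# Textbook bivariate form for comparison:
m_ref = mu[0] + rho * s1 / s2 * (x_b - mu[1])
D_ref = s1**2 * (1 - rho**2)
print(m, D)  # 0.25 0.75
```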



NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.