The univariate Gaussian $X \sim N(\mu, \sigma^2)$ has density:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right), \quad \forall x \in \mathbb{R}.$$
The degenerate Gaussian has variance equal to 0 and hence $X(\omega) = \mu, \ \forall \omega \in \Omega$.
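A quick Octave sanity check of this density formula against normpdf from the statistics package; the values of $\mu$ and $\sigma^2$ below are arbitrary examples:

pkg load statistics                         % Octave only
mu = 1; sigma2 = 4;                         % example parameters (arbitrary)
x = -5:0.1:7;
f_manual = 1 ./ sqrt(2*pi*sigma2) .* exp(-(x - mu).^2 / (2*sigma2));
f_pkg    = normpdf(x, mu, sqrt(sigma2));    % note: normpdf takes the standard deviation
max(abs(f_manual - f_pkg))                  % ~ 0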
The multivariate Gaussian is defined for $X \in \mathbb{R}^n$ by requiring that every linear combination of its components $X_i$ is univariate Gaussian:
$$a^T X = \sum_{i=1}^{n} a_i X_i \ \text{is Gaussian}, \quad \forall a \in \mathbb{R}^n.$$
We will express it as $X \sim N(\mu, \Sigma)$, where $\mu \in \mathbb{R}^n$ is the mean vector, with $E(X_i) = \mu_i$, and $\Sigma$ is an $n \times n$ positive semidefinite covariance matrix with $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$.
A multivariate Gaussian is degenerate if $\det(\Sigma) = 0$.
If the components are independent Gaussians, the covariance matrix is diagonal; in two dimensions,
$$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}.$$
If the variance is 1 for both components, the density is circularly symmetric. If the variances are different, say 1 and 4, the bell is stretched along the axis with the larger variance:
This is the Octave code for the second case (it also runs in MATLAB with the Statistics Toolbox, minus the pkg load line):
pkg load statistics      % Octave only: load the statistics package for mvnpdf
mu = [0, 0];             % mean vector
sigma = [1 0; 0 4];      % diagonal covariance: variances 1 and 4
x = -5:.2:5;             % x axis
y = -5:.2:5;             % y axis
[X, Y] = meshgrid(x, y); % all combinations of x, y
Z = mvnpdf([X(:) Y(:)], mu, sigma); % evaluate the Gaussian pdf on the grid
Z = reshape(Z, size(X)); % reshape into the same size as X, Y
colormap(jet)
surf(X, Y, Z)            % 3D surface plot
The density of an M-dimensional multivariate Gaussian is:
$$f(X \mid \mu, \Sigma) = \frac{1}{(2\pi)^{M/2}\det(\Sigma)^{1/2}}\exp\left[-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)\right]$$
The important part of the formula is the exponent, $(X-\mu)^T \Sigma^{-1} (X-\mu)$, which is a positive definite quadratic form. The part in front is just a normalizing factor.
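A minimal Octave sketch evaluating this density directly and comparing it with mvnpdf; the example $\mu$, $\Sigma$, and evaluation point are arbitrary:

pkg load statistics                 % Octave only
mu    = [0 0];                      % example mean (arbitrary)
Sigma = [1 0.5; 0.5 2];             % example covariance (arbitrary, positive definite)
x = [1 -1];                         % evaluation point (arbitrary)
d = x - mu;
M = length(mu);
f_manual = exp(-0.5 * d * (Sigma \ d')) / ((2*pi)^(M/2) * sqrt(det(Sigma)));
f_pkg    = mvnpdf(x, mu, Sigma);
abs(f_manual - f_pkg)               % ~ 0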
A quadratic form in linear algebra is an expression of the form $x^T A x$; setting it equal to a constant gives the formula for ellipsoids in higher dimensions. And $\Sigma$ can be visualized as an “error ellipsoid” around the mean.
Ellipsoids are of the form $x^2/a^2 + y^2/b^2 + z^2/c^2 = 1$.
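As an illustration, a minimal Octave sketch (with an arbitrary example $\Sigma$) tracing the error ellipse $(x-\mu)^T \Sigma^{-1} (x-\mu) = 1$ by stretching the unit circle with $\Lambda^{1/2}$ and rotating it with $Q$:

mu    = [0; 0];                      % center of the ellipse
Sigma = [2 1; 1 3];                  % example covariance (arbitrary, positive definite)
[Q, L] = eig(Sigma);                 % eigenvectors Q, eigenvalues on the diagonal of L
t = linspace(0, 2*pi, 200);
circle  = [cos(t); sin(t)];          % unit circle
ellipse = Q * sqrt(L) * circle + mu; % stretch by Lambda^(1/2), rotate by Q, shift by mu
plot(ellipse(1,:), ellipse(2,:)); axis equal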
This is Dr. Strang’s example of a 3×3 positive definite matrix. What he calls “the good matrix”:
$$A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$$
To prove that it is positive definite through the $x^T A x$ rule, we need
$$x^T A x = 2x_1^2 + 2x_2^2 + 2x_3^2 - 2x_1 x_2 - 2x_2 x_3 > 0 \quad \text{for all } x \neq 0;$$
we can complete the square to prove that the inequality is true.
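For instance, grouping the cross terms into squares:
$$x^T A x = x_1^2 + (x_1 - x_2)^2 + (x_2 - x_3)^2 + x_3^2 \geq 0,$$
which is zero only when $x_1 = x_2 = x_3 = 0$, so $A$ is positive definite.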
The graph of $x^T A x$ lives in four dimensions (three inputs plus the value of the function). But if we cut through this thing at height one, we get an ellipsoid (a lopsided football) with its axes determined by the factorization $Q\Lambda Q^T$, where $Q$ is the matrix of eigenvectors and $\Lambda$ is the diagonal matrix of eigenvalues, $\lambda_i \geq 0$. Hence,
$$Q\Lambda Q^T = Q\Lambda^{1/2}\Lambda^{1/2}Q^T = (Q\Lambda^{1/2})(Q\Lambda^{1/2})^T = AA^T, \quad \text{where } A := Q\Lambda^{1/2}. \tag{1}$$
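A quick numerical check of this factorization in Octave, using the 3×3 matrix above:

G = [2 -1 0; -1 2 -1; 0 -1 2];  % the "good matrix"
[Q, L] = eig(G);                % Q: eigenvectors, L: diagonal matrix of eigenvalues
A = Q * sqrt(L);                % A = Q * Lambda^(1/2)
norm(A * A' - G)                % ~ 0: (Q Lambda^(1/2))(Q Lambda^(1/2))' reproduces the matrix
diag(L)                         % all eigenvalues positive, confirming positive definiteness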
Any affine transformation $f(X) = AX + b$ of a Gaussian is a Gaussian (the affine property): if $X \sim N(\mu, \Sigma)$, then $AX + b \sim N(A\mu + b, A\Sigma A^T)$.
So if $X_1, X_2, \dots, X_n \sim N(0, 1)$ iid, placing them in a vector gives $X \sim N(0, I)$, and $AX + \mu \sim N(\mu, \Sigma)$, where $\Sigma = AA^T$. This is a way of generating multivariate Gaussians.
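A minimal Octave sketch of this construction; the $\mu$ and $\Sigma$ below are arbitrary examples, and $A = Q\Lambda^{1/2}$ is just one valid factor with $AA^T = \Sigma$ (a Cholesky factor would also work):

mu    = [1; 2];                 % example mean (arbitrary)
Sigma = [2 1; 1 3];             % example covariance (arbitrary, positive definite)
[Q, L] = eig(Sigma);
A = Q * sqrt(L);                % A * A' = Sigma
Z = randn(2, 5000);             % columns are iid N(0, I) vectors
X = A * Z + mu;                 % each column ~ N(mu, Sigma); mu is broadcast to every column
mean(X, 2)                      % ~ mu
cov(X')                         % ~ Sigma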
“Sphering” turns a Gaussian into a “sphere” (a standard multivariate normal) through an affine transformation; that is, it converts a Gaussian back to the standard multivariate normal:
$$Y \sim N(\mu, \Sigma) \implies A^{-1}(Y - \mu) \sim N(0, I), \quad \text{where } \Sigma = AA^T.$$
Using equation (1), we can express the $A$ in $Y = AX + \mu \sim N(\mu, \Sigma)$ as $A = Q\Lambda^{1/2}$, giving $Y = AX + \mu = Q\Lambda^{1/2}X + \mu$. Applying the affine property to $\Lambda^{1/2}X$ gives $\Lambda^{1/2}X \sim N(\Lambda^{1/2} \cdot 0, \Lambda^{1/2} I \Lambda^{1/2}) = N(0, \Lambda)$; geometrically, $\Lambda$ is the degree of stretching of the distribution (the variances along the principal axes). When we then multiply by the orthogonal matrix $Q$, we get $Q\Lambda^{1/2}X \sim N(0, Q\Lambda Q^T) = N(0, \Sigma)$. An orthogonal matrix gives a reflection or rotation.
So by applying an affine transformation to $X \sim N(0, I)$ we end up stretching and rotating, and $\mu$ centers (shifts) the multivariate Gaussian.
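A matching Octave sketch of sphering, with the same arbitrary example $\mu$ and $\Sigma$ as above:

mu    = [1; 2];
Sigma = [2 1; 1 3];
[Q, L] = eig(Sigma);
A = Q * sqrt(L);                 % Sigma = A * A'
Y = A * randn(2, 5000) + mu;     % columns are samples from N(mu, Sigma)
Z = A \ (Y - mu);                % sphering: Z = inv(A) * (Y - mu)
mean(Z, 2)                       % ~ 0
cov(Z')                          % ~ identity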
If we have a two-dimensional multivariate Gaussian $X = [X_1, X_2]^T \in \mathbb{R}^2$, the coordinates are also Gaussian ($X_1$ and $X_2$ are each univariate Gaussian).
Proof:
In general, we can decompose an $n$-dimensional multivariate Gaussian vector $X \sim N(\mu, \Sigma)$ by taking the first $k$ components, indexed by $a = 1, \dots, k$. We will show that these first $k$ components are Gaussian. The remaining components are indexed by $b = k+1, \dots, n$. Now $X$ can be expressed as a block vector:
$$X = \begin{bmatrix} X_a \\ X_b \end{bmatrix}, \quad \text{with } X_a = \begin{bmatrix} X_1 \\ \vdots \\ X_k \end{bmatrix} \ \text{and} \ X_b = \begin{bmatrix} X_{k+1} \\ \vdots \\ X_n \end{bmatrix}.$$
We can decompose $\mu = \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix}$ and $\Sigma$ into the block matrix
$$\Sigma = \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix},$$
where each block collects the corresponding variances and covariances; for example,
$$\Sigma_{aa} = \begin{bmatrix} \sigma^2(X_1) & \dots & \mathrm{cov}(X_1, X_k) \\ \vdots & \ddots & \vdots \\ \mathrm{cov}(X_k, X_1) & \dots & \sigma^2(X_k) \end{bmatrix}.$$
The marginalization property states that $X_a \sim N(\mu_a, \Sigma_{aa})$, i.e. it is itself multivariate normal. We can prove it using the affine property and the projection matrix $A$ (in the two-dimensional case it is simply $[1\;0]$ or $[0\;1]$):
$$A = \begin{bmatrix} 1 & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & 1 & 0 & \dots & 0 \end{bmatrix} = \begin{bmatrix} I_k & 0 \end{bmatrix},$$
which is $k \times n$.
And by construction, $AX = X_a$. Given that $A\mu = \mu_a$ (the projection of the means) and $A\Sigma A^T = \Sigma_{aa}$, the affine property gives
$$AX = X_a \sim N(\mu_a, \Sigma_{aa}).$$
The same can be done with $X_b$.
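A small Octave check of the marginalization step, using an arbitrary 3-dimensional example with $k = 2$:

n = 3; k = 2;
mu    = [0; 1; 2];                   % example mean (arbitrary)
Sigma = [4 1 0.5; 1 3 1; 0.5 1 2];   % example covariance (arbitrary, positive definite)
A = [eye(k), zeros(k, n - k)];       % k x n projection onto the first k coordinates
mu_a     = A * mu                    % equals mu(1:k)
Sigma_aa = A * Sigma * A'            % equals Sigma(1:k, 1:k)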
#### CONDITIONAL DISTRIBUTION
If $X = (X_1, X_2)^T \in \mathbb{R}^2$, then $(X_1 \mid X_2 = x_2)$ is Gaussian.
Using the same block matrices as above:
$$(X_a \mid X_b = x_b) \sim N(m, D), \quad \text{where } m = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b) \ \text{and} \ D = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}.$$
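A minimal Octave sketch of these formulas, using an arbitrary 3-dimensional example with $a = \{1, 2\}$, $b = \{3\}$, and an arbitrary observed value $x_b$:

mu    = [0; 1; 2];                   % example mean (arbitrary)
Sigma = [4 1 0.5; 1 3 1; 0.5 1 2];   % example covariance (arbitrary, positive definite)
ia = 1:2; ib = 3;                    % indices of the a-block and the b-block
xb = 0.5;                            % observed value of X_b (arbitrary)
m = mu(ia) + Sigma(ia,ib) * (Sigma(ib,ib) \ (xb - mu(ib)))       % conditional mean
D = Sigma(ia,ia) - Sigma(ia,ib) * (Sigma(ib,ib) \ Sigma(ib,ia))  % conditional covariance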
For a full derivation see here and here.
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.