NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Motivation for Tensors:


From this post:

Examples of multilinear maps:

  1. If \(V\) is an inner product space over \(\mathbb{R},\) then the inner product is bilinear.
  2. Matrix multiplication, viewed as an operation taking a pair of matrices \((A,B)\) to the product \(AB.\)
  3. The determinant of a matrix is a multilinear map if we view the columns of the matrix as vector arguments.
  4. The cross product of vectors in \(\mathbb{R}^3\) is bilinear (see the numerical check below).
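
As a numerical check of item 4, here is a minimal R sketch; the helper cross() is defined by hand, since base R has no built-in cross product:

cross = function(a, b) c(a[2]*b[3] - a[3]*b[2],
                         a[3]*b[1] - a[1]*b[3],
                         a[1]*b[2] - a[2]*b[1])

u = c(1,2,3); v = c(4,5,6); w = c(7,8,9)

# additivity in the first argument, holding the second fixed
all.equal(cross(u + v, w), cross(u, w) + cross(v, w))
## [1] TRUE

# homogeneity in the second argument
all.equal(cross(u, 5 * v), 5 * cross(u, v))
## [1] TRUE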

Comparison of multilinear maps to linear maps:

Given two vector spaces \(V\) and \(W\), the direct sum is the Cartesian product \[V \times W\] endowed with a linear algebraic structure. We have the set \(V\oplus W =\{(v_i,w_i)\}\) of ordered pairs, and we define

Addition:

\[(v_1,w_1) + (v_2,w_2)=(v_1 + v_2, w_1 + w_2)\] Scalar multiplication:

\[\lambda(v_i,w_i)=(\lambda\,v_i\quad,\quad\lambda \,w_i)\]
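
A minimal R sketch of these two rules, modeling an element \((v,w)\) of \(V\oplus W\) as the concatenated vector c(v, w) (this flat encoding is an assumption made for illustration):

v1 = c(1,2);   w1 = c(3,4,5)    # v's live in R^2, w's in R^3
v2 = c(6,7);   w2 = c(8,9,10)

# componentwise addition: (v1,w1) + (v2,w2) = (v1+v2, w1+w2)
all.equal(c(v1, w1) + c(v2, w2), c(v1 + v2, w1 + w2))
## [1] TRUE

# scalar multiplication acts on both components at once
lambda = 3
all.equal(lambda * c(v1, w1), c(lambda * v1, lambda * w1))
## [1] TRUE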

Instead, the TENSOR PRODUCT SPACE (TPS) \(V\otimes W\) also starts from the set of Cartesian pairs from the individual vector spaces, renaming the pairs \((v_i,w_i)\) to \(v_i\otimes w_i.\) BUT in the TPS,

addition of two such tensors collapses to a single one only when one of the two components is identical:

\[(v_1\;\otimes\;\color{red}{ w_1}) + (v_2\;\otimes\;\color{red}{w_1}) = (v_1 + v_2)\;\otimes\;\color{red}{w_1}\]

and scalar multiplication can be applied to either component, with the same result:

\[\lambda\,(v_i \otimes w_i) = (\lambda\,v_i) \otimes w_i=v_i \otimes (\lambda\, w_i)\]
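
For finite-dimensional spaces, a concrete model of \(V\otimes W\) identifies \(v\otimes w\) with the outer-product matrix \(v\,w^\top.\) Assuming that model, R's outer() lets us check both rules numerically:

v1 = c(1,2,3); v2 = c(4,5,6); w1 = c(7,8)

# addition collapses to a single tensor because w1 is shared
all.equal(outer(v1, w1) + outer(v2, w1), outer(v1 + v2, w1))
## [1] TRUE

# the scalar can sit on either factor
lambda = 5
all.equal(lambda * outer(v1, w1), outer(lambda * v1, w1))
## [1] TRUE
all.equal(outer(lambda * v1, w1), outer(v1, lambda * w1))
## [1] TRUE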

In general, multilinearity is characterized by additivity in each slot,

\[T\left(v_1 , \cdots , v_i + v_i' , \cdots , v_n\right) = T\left(v_1 , \cdots , v_i , \cdots , v_n\right)+T\left(v_1 , \cdots , v_i' , \cdots , v_n\right)\]

and

\[T\left(v_1 , v_2 , k\,v_3 , \cdots , v_n\right)=k\,T\left(v_1 , v_2 , v_3 , \cdots , v_n\right)\]

with \(T\) standing for a multilinear map, linear in each argument separately.


Tensor product of two copies of \(\mathbb R:\)

This is the TPS \(\mathbb R \otimes \mathbb R,\) comprising vectors such as \(3\otimes 5.\)

\[3\otimes 5 \quad+\quad 1\otimes (-5)\quad=\quad 3\otimes \color{red}5 \quad + \quad (-1)\otimes \color{red} 5 \quad =\quad 2 \otimes \color{red}5\]

also

\[6\otimes 1 \quad+\quad 3\pi\otimes\pi\quad = \quad \underbrace{\color{red}3\otimes 2}_{(2\times 3)\otimes 1\;=\;3\otimes (2\times 1)}\quad + \quad \color{red}3\otimes \pi^2\quad=\quad \color{red}3 \otimes(2 +\pi^2)\]
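
Since \(\mathbb R \otimes \mathbb R\) is isomorphic to \(\mathbb R\) via \(a\otimes b \mapsto ab,\) both computations can be sanity-checked in R by multiplying through (this identification is assumed):

# 3 (x) 5 + 1 (x) (-5) should match 2 (x) 5
all.equal(3 * 5 + 1 * (-5), 2 * 5)
## [1] TRUE

# 6 (x) 1 + 3*pi (x) pi should match 3 (x) (2 + pi^2)
all.equal(6 * 1 + 3 * pi * pi, 3 * (2 + pi^2))
## [1] TRUE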


Dot products:

\[\text{dot}\left( \begin{bmatrix}1\\2\\3 \end{bmatrix} , \begin{bmatrix} 1\\0\\1\end{bmatrix}\right)=1\times 1 + 2\times 0+3\times 1 =4\]

\[\begin{align} \text{dot}\left(5 \begin{bmatrix}1\\2\\3 \end{bmatrix} , \begin{bmatrix} 1\\0\\1\end{bmatrix}\right)&=5\;\text{dot}\left( \begin{bmatrix}1\\2\\3 \end{bmatrix} , \begin{bmatrix} 1\\0\\1\end{bmatrix}\right)\\[2ex] &=5\times1\times 1 + 5\times 2\times 0+5\times3\times 1\\[2ex] &=5\,\left(1\times 1 + 2\times 0+3\times 1 \right)=20 \end{align}\]

and

\[\begin{align} \text{dot}\left( \begin{bmatrix}1\\2\\3 \end{bmatrix} , \begin{bmatrix} 1\\0\\1\end{bmatrix}+\begin{bmatrix} 2\\1\\1\end{bmatrix}\right)&=\text{dot}\left( \begin{bmatrix}1\\2\\3 \end{bmatrix} , \begin{bmatrix} 1\\0\\1\end{bmatrix}\right)+\text{dot}\left( \begin{bmatrix}1\\2\\3 \end{bmatrix} , \begin{bmatrix} 2\\1\\1\end{bmatrix}\right)\\[2ex] 11&=4+7 \end{align}\]
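
The same checks in R, with a small helper standing in for the dot product:

dot = function(x, y) sum(x * y)
a = c(1,2,3); b = c(1,0,1); d = c(2,1,1)

dot(a, b)
## [1] 4
all.equal(dot(5 * a, b), 5 * dot(a, b))
## [1] TRUE
all.equal(dot(a, b + d), dot(a, b) + dot(a, d))    # 11 = 4 + 7
## [1] TRUE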


Matrices:

Considering matrices as vectors, and with \(m(\cdot\,,\cdot)\) representing the matrix product operator, i.e. given matrices \(A\) and \(B,\) the product \(m(A,B) = AB,\)

\[\begin{align} m(A +B\;,\;C+D)\quad&\neq\quad m(A,C) \quad+\quad m(B,D)\\[2ex] (A+B)(C+D)\quad &\neq \quad AC + BD \end{align}\]

but if we fix the second coordinate,

\[m(A +C\;,\; B) \quad= \quad(A+C)B\quad=\quad AB + CB \quad =\quad m(A,B) \quad+\quad m(C,B)\]

Here is an obvious numeric example:

\[\left( \begin{bmatrix} 6&2\\0&-1 \end{bmatrix} + \begin{bmatrix} 4&2\\0&-5 \end{bmatrix}\right)\,\color{red}{\begin{bmatrix} 1&2\\3&-1 \end{bmatrix}}=\left( \begin{bmatrix} 6&2\\0&-1 \end{bmatrix} \, \color{red}{\begin{bmatrix} 1&2\\3&-1 \end{bmatrix}}\right)+\left( \begin{bmatrix} 4&2\\0&-5 \end{bmatrix} \, \color{red}{\begin{bmatrix} 1&2\\3&-1 \end{bmatrix}}\right)\]

# matrix() fills column-by-column, so these match the matrices displayed above
A = matrix(c(6,0,2,-1),2,2)
B = matrix(c(4,0,2,-5),2,2)
C = matrix(c(1,3,2,-1),2,2)
D = matrix(c(2,0,-2,5),2,2)

all.equal((A + C) %*% B        ,  (A %*% B) + (C %*% B))
## [1] TRUE

but

\[\left( \begin{bmatrix} 6&2\\0&-1 \end{bmatrix} + \begin{bmatrix} 4&2\\0&-5 \end{bmatrix}\right)\,\left(\begin{bmatrix} 1&2\\3&-1 \end{bmatrix}+\begin{bmatrix}2&-2\\0&5\end{bmatrix}\right) \neq \left( \begin{bmatrix} 6&2\\0&-1 \end{bmatrix} \,\begin{bmatrix} 1&2\\3&-1 \end{bmatrix}\right)+\left( \begin{bmatrix} 4&2\\0&-5 \end{bmatrix} \, \begin{bmatrix} 2&-2\\0&5 \end{bmatrix}\right)\]

all.equal((A + B) %*% (C + D)  ,  (A %*% C) + (B %*% D))
## [1] "Mean relative difference: 0.5394737"

so \(m(\cdot\,,\cdot)\) as defined is more akin to a multilinear (here, bilinear) map than to a linear map from a vector space \(V\) to a vector space \(W\), \(\varphi: V \to W,\) as in a change of coordinates, where

\[\begin{bmatrix}a_{11}& a_{12} & a_{13}\\ a_{21}& a_{22} & a_{23}\\ a_{31}& a_{32} & a_{33}\\ a_{41}& a_{42} & a_{43} \end{bmatrix} \begin{bmatrix}v_1^1\\v_1^2\\v_1^3 \end{bmatrix}=\begin{bmatrix}w_1^1\\w_1^2\\w_1^3\\w_1^4\end{bmatrix} \]

in which case, applying \(\varphi\) componentwise to pairs in the direct sum,

\[\left(\varphi(v_1),\varphi(v_2)\right)+\left(\varphi(v_3), \varphi(v_4)\right) = \left(\varphi(v_1 + v_3)\;,\;\varphi(v_2+v_4)\right)\]

Here is the idea encoded:

set.seed(0)
m = matrix(rnorm(12),4,3)    # a random 4x3 matrix playing the role of varphi
v1 = 1:3
v2 = 3:5
v3 = 5:7
v4 = 7:9
varphi_v1 = m %*% v1
varphi_v2 = m %*% v2
varphi_v3 = m %*% v3
varphi_v4 = m %*% v4

sum_after_lin_map = varphi_v1 + varphi_v3
lin_map_after_sum = m %*% (v1 + v3)
all.equal(sum_after_lin_map, lin_map_after_sum)
## [1] TRUE
sum_after_lin_map_2 = varphi_v2 + varphi_v4
lin_map_after_sum_2 = m %*% (v2 + v4)
all.equal(sum_after_lin_map_2, lin_map_after_sum_2)
## [1] TRUE

Same goes for scalar multiplication:

\[m(\alpha A,B) = \alpha AB = A(\alpha B) =m(A,\alpha B)= \alpha m (A,B)\]

alpha = sample(2:10, 1)
all.equal((alpha * A) %*% B, alpha * (A %*% B))
## [1] TRUE
all.equal(A %*% (alpha * B), alpha * (A %*% B))
## [1] TRUE

whereas in the case of vectors:

\[\lambda\,\left(\varphi(v_1),\varphi(v_2)\right)+\lambda\,\left(\varphi(v_3), \varphi(v_4)\right) = \left(\lambda\,\varphi(v_1 + v_3)\;,\;\lambda\,\varphi(v_2+v_4)\right)\]

lambda = alpha
lambda_times_sum_after_lin_map = lambda * varphi_v1 + lambda * varphi_v3
lambda_times_lin_map_after_sum = lambda * ( m %*% (v1 + v3) )
all.equal(lambda_times_sum_after_lin_map, lambda_times_lin_map_after_sum)
## [1] TRUE

CONCLUSION:

Pairings of matrices (symbolized by \((\cdot\,,\,\cdot)\) above, but clearly hinting at \(\cdot \otimes \cdot\)) don’t behave linearly when subjected to a function such as matrix multiplication. So, although we can pair matrices (via the Cartesian product) to create a vector space, it will not show the same linear properties as a typical “arrow” vector space. A vector space formed by pairing matrices behaves like a TPS, with bilinear, or more generally multilinear, features.


Determinants:

\[\det\begin{bmatrix} \vert& \vert & \vert\\ v_1 & v_2 & v_3\\ \vert & \vert & \vert \end{bmatrix}+\det\begin{bmatrix} \vert& \vert & \vert\\ v_1 & v_2 & v_4\\ \vert & \vert & \vert \end{bmatrix} = \det\begin{bmatrix} \vert& \vert & \vert\\ v_1 & v_2 & (v_3 + v_4)\\ \vert & \vert & \vert \end{bmatrix} \]

v1 = 1:3; v2 = 3:5; v3 = 5:7; v4 = c(0,1,-2)
mat.1 = cbind(v1,v2,v3); mat.2 = cbind(v1,v2,v4)
all.equal(det(mat.1) + det(mat.2) ,  det(cbind(v1, v2, v3 + v4)))
## [1] TRUE

and

\[k \;\det\begin{bmatrix} \vert& \vert & \vert\\ v_1 & v_2 & v_3\\ \vert & \vert & \vert \end{bmatrix}=\det\begin{bmatrix} \vert& \vert & \vert\\ v_1 &k\; v_2 & v_3\\ \vert & \vert & \vert \end{bmatrix}\]

all.equal(5 * det(mat.1), det(cbind(v1, 5 * v2, v3)))
## [1] TRUE

Consider the determinant of a matrix as an alternating (it changes sign when two columns are swapped) multilinear operator. Matrices have to be square to have a determinant, so in the case of a matrix in \(\mathbb R^3\) formed by two linearly independent column vectors, we’ll need to add a third vector. For example:

\(\vec v = \begin{bmatrix}1\\2\\3\end{bmatrix}\) and \(\vec w = \begin{bmatrix}-1\\7\\-2 \end{bmatrix}\) can form a matrix with an element of the standard basis, \(\vec e_1,\) added as the third column, resulting in the determinant:

\[\det\begin{bmatrix} 1 &-1 &1\\\bbox[5px,border:2px solid red]{2}&\bbox[5px,border:2px solid red]7&0\\\bbox[5px,border:2px solid yellow]3&\bbox[5px,border:2px solid yellow]{-2}&0\end{bmatrix}=\begin{vmatrix}2&7\\3&-2\end{vmatrix}=-25\] while using \(\vec e_2\) in the last column would result in:

\[\det\begin{bmatrix} \bbox[5px,border:2px solid red]1 &\bbox[5px,border:2px solid red]{-1} &0\\2&7&1\\\bbox[5px,border:2px solid yellow]3&\bbox[5px,border:2px solid yellow]{-2}&0\end{bmatrix}=-\begin{vmatrix}1&-1\\3&-2\end{vmatrix}=-1\]

(note the minus sign: the \(1\) now sits in position \((2,3),\) and its cofactor carries the factor \((-1)^{2+3}\)).
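
Both determinants check out in R:

v = c(1,2,3); w = c(-1,7,-2)
det(cbind(v, w, c(1,0,0)))    # e_1 in the last column
## [1] -25
det(cbind(v, w, c(0,1,0)))    # e_2 in the last column
## [1] -1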

We are really taking \(2 \times 2\) determinants by choosing rows of the matrix

\[\begin{bmatrix}\vert & \vert \\ \vec v & \vec w \\\vert&\vert \end{bmatrix}\]

In this case we are choosing \(3 \choose 2\) combinations of rows.

Notationally, we could select these rows using the differential form \(\text{d}x_i\) mapping or function, which selects the \(i\)-th entry of a vector, extending it to rows by using two indices. So, in general, \(\phi_i = \text{d}x_i\) applied to the vector \(\vec v\) produces \(\text{d}x_i(\vec v)=v_i =\vec v\cdot \vec e_i.\) We could use the notation \(\text{d}\vec x_{12}(\vec v, \vec w)\) to indicate the determinant of the first and second rows, in that order.
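
A hypothetical helper dx() encoding this in R: dx(i, j) returns the map taking \((\vec v, \vec w)\) to the determinant of rows \(i\) and \(j\) of the matrix \([\vec v\;\;\vec w]\):

# dx(i, j) is the map (v, w) -> det of rows i and j of cbind(v, w)
dx = function(i, j) function(v, w) det(cbind(v, w)[c(i, j), ])

v = c(1,2,3); w = c(-1,7,-2)
dx(2, 3)(v, w)    # the same 2x2 block as in the e_1 example above
## [1] -25
dx(1, 3)(v, w)
## [1] 1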

If the vectors \(\vec v\) and \(\vec w\) determine a parallelogram, the result of the function described corresponds to the signed projection of this parallelogram onto combinations of the \((x,y,z)\) coordinate planes.

\(\Lambda^k\left(\mathbb R^n \right)^*\) could denote the vector space of all alternating multilinear functions of \(k\) vectors in \(\mathbb R^n:\)

\[\underbrace{\mathbb R^n \times \cdots \times \mathbb R^n}_k \to \mathbb R.\]

In this case \(n = 3,\) and \(k =2.\)

If \(k=1,\) we end up with \(\left( \mathbb R^n\right)^*:\) the collection of linear maps \(\mathbb R^n \to \mathbb R.\) These are of the form

\[T(\vec x) = \vec a \cdot \vec x\]

for some \(\vec a \in \mathbb R^n.\)

This vector space (the dual) has basis

\[\{\;\phi_1(\vec x) = \vec x \cdot \vec e_1\quad,\quad \phi_2(\vec x) = \vec x \cdot \vec e_2 \quad,\quad \dots\quad,\quad \phi_n(\vec x) = \vec x \cdot \vec e_n\;\}.\]
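
A quick R illustration with an assumed \(\vec a = (2,-1,4)\): the functional \(T(\vec x)=\vec a\cdot\vec x\) decomposes over this basis as \(\sum_i a_i\,\phi_i\):

a = c(2,-1,4); x = c(1,2,3)
phi = function(i) function(x) x[i]    # phi_i picks out the i-th coordinate
T_x = sum(a * x)                      # T(x) = a . x
all.equal(T_x, sum(sapply(1:3, function(i) a[i] * phi(i)(x))))
## [1] TRUE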

If \(k = n,\) we end up with the determinant of an \(n\times n\) (here \(3\times 3\)) matrix. This is a \(1\)-dim vector space with basis \(\text{d}\vec x_{123\dots n} = \text{determinant}.\)

We can define \(\text{d}\vec x_{i_1,i_2,\dots,i_k}\left( \vec v_1, \vec v_2, \dots, \vec v_k \right)=\begin{vmatrix}v_{1_{i_1}} & v_{2_{i_1}} & \cdots & v_{k_{i_1}} \\ v_{1_{i_2}} & v_{2_{i_2}}&\cdots & v_{k_{i_2}}\\ &\vdots&\\ v_{1_{i_k}} & v_{2_{i_k}}& \cdots & v_{k_{i_k}}\end{vmatrix}.\)

For \(\Lambda^2\left( \mathbb R^3\right)^*\) we can get the maps \(\text{d}\vec x_{ij}\) with \(i<j,\) i.e. increasing indices: \(\text{d}\vec x_{12}, \text{d}\vec x_{13}, \text{d}\vec x_{23}\) - given that \(\text{d}\vec x_{12}=- \text{d}\vec x_{21}.\) In general there are \(n \choose k\) such maps.
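
Reusing the dx() helper and the vectors v, w from above to check the antisymmetry and the count:

all.equal(dx(1, 2)(v, w), -dx(2, 1)(v, w))
## [1] TRUE
choose(3, 2)    # number of increasing index pairs for Lambda^2(R^3)*
## [1] 3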



NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.