Tensors are not multi-dimensional arrays. Further, the representation of a tensor as a matrix in the simplest examples is also misleading. Why? Because the idea of a tensor is to linearize a multi-linear function.
For instance, a bilinear function (like any other multi-linear function) is not linear. Take for example the area of a rectangle \(A = s \times l\): doubling the side doubles the area, and the same happens with the length; however, the function is bilinear, so doubling both the side and the length results in an area \(4\) times larger. The dot product is another such example: the total cost of some purchases is the dot product of the number of units of each item purchased by the cost of each unit. Doubling the price of each item, together with a concomitant doubling of the number of units purchased of each item, will increase the overall cost four-fold.
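As a quick numeric sketch (the prices and quantities below are made up for illustration):

price = c(3, 10)              # hypothetical unit prices
units = c(4, 2)               # hypothetical units purchased
price %*% units               # total cost
##      [,1]
## [1,]   32
(2 * price) %*% (2 * units)   # doubling both quadruples the cost
##      [,1]
## [1,]  128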
The dot product fulfills the criteria for a bilinear function, which amount to being linear on the left:
\[\langle au + bv, w \rangle = a \langle u, w \rangle + b \langle v, w \rangle\]
and on the right:
\[\langle u, av + bw \rangle = a \langle u, v \rangle + b \langle u, w \rangle.\]
u = 1:3
v = 4:6
w = 7:9
a = 5
b = 8
# Linear on the left:
(a * u + b * v) %*% w
## [,1]
## [1,] 1226
a * (u %*% w) + b * (v %*% w)
## [,1]
## [1,] 1226
# Linear on the right
u %*% (a * v + b * w)
## [,1]
## [1,] 560
a * (u %*% v) + b * (u %*% w)
## [,1]
## [1,] 560
If a function \(f: X \times Y \to \mathbb R\) were linear on the product space, scaling both arguments would scale the value only once:
\[ f(\alpha x , \alpha y) = \alpha f (x,y) \]
However, if \(f\) is bilinear, the scalar factors out of each argument separately:
\[ f(\alpha x , \alpha y) = \alpha^2 f(x,y) \]
(a * v) %*% (a * w)
## [,1]
## [1,] 3050
a^2 * (v %*% w)
## [,1]
## [1,] 3050
The dot product can be linearized with tensors as follows:
Take \(V = \mathbb{R}^2\) and let’s consider the dot product as a bilinear map \(\cdot : V \times V \to \mathbb{R}\). If we write \(e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\) for the standard basis of \(V\), then the dot product is given explicitly on this basis by
\[e_1 \cdot e_1 = e_2 \cdot e_2 = 1, e_1 \cdot e_2 = e_2 \cdot e_1 = 0.\]
or in short \(e_i \cdot e_j = \delta_{ij}\) where \(\delta_{ij}\) is the Kronecker delta. The idea of the tensor product is that this specification of what a bilinear map is doing on pairs of basis vectors is really “the same as” specifying what a corresponding linear map is doing on a vector space with basis given by the tensors \(e_1 \otimes e_1, e_1 \otimes e_2, e_2 \otimes e_1, e_2 \otimes e_2\). That is, the dot product corresponds to a linear map
\[D : V \otimes V \to \mathbb{R}\]
and this linear map does the following: \(V \otimes V\) has a basis of tensor products as mentioned above, and
\[D(e_1 \otimes e_1) = D(e_2 \otimes e_2) = 1, D(e_1 \otimes e_2) = D(e_2 \otimes e_1) = 0\]
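A minimal sketch in R (my own illustration): if we store the coefficients of \(v \otimes w\) with outer(v, w) and list the values of \(D\) on the basis tensors in the matching column-major order \(e_1 \otimes e_1, e_2 \otimes e_1, e_1 \otimes e_2, e_2 \otimes e_2\), the bilinear dot product becomes a single linear operation:

v = c(2, 5)
w = c(3, 1)
D = c(1, 0, 0, 1)              # D(e1⊗e1) = D(e2⊗e2) = 1, the mixed terms 0
D %*% as.vector(outer(v, w))   # the linear map D applied to v ⊗ w
##      [,1]
## [1,]   11
v %*% w                        # agrees with the bilinear dot product
##      [,1]
## [1,]   11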
The defining property of a multilinear function is linearity in each argument separately:
\[T(v_1, \dots, \color{red}{av_i + bv'_i}, \dots, v_r) = \color{red}{a}\, T(v_1, \dots, v_i, \dots, v_r) + \color{red}{b}\, T(v_1, \dots, v'_i, \dots, v_r)\]
The determinant is a prime example of multilinearity:
\[\det\begin{bmatrix}&5&6&8&3&7\\ &8&7&8&8&2\\ &6&5&3&3&6\\ (5\times&\color{magenta}1&\color{magenta}9&\color{magenta}2&\color{magenta}3&\color{magenta}1\\ &&&\Large \color{red}+\\ 3\times&\color{orange}5&\color{orange}7&\color{orange}2&\color{orange}2&\color{orange}5)\\ &2&6&5&9&3 \end{bmatrix}=5\times\det\begin{bmatrix}5&6&8&3&7\\ 8&7&8&8&2\\ 6&5&3&3&6\\ \color{magenta}1&\color{magenta}9&\color{magenta}2&\color{magenta}3&\color{magenta}1\\ 2&6&5&9&3\end{bmatrix}+ 3\times\det\begin{bmatrix}5&6&8&3&7\\ 8&7&8&8&2\\ 6&5&3&3&6\\ \color{orange}5&\color{orange}7&\color{orange}2&\color{orange}2&\color{orange}5\\ 2&6&5&9&3 \end{bmatrix}\]
v = c(5L, 6L, 8L, 3L, 7L, 8L, 7L, 8L, 8L, 2L, 6L, 5L, 3L, 3L, 6L,
      1L, 9L, 2L, 3L, 1L, 5L, 7L, 2L, 2L, 5L, 2L, 6L, 5L, 9L, 3L)
m = matrix(v, nrow = 6, byrow = TRUE)   # 6 x 5; rows 4 and 5 are the magenta and orange rows
# Multiplying the fourth row by 5 and the fifth row by 3 and adding them:
M = rbind(m[c(1:3),], 5 * m[4,] + 3 * m[5,], m[6,])
det(M)
## [1] 79799
# Extracting the scalars 5 and 3 and splitting the matrix into a sum of two determinants:
5 * det(m[c(1:4,6),]) + 3 * det(m[c(1:3,5:6),])
## [1] 79799
The wedge product is a determinant. For instance, the area subtended by the vectors \(v=[2,2], w=[1,3]\) can be calculated as a determinant or, equivalently, as the 2-form \(\omega = 1\, dx \wedge dy\) acting on the vectors \(v, w\),
\[\omega(v,w)= \det\begin{bmatrix}2&2\\1&3\end{bmatrix}=4\]
which in tensor notation corresponds to \(e^1\otimes e^2 - e^2 \otimes e^1\), a tensor in \(V^{\ast} \otimes V^{\ast}\), since the other two basis elements have a coefficient of zero: \(\dots + 0 \, e^1 \otimes e^1 + 0\, e^2 \otimes e^2\). This is the linearization of the determinant, a multi-linear map.
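A quick check in R (a sketch of my own, storing the coefficients on the \(e^i \otimes e^j\) basis in a matrix):

v = c(2, 2)
w = c(1, 3)
omega = matrix(c(0, -1, 1, 0), 2)   # +1 on e^1⊗e^2, -1 on e^2⊗e^1
sum(omega * outer(v, w))            # the linearized form applied to v ⊗ w
## [1] 4
det(cbind(v, w))
## [1] 4

The Leibniz formula linearizes the determinant in general. Here is the case of a \(3 \times 3\) matrix: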
\[\begin{align}\det\begin{bmatrix}|&|&|\\v_1&v_2&v_3\\|&|&|\end{bmatrix}&=e^1\otimes e^2 \otimes e^3- e^1\otimes e^3 \otimes e^2 \\&+ e^3\otimes e^1 \otimes e^2 - e^2\otimes e^1 \otimes e^3\\\\ & +e^2\otimes e^3 \otimes e^1 - e^3\otimes e^2 \otimes e^1 \end{align}\]
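The same check in R for the \(3 \times 3\) case (again a sketch, with an arbitrary matrix chosen for illustration): the six signed basis tensors above are the nonzero entries of the Levi-Civita sign array, and contracting it against \(v_1 \otimes v_2 \otimes v_3\) reproduces the determinant:

eps = array(0, dim = c(3, 3, 3))
eps[1,2,3] = 1; eps[2,3,1] = 1; eps[3,1,2] = 1      # even permutations: +
eps[1,3,2] = -1; eps[2,1,3] = -1; eps[3,2,1] = -1   # odd permutations: -
A = matrix(c(2, 0, 1, 1, 3, 0, 0, 1, 4), 3)         # columns v1, v2, v3
# outer(A[,1], outer(A[,2], A[,3]))[i,j,k] = v1_i * v2_j * v3_k
sum(eps * outer(A[,1], outer(A[,2], A[,3])))
## [1] 25
det(A)
## [1] 25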
A bi-linear form can be expressed as a matrix \(A\), and the function can then be interpreted as multiplying on the left by a row vector \(\vec v^\top\) and on the right by a column vector \(\vec w\), as in
\[\vec v^\top A \vec w\]
As such, any linear operation on a row of \(A\) will necessarily alter all the columns: the bi-linearity of the function is manifest in this way. Naturally, it can still be claimed that the matrix represents a \((1,1)\) tensor, but this misses the idea behind the tensor product: the linearization of the multi-linear function.
The linearization breaks a bi-linear form (matrix) down into the individual \(e^i \otimes e^j\) basis elements. In matrix form, it is impossible to change a row (dual) by linear algebra operations without affecting all the columns (hence, bi-linear), but when each entry has its own tensor-product basis element, we can adjust each one (a linear change) independently of the others by basic linear algebra operations; hence, the function becomes linearized.
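A small sketch of this point (my own, with an arbitrary matrix): the matrix sandwich \(\vec v^\top A \vec w\) and the linearized sum \(\sum_{ij} A_{ij}\, v_i w_j\) over the \(e^i \otimes e^j\) basis give the same number:

A = matrix(1:4, 2)    # an arbitrary bilinear form on R^2
v = c(2, 5)
w = c(3, 1)
t(v) %*% A %*% w      # row vector times matrix times column vector
##      [,1]
## [1,]   62
sum(A * outer(v, w))  # each A_ij dials its own coefficient v_i w_j
## [1] 62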
The process of linearization consists of taking the free vector space on the Cartesian product of the vector spaces, and then taking the quotient by the relations expressing the bilinearity we want to preserve.
From Wikipedia,
Let \(V\) and \(W\) be two vector spaces over a field \(F\), and let \(L\) be the free vector space on the Cartesian product \(V \times W\). Let \(R\) be the linear subspace of \(L\) that is spanned by the relations that the tensor product must satisfy. More precisely, \(R\) is spanned by the elements of one of the forms:
\[\begin{align} (v_1 + v_2, w) - (v_1,w) - (v_2,w)\\ (v,w_1+w_2) - (v,w_1) - (v,w_2)\\ (sv,w) - s(v,w)\\ (v,sw) - s(v,w) \end{align}\]
where \(v, v_1, v_2 \in V\), \(w, w_1, w_2 \in W\), and \(s \in F\).
Then the tensor product is defined as the quotient space:
\(V \otimes W = L/R\).
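Unpacking the definition (a one-line consequence of my own): the elements \((v_1 + v_2, w) - (v_1,w) - (v_2,w)\) and \((sv, w) - s(v,w)\) lie in \(R\), so they vanish in the quotient, and writing \(v \otimes w\) for the class of \((v,w)\) yields exactly the bilinearity rules

\[(v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w, \qquad (sv) \otimes w = s\,(v \otimes w).\]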
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.