From this article:
A vector is a physical quantity and does not depend on any co-ordinate system. It needs to be expanded in some basis for practical calculations, and its components do depend on the chosen basis. The expansion in an orthonormal basis is mathematically simple, but in many physical situations we have to choose a non-orthogonal basis (or oblique co-ordinate system), and the expansion of a vector in a non-orthogonal basis is not convenient to work with. With the notion of contravariant and covariant components of a vector, we make a non-orthogonal basis behave like an orthonormal one.
We introduce \(\vec a = e_1, \; \vec b=e_2,\; \vec c=e_3\) for the contravariant basis and \(\vec a' = e^1, \; \vec b'=e^2,\; \vec c'=e^3\) for the covariant basis. With this notation, the equation
\[\vec a\cdot \vec a' = \vec b\cdot \vec b'=\vec c\cdot \vec c'=1;\; \vec a\cdot \vec b'=\vec a\cdot \vec c'=0;\;\vec b\cdot \vec a'=\vec b\cdot \vec c'=0;\;\vec c\cdot \vec a'=\vec c\cdot \vec b'=0\]
becomes \(e_i\cdot e^j =\delta_i^j\tag {23}\)
and equation
\[I = \vec a \vec a' + \vec b \vec b' + \vec c \vec c'\]
becomes \(I = e_\mu e^\mu\tag {24}\)
where summation over dummy indices is understood, and \(\delta_i^j\) is the standard Kronecker delta. With the superscript and subscript notation we generalise equations (23) and (24) to n-dimensional Euclidean space. The contravariant components of an arbitrary vector \(\vec A\) are written \(A^i\) with a superscript index, and the covariant components \(A_i\) with a subscript index. The dimension of a contravariant vector is the inverse of that of the covariant vector, and hence we expect the behaviour of contravariant and covariant vectors under a co-ordinate transformation to be inverse to each other.
KEY point: In a Cartesian system, covariant and contravariant components are the same.
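As a quick numerical sanity check (my own addition, not from the article, assuming numpy), here is a minimal sketch of the reciprocal-basis relations \(e_i\cdot e^j=\delta_i^j\), which also illustrates the KEY point that in a Cartesian frame the two bases coincide; the helper name `reciprocal` and the example bases are just chosen for illustration:

```python
# Numeric illustration of the reciprocal-basis relations e_i · e^j = δ_i^j,
# and of the fact that in a Cartesian (orthonormal) frame both bases coincide.
import numpy as np

def reciprocal(basis):
    """Rows of `basis` are e_1..e_n; returns rows e^1..e^n with e_i·e^j = δ_i^j."""
    return np.linalg.inv(basis).T

oblique = np.array([[1.0, 0.0],        # e_1
                    [1.0, 1.0]])       # e_2, not orthogonal to e_1
print(np.round(oblique @ reciprocal(oblique).T, 6))   # identity matrix: e_i·e^j = δ_i^j

cartesian = np.eye(2)
print(np.allclose(reciprocal(cartesian), cartesian))  # True: e^i = e_i in a Cartesian frame
```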
From this series on YouTube:
Imagining a differential displacement vector in two different coordinate systems, \(X\) and \(Y:\)
What follows is predicated on the assumption that we know the equations relating each component (\(m\)) in the \(X\) coordinate system to the \(Y\) coordinate frame:
\(Y^n = f(X^m)\) and \(X^p = g(Y^z).\)
Knowing the transformation equations, the change in the coordinates of the differential displacement vector is given by:
\(\begin{align} dy^1 &= \frac{\partial y^1}{\partial x^1} dx^1 + \frac{\partial y^1}{\partial x^2} dx^2 + \frac{\partial y^1}{\partial x^3} dx^3 + \cdots + \frac{\partial y^1}{\partial x^n} dx^n\\ dy^2 &= \frac{\partial y^2}{\partial x^1} dx^1 + \frac{\partial y^2}{\partial x^2} dx^2 + \frac{\partial y^2}{\partial x^3} dx^3+\cdots+\frac{\partial y^2}{\partial x^n} dx^n\\ \vdots\\ dy^d &= \frac{\partial y^d}{\partial x^1} dx^1 + \frac{\partial y^d}{\partial x^2} dx^2 + \frac{\partial y^d}{\partial x^3} dx^3+\cdots+\frac{\partial y^d}{\partial x^n} dx^n \end{align}\)
So any particular component in the new coordinate system would be of the form:
\[dy^n = \frac{\partial y^n}{\partial x^\color{blue}{m}} dx^{\color{blue}{m}} \tag{Ref.1}\]
with the color coding indicating Einstein’s convention.
Expressed in matrix form:
\[\begin{bmatrix} dy^1\\dy^2\\dy^3\\\vdots\\dy^d \end{bmatrix}= {\begin{bmatrix} \frac{\partial y^1}{\partial x^1} & \frac{\partial y^1}{\partial x^2} & \frac{\partial y^1}{\partial x^3} &\cdots& \frac{\partial y^1}{\partial x^n}\\ \frac{\partial y^2}{\partial x^1} & \frac{\partial y^2}{\partial x^2} & \frac{\partial y^2}{\partial x^3} &\cdots& \frac{\partial y^2}{\partial x^n}\\ \vdots&\vdots&\vdots&&\vdots\\ \frac{\partial y^d}{\partial x^1} & \frac{\partial y^d}{\partial x^2} & \frac{\partial y^d}{\partial x^3} &\cdots& \frac{\partial y^d}{\partial x^n}\\ \end{bmatrix}} \large\color{red}{\begin{bmatrix} dx^1\\dx^2\\dx^3\\\vdots\\dx^n \end{bmatrix}} \]
We can generalize to a vector \(V\) (column vector in red), which can be transformed from the \(X\) to the \(Y\) coordinate system as:
\[\bbox[yellow, 5px]{V^n_{(Y)} = \frac{\partial y^{n}}{\partial x^{\color{red}{m}}}\;V^{\color{red}{m}}_{(X)}}\]
Notice that in this case \(n = d\) (the \(d\) in the matrix above), while \(m\) is a dummy (summation) index whose range here equals that of \(n\). So \(n = d\) is the dimension of the vector in \(Y\), or the number of rows of the transformation matrix; and the range of \(m\) is the number of columns, or the dimension of the vector in the \(X\) coordinate system.
If we can find the components of \(V\) in \(Y\) by taking these types of derivatives, we speak of a contravariant vector. The components are written with a superscript.
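A minimal numeric sketch of this transformation law (my own illustration, assuming numpy), using polar coordinates for \(X\) and Cartesian coordinates for \(Y\); the names `x_to_y` and `jacobian` are just chosen for the example:

```python
# Minimal numeric check of the contravariant law dy^n = (∂y^n/∂x^m) dx^m,
# using polar coordinates x = (r, θ) and Cartesian y = (y¹, y²) as an example.
import numpy as np

def x_to_y(x):
    r, theta = x
    return np.array([r * np.cos(theta), r * np.sin(theta)])

def jacobian(x):
    r, theta = x
    # J[n, m] = ∂y^n / ∂x^m
    return np.array([[np.cos(theta), -r * np.sin(theta)],
                     [np.sin(theta),  r * np.cos(theta)]])

x  = np.array([2.0, np.pi / 6])      # a point in the X (polar) frame
dx = np.array([1e-6, 2e-6])          # small displacement, components dx^m

dy_transformed = jacobian(x) @ dx            # dy^n = (∂y^n/∂x^m) dx^m
dy_direct      = x_to_y(x + dx) - x_to_y(x)  # the same displacement computed directly

print(dy_transformed, dy_direct)     # agree to first order in |dx|
```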
Now let’s take two contravariant vectors: \(A_Y^m\) (\(m\)-th component of the \(A\) vector in the \(Y\) frame): \(\large A_{(Y)}^m= \frac{\partial y^m}{\partial x^r} A_{(X)}^r\); and the second vector \(\large B_{(Y)}^n= \frac{\partial y^n}{\partial x^s} B_{(X)}^s.\)
If we multiply these two vectors together:
\[\large A^m_{(Y)} B^n_{(Y)}\]
we take each of the \(d\) components of \(A\) and multiply it by each of the \(d\) components of \(B\), expressing the result as:
\[\large \bbox[10px, border:2px solid red]{T^{mn}_{(Y)} = \large A^m_{(Y)} B^n_{(Y)} =\frac{\partial y^m}{\partial x^r} \; \frac{\partial y^n}{\partial x^s}\; A_{(X)}^r\; B_{(X)}^s= \frac{\partial y^m}{\partial x^\color{blue}{r}} \; \frac{\partial y^n}{\partial x^\color{blue}{s}}\;T^{\color{blue}{rs}}_{(X)}}.\]
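Again as a sanity check of my own (not from the video), the boxed rank-2 law can be verified numerically, with an arbitrary invertible matrix standing in for the Jacobian:

```python
# Check of the rank-2 law T^{mn}_(Y) = (∂y^m/∂x^r)(∂y^n/∂x^s) T^{rs}_(X)
# for a tensor built as an outer product of two contravariant vectors.
import numpy as np

J = np.array([[1.0, 2.0],        # J[m, r] stands in for ∂y^m/∂x^r at some point
              [0.5, 3.0]])       # (any invertible matrix works for the check)

A_x = np.array([1.0, -2.0])      # contravariant components in the X frame
B_x = np.array([0.3,  4.0])

T_x = np.outer(A_x, B_x)         # T^{rs}_(X) = A^r B^s

# Transform the vectors first and then take the outer product ...
T_y_from_vectors = np.outer(J @ A_x, J @ B_x)

# ... or transform the tensor directly with two copies of the Jacobian.
T_y_direct = np.einsum('mr,ns,rs->mn', J, J, T_x)

print(np.allclose(T_y_from_vectors, T_y_direct))   # True
```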
The boxed result defines contravariant tensors of the second rank. The superscripts \(m,n,r,s\) label the components (elements or entries), while \((X),(Y)\) label coordinate systems. So we note that tensors enter when more than one vector is involved in a transformation between coordinate systems. This is consistent with the Wikipedia entries, both the definition of tensors as multilinear maps:
A downside to the definition of a tensor using the multidimensional array approach is that it is not apparent from the definition that the defined object is indeed basis independent, as is expected from an intrinsically geometric object. Although it is possible to show that transformation laws indeed ensure independence from the basis, sometimes a more intrinsic definition is preferred. One approach is to define a tensor as a multilinear map. In that approach a type \((p, q)\) tensor \(T\) is defined as a map,
\[T:\underbrace{V^{*}\times \dots \times V^{*}}_{p{\text{ copies}}}\times \underbrace{V\times \dots \times V}_{q{\text{ copies}}}\rightarrow \mathbf {R}\]
where \(V\) is a (finite-dimensional) vector space and \(V^∗\) is the corresponding dual space of covectors, which is linear in each of its arguments.
By applying a multilinear map \(T\) of type \((p, q)\) to a basis \(\{e_j\}\) for \(V\) and a canonical cobasis \(\{\epsilon^i\}\) for \(V^∗\),
\[T_{j_{1}\dots j_{q}}^{i_{1}\dots i_{p}}\equiv T(\mathbf{\varepsilon }^{i_{1}},\ldots ,\mathbf {\varepsilon }^{i_{p}},\mathbf{e}_{j_{1}},\ldots ,\mathbf{e}_{j_{q}})\]
a \((p + q)\)-dimensional array of components can be obtained.
However, the most fitting definition is as multidimensional arrays:
Just as a vector in an n-dimensional space is represented by a one-dimensional array of length n with respect to a given basis, any tensor with respect to a basis is represented by a multidimensional array. For example, a linear operator is represented in a basis as a two-dimensional square n × n array. The numbers in the multidimensional array are known as the scalar components of the tensor or simply its components. They are denoted by indices giving their position in the array, as subscripts and superscripts, following the symbolic name of the tensor. For example, the components of an order \(2\) tensor \(T\) could be denoted \(T_{ij}\), where \(i\) and \(j\) are indices running from \(1\) to \(n\), or also by \(T_i^j\). Whether an index is displayed as a superscript or subscript depends on the transformation properties of the tensor, described below. The total number of indices required to identify each component uniquely is equal to the dimension of the array, and is called the order, degree or rank of the tensor. However, the term “rank” generally has another meaning in the context of matrices and tensors.
Just as the components of a vector change when we change the basis of the vector space, the components of a tensor also change under such a transformation. Each tensor comes equipped with a transformation law that details how the components of the tensor respond to a change of basis. The components of a vector can respond in two distinct ways to a change of basis (see covariance and contravariance of vectors), where the new basis vectors \(\displaystyle \mathbf {\hat {e}} _{i}\) are expressed in terms of the old basis vectors \(\displaystyle \mathbf {e} _{j}\) as,
\[\displaystyle \mathbf {\hat {e}}_{i}=\sum_{j=1}^{n}\mathbf {e}_{j}R_{i}^{j}=\mathbf {e} _{j}R_{i}^{j}.\]
Here \(R^j_i\) are the entries of the change of basis matrix, and in the rightmost expression the summation sign was suppressed: this is the Einstein summation convention. The components \(v^i\) of a column vector \(v\) transform with the inverse of the matrix \(R\),
\[\displaystyle {\hat {v}}^{i}=(R^{-1})_{j}^{i}v^{j},\]
where the hat denotes the components in the new basis. This is called a contravariant transformation law, because the vector transforms by the inverse of the change of basis. In contrast, the components, \(w_i\), of a covector (or row vector), \(w\) transform with the matrix \(R\) itself,
\[\displaystyle {\hat {w}}_{i}=w_{j}R_{i}^{j}.\]
This is called a covariant transformation law, because the covector transforms by the same matrix as the change of basis matrix. The components of a more general tensor transform by some combination of covariant and contravariant transformations, with one transformation law for each index. If the transformation matrix of an index is the inverse matrix of the basis transformation, then the index is called contravariant and is traditionally denoted with an upper index (superscript). If the transformation matrix of an index is the basis transformation itself, then the index is called covariant and is denoted with a lower index (subscript).
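To make the quoted transformation laws concrete, here is a small numeric illustration (my own sketch, assuming numpy); the matrix \(R\) is an arbitrary invertible change of basis:

```python
# Numeric illustration of the quoted laws: new basis ê_i = e_j R^j_i,
# column-vector components v̂ = R⁻¹ v (contravariant), covector components ŵ = w R (covariant).
import numpy as np

E = np.eye(3)                          # old basis vectors as columns: e_j
R = np.array([[2.0, 1.0, 0.0],         # change-of-basis matrix R^j_i
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])

E_hat = E @ R                          # new basis vectors ê_i = e_j R^j_i (columns)

v = np.array([1.0, -2.0, 0.5])         # contravariant components in the old basis
w = np.array([3.0, 0.0, 1.0])          # covariant (covector / row-vector) components

v_hat = np.linalg.solve(R, v)          # v̂ = R⁻¹ v  (contravariant law)
w_hat = w @ R                          # ŵ_i = w_j R^j_i (covariant law)

print(np.allclose(E @ v, E_hat @ v_hat))   # True: the geometric vector is unchanged
print(np.isclose(w @ v, w_hat @ v_hat))    # True: the scalar w(v) is unchanged
```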
To move on to covariant tensors it is necessary to discuss what a gradient vector is:
So if a scalar \(\varphi\) is a function of \(X^1\) and \(X^2\), and we see a differential displacement \(\vec{dl}\), the change in \(\varphi\) will be given by:
\[\underset{\color{red}{\text{SCALAR}}}{\underbrace{\Huge{d\varphi}}} =\underset{\text{grad. vec.}}{\underbrace{\frac{\partial \varphi}{\partial x^1}}}\,dx^1 + \underset{\text{grad. vec.}}{\underbrace{\frac{\partial \varphi}{\partial x^2}}}\,dx^2\tag 1\]
KEY POINT: The gradient vector is in the dual space, taking in a “regular” vector and producing a scalar. In the case of the contravariant vector, a vector in a coordinate frame was transformed into another vector in a different frame.
We also have that
\[\vec{dl}= dx^1 \vec{X^1} + dx^2 \vec{X^2}\tag 2\]
with \(\vec{X^1}\) and \(\vec{X^2}\) representing the unit vectors.
We want a vector that dotted with equation \((2)\) results in equation \((1).\) Keeping in mind that \(\vec{X^1}\) and \(\vec{X^2}\) are unit vectors, the vector we are looking for is the gradient of the scalar \(\varphi\):
\[\vec \nabla \varphi= \frac{\partial \varphi}{\partial x^1}\,\vec{X^1} + \frac{\partial \varphi}{\partial x^2}\,\vec{X^2}\tag 3\]
Here’s the dot product:
\[d\varphi=\vec{dl}\cdot\vec{\nabla}\varphi=\color{brown}{ \begin{bmatrix}dx^1 \vec{X^1} & dx^2 \vec{X^2} \end{bmatrix} \begin{bmatrix} \frac{\partial \varphi}{\partial x^1}\,\vec{X^1} \\ \frac{\partial \varphi}{\partial x^2}\,\vec{X^2} \end{bmatrix}}=\frac{\partial \varphi}{\partial x^1}\,dx^1 + \frac{\partial \varphi}{\partial x^2}\,dx^2\]
So,
\[d\varphi = \vec{dl}\cdot\vec{\nabla}\varphi\]
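A tiny numeric check of this relation (my own, with an assumed scalar field \(\varphi\) and an orthonormal frame):

```python
# Small check of dφ = ∇φ · dl for a scalar φ(x¹, x²), assuming an orthonormal frame.
import numpy as np

def phi(x):
    return x[0]**2 * x[1] + 3*x[1]

x  = np.array([1.0, 2.0])
dl = np.array([1e-6, -2e-6])                     # differential displacement

grad_phi = np.array([2*x[0]*x[1], x[0]**2 + 3])  # (∂φ/∂x¹, ∂φ/∂x²), analytic

print(grad_phi @ dl, phi(x + dl) - phi(x))       # agree to first order in |dl|
```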
Generalizing equation \((3)\),
\[\vec\nabla\varphi=\underset{coord. comp. grad. vec.}{\underbrace{\Large{\frac{\partial\varphi}{\partial x^\color{blue}{m}}}}}\;\vec{X^\color{blue}{m}}\]
is the expression of the gradient in the \(X\) coordinate frame. In the \(Y\) coordinate frame it would be:
\[\vec\nabla\varphi=\Large{\frac{\partial\varphi}{\partial y^\color{blue}{n}}}\;\vec{Y^\color{blue}{n}}\]
Applying the chain rule:
\[\color{red}{\frac{\partial \varphi}{\partial y^n}}= \frac{\partial \varphi}{\partial x^m} \frac{\partial x^m}{\partial y^n}=\frac{\partial x^m}{\partial y^n}\color{red}{\frac{\partial \varphi}{\partial x^m}}\]
This last equation relates the components of the gradient vector in the \(X\) coordinate frame to the components in the \(Y\) frame.
Notice that the arrangement of the dummy indices is:
\[\frac{\partial \varphi}{\partial y^n}= \frac{\partial x^{\color{red}{m}}}{\partial y^n}\frac{\partial \varphi}{\partial x^{\color{red}{m}}}\]
In matrix form:
\[\begin{bmatrix} \frac{\partial \varphi}{\partial y^1}\\\frac{\partial \varphi}{\partial y^2}\\\frac{\partial \varphi}{\partial y^3}\\\vdots\\\frac{\partial \varphi}{\partial y^d} \end{bmatrix}= {\begin{bmatrix} \frac{\partial x^1}{\partial y^1} & \frac{\partial x^2}{\partial y^1} & \frac{\partial x^3}{\partial y^1} &\cdots& \frac{\partial x^n}{\partial y^1}\\ \frac{\partial x^1}{\partial y^2} & \frac{\partial x^2}{\partial y^2} & \frac{\partial x^3}{\partial y^2} &\cdots& \frac{\partial x^n}{\partial y^2}\\ \vdots&\vdots&\vdots&&\vdots\\ \frac{\partial x^1}{\partial y^d} & \frac{\partial x^2}{\partial y^d} & \frac{\partial x^3}{\partial y^d} &\cdots& \frac{\partial x^n}{\partial y^d}\\ \end{bmatrix}} \large\color{red}{\begin{bmatrix} \frac{\partial \varphi}{\partial x^1}\\\frac{\partial \varphi}{\partial x^2}\\\frac{\partial \varphi}{\partial x^3}\\\vdots\\\frac{\partial \varphi}{\partial x^n} \end{bmatrix}} \]
This arrangement (red column vector - a gradient vector in coordinate system \(X\)) is the form that defines covariant vectors - for example \(W:\)
\[\bbox[yellow, 5px]{W^{(Y)}_n = \frac{\partial x^{\color{red}{m}}}{\partial y^n}\, W^{(X)}_{\color{red}{m}}}\]
Their components transform from one coordinate system to another the way gradient components do. The components carry subscripts!
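Here is a numeric sketch of the covariant (gradient) law (my own illustration, not from the video), with \(X\) Cartesian and \(Y\) polar; the scalar field and the function names are assumptions chosen for the example:

```python
# Check of the covariant (gradient) law ∂φ/∂y^n = (∂x^m/∂y^n) ∂φ/∂x^m,
# with X Cartesian and Y polar coordinates.
import numpy as np

def phi_cartesian(x):                  # a scalar field φ(x¹, x²)
    return x[0]**2 + x[1]

def y_to_x(y):                         # x^m as functions of y^n (polar → Cartesian)
    r, theta = y
    return np.array([r*np.cos(theta), r*np.sin(theta)])

y = np.array([2.0, np.pi/5])
x = y_to_x(y)

grad_x = np.array([2*x[0], 1.0])       # ∂φ/∂x^m, computed analytically

dxdy = np.array([[np.cos(y[1]), -y[0]*np.sin(y[1])],   # dxdy[m, n] = ∂x^m/∂y^n
                 [np.sin(y[1]),  y[0]*np.cos(y[1])]])

grad_y = dxdy.T @ grad_x               # covariant transform: (∂x^m/∂y^n) ∂φ/∂x^m

# Compare with a direct finite-difference derivative of φ(y) = φ(x(y))
eps = 1e-6
grad_y_fd = np.array([(phi_cartesian(y_to_x(y + eps*np.eye(2)[n])) -
                       phi_cartesian(y_to_x(y - eps*np.eye(2)[n]))) / (2*eps)
                      for n in range(2)])

print(grad_y, grad_y_fd)               # agree to O(eps²)
```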
Let's say we have two covariant vectors \(C\) and \(D\) with \(d\) components:
\[C_m^{(y)}=\frac{\partial x^r}{\partial y^m} C_r^{(x)}\]
\[D_n^{(y)}=\frac{\partial x^s}{\partial y^n} D_s^{(x)}\]
Multiplying them,
\[C_m^{(y)}D_n^{(y)}=\frac{\partial x^r}{\partial y^m}\frac{\partial x^s}{\partial y^n}C_r^{(x)}D_s^{(x)}\]
\[\Large \bbox[10px, border:2px solid red]{T_{mn}^{\small(Y)}= \frac{\partial x^{\color{blue}{r}}}{\partial y^m}\frac{\partial x^{\color{blue}{s}}}{\partial y^n}T_{\color{blue}{rs}}^{(x)}}\]
This is a covariant tensor!
There are mixed tensors, such as:
\[\Large T^n_m{\small (Y)} =\frac{\partial x^{\color{red}{r}}}{\partial y^m}\frac{\partial y^n}{\partial x^{\color{blue}{s}}}T^{\color{blue}{s}}_{\color{red}{r}}\small (X)\]
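A compact einsum sketch of the covariant and mixed laws (my own addition, assuming numpy); it also checks the classic fact that the Kronecker delta keeps its form only as a mixed tensor:

```python
# Covariant and mixed transformation laws written with einsum.
import numpy as np

J    = np.array([[1.0, 2.0],          # J[n, s]  stands in for ∂y^n/∂x^s
                 [0.5, 3.0]])
Jinv = np.linalg.inv(J)               # Jinv[r, m] stands in for ∂x^r/∂y^m

T_x = np.array([[1.0, -1.0],
                [2.0,  0.5]])

# Covariant:  T_(Y)mn   = (∂x^r/∂y^m)(∂x^s/∂y^n) T_(X)rs
T_y_cov = np.einsum('rm,sn,rs->mn', Jinv, Jinv, T_x)

# Mixed:      T_(Y)^n_m = (∂x^r/∂y^m)(∂y^n/∂x^s) T_(X)^s_r
T_y_mix = np.einsum('rm,ns,sr->nm', Jinv, J, T_x)

# The identity matrix (Kronecker delta) keeps its form only under the mixed law:
delta = np.eye(2)
print(np.allclose(np.einsum('rm,ns,sr->nm', Jinv, J, delta), delta))     # True
print(np.allclose(np.einsum('rm,sn,rs->mn', Jinv, Jinv, delta), delta))  # generally False
```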
In a generalized curvilinear coordinate system, the three coordinate curves in the diagram can represent, for example, spherical coordinates:
with the radial magnitude and the two angles involved (in \(3\)-dimensional space) assigned to the curvilinear coordinates - say \(u_1\) represents the radial magnitude; \(u_2\) is for the \(\theta\) angle; and \(u_3\) stands for \(\phi.\)
The \(\varepsilon_i\) vectors are tangent to the different coordinate curves, while the \(\varepsilon^i\) vectors are perpendicular to the coordinate surfaces (the surfaces of constant \(u^i\)).
The assumption is that we know how to transform from Cartesian to curvilinear, just like with spherical coordinates (trigonometric formulas).
So we would have functions of the type:
\[u^1 = f(x^1,x^2,x^3)\] \[u^2 = g(x^1,x^2,x^3)\] \[u^3 = h(x^1,x^2,x^3)\]
and
\[x^1 = \hat f (u^1,u^2,u^3)\] \[x^2 = \hat g (u^1,u^2,u^3)\] \[x^3 = \hat h(u^1,u^2,u^3)\]
Therefore, the position vector \(\vec r\) can be described as:
\[\vec r = x^1 \, \vec {\hat e_1} + x^2 \vec {\hat e_2} + x^3 \vec {\hat e_3}\]
which is equivalent to
\[\vec r = \hat f (u^1,u^2,u^3) \, \vec {\hat e_1} + \hat g (u^1,u^2,u^3) \vec {\hat e_2} + \hat h (u^1,u^2,u^3) \vec {\hat e_3}\]
and we can write the displacement of \(\vec r:\)
\[d\vec r = \frac{\partial \vec r}{\partial u^1} du^1+\frac{\partial \vec r}{\partial u^2}du^2+ \frac{\partial \vec r}{\partial u^3}du^3\tag 4\]
Here is a key observation: the partial derivatives with respect to \(u^i\) are the \(\varepsilon_i\) tangent vectors:
\[\bbox[10px, border:2px solid aqua]{ \varepsilon_i =\frac{\partial \vec r}{\partial u^i}}\tag 5\]
These are not unit vectors, but they can be normalized by dividing by the norm:
\[\lVert \frac{\partial \vec r}{\partial u^i}\rVert\]
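As a concrete illustration (my own, assuming numpy), the tangent vectors and their norms for spherical coordinates \((u^1,u^2,u^3)=(r,\theta,\phi)\) can be computed by finite differences; the names `position` and `tangent_basis` are chosen for the sketch:

```python
# Tangent (subscript) basis vectors ε_i = ∂r/∂u^i for spherical coordinates,
# computed by finite differences, then normalised by their norms.
import numpy as np

def position(u):                       # r(u¹, u², u³): spherical → Cartesian
    r, theta, phi = u
    return np.array([r*np.sin(theta)*np.cos(phi),
                     r*np.sin(theta)*np.sin(phi),
                     r*np.cos(theta)])

def tangent_basis(u, eps=1e-6):
    return np.array([(position(u + eps*np.eye(3)[i]) -
                      position(u - eps*np.eye(3)[i])) / (2*eps)
                     for i in range(3)])   # row i is ε_i

u = np.array([2.0, np.pi/4, np.pi/3])
eps_i = tangent_basis(u)

norms = np.linalg.norm(eps_i, axis=1)
print(norms)                           # ≈ [1, r, r·sinθ]: not unit vectors in general
unit_vectors = eps_i / norms[:, None]  # normalised tangent vectors
```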
The fact that the tangent vectors arise this way makes it very easy to express a vector \(\vec A\) as a sum of the tangent vectors, each multiplied by a scalar:
\[\vec A = a(1)\, \varepsilon_1 + a(2) \, \varepsilon_2 + a(3) \, \varepsilon_3\]
In practice this is written as:
\[\vec A = a^1\, \varepsilon_1 + a^2 \, \varepsilon_2 + a^3 \, \varepsilon_3\tag 6\]
because if we were to transform to another curvilinear frame it would transform as a contravariant vector.
For instance, let’s transform from the curvilinear frame \(\vec r(u^1, u^2, u^3)\) to \(\vec r(u^{'1}, u^{'2}, u^{'3}),\) with \(\vec r\) being the position vector:
So in the \(U\) system, \(\large \vec \varepsilon_i=\frac{\partial \vec r}{\partial u^i}\); while for the \(U'\) system, \(\large \vec \varepsilon'_i=\frac{\partial \vec r}{\partial u^{'i}}.\) They transform as:
\[\color{fuchsia}{\large \vec \varepsilon_j} = \frac{\partial \vec r}{\partial u^j}= \frac{\partial u^{'i}}{\partial u^j} \frac{\partial \vec r}{\partial u^{'i}}= \color{fuchsia}{\frac{\partial u^{'i}}{\partial u^j} \vec \varepsilon'_i}\]
Here is the framed result together with its inverse:
\[\bbox[10px,border:2px solid fuchsia]{\large \vec \varepsilon_j=\frac{\partial u^{'i}}{\partial u^j} \vec \varepsilon'_i}\tag {ref.2}\]
\[\bbox[10px,border:2px solid purple]{\large \vec \varepsilon'_i = \frac{\partial u^{j}}{\partial u^{'i}} \vec \varepsilon_j}\tag {ref.3}\]
The vector stays the same:
\[\large \vec A = a^{'i}\vec \varepsilon'_i = a^j \color{fuchsia}{\vec\varepsilon_j}= a^j \frac{\partial u'^i}{\partial u^j}\vec\varepsilon'_i\]
or, equating the coefficients of \(\large \vec \varepsilon'_i:\)
\[\large a'^{i} = \frac{\partial u'^i}{\partial u^\color{blue}{j}} a^{\color{blue}{j}}\tag 7\]
which is a contravariant transformation.
On the other hand, we can express \(\vec A\) as a sum of components along the perpendicular \(\varepsilon^i\) vectors in the diagram above, in which case the components transform as a covariant vector:
KEY CONCEPT: The tangent vectors to the curvilinear coordinates are denoted with a subscript \(\vec \varepsilon_i\). Their orthogonal counterparts are denoted with a superscript \(\vec \varepsilon^i.\)
Recap:
We can express the \(\vec r\) vector in the curvilinear coordinates (diagrams above) through the functions \(\hat f, \hat g, \hat h\) of the curvilinear coordinates:
\[\vec r = \hat f (u^1,u^2,u^3) \, \vec {\hat e_1} + \hat g (u^1,u^2,u^3) \vec {\hat e_2} + \hat h (u^1,u^2,u^3) \vec {\hat e_3}\]
and by taking the partial derivative with respect to any of the curvilinear coordinates we obtain one of the tangent vectors (\(\varepsilon_i\)):
\[\bbox[10px, border:2px solid aqua]{\varepsilon_i = \frac{\partial \vec r}{\partial u^i}= \frac{\partial x^1}{\partial u^i} \vec {\hat e_1} + \frac{\partial x^2}{\partial u^i} \vec {\hat e_2} + \frac{\partial x^3}{\partial u^i} \vec {\hat e_3}}\tag 8\]
We want to write the gradient of the curvilinear coordinates \(u^j\). Remembering Eq. 3 above:
\[\vec \nabla \varphi= \frac{\partial \varphi}{\partial x^1}\,\vec{X^1} + \frac{\partial \varphi}{\partial x^2}\,\vec{X^2}\tag 3\]
we can write:
\[\bbox[10px, border:2px solid aqua]{\varepsilon^j =\vec \nabla u^j= \frac{\partial u^j}{\partial x^1}\,\vec {\hat e_1} + \frac{\partial u^j}{\partial x^2}\,\vec {\hat e_2} + \frac{\partial u^j}{\partial x^3}\,\vec {\hat e_3}}\tag 9\]
for \(j=\{1,2,3\}\)
Taking spherical coordinates as an example, and specifically the \(r\) component (see the 3D rendition above):
\(r = \left((x^1)^2+(x^2)^2+(x^3)^2\right)^{1/2}\)
we can take the partial with respect to each one of the Cartesian components:
\[\varepsilon^r =\frac{\partial \left((x^1)^2+(x^2)^2+(x^3)^2\right)^{1/2}}{\partial x^1}\,\vec {\hat e_1} + \frac{\partial \left((x^1)^2+(x^2)^2+(x^3)^2\right)^{1/2}}{\partial x^2}\,\vec {\hat e_2}+ \frac{\partial \left((x^1)^2+(x^2)^2+(x^3)^2\right)^{1/2}}{\partial x^3}\,\vec {\hat e_3}\]
For the angle \(\phi = \tan^{-1}(\frac{x^2}{x^1})\),
\[\varepsilon^\phi =\frac{\partial \tan^{-1}(\frac{x^2}{x^1})}{\partial x^1}\,\vec {\hat e_1} + \frac{\partial \tan^{-1}(\frac{x^2}{x^1})}{\partial x^2}\,\vec {\hat e_2}+ \frac{\partial \tan^{-1}(\frac{x^2}{x^1})}{\partial x^3}\,\vec {\hat e_3}\]
etc.
Dotting \(\varepsilon_i\) with \(\varepsilon^j:\)
\[\bbox[10px, border:2px solid aqua]{\varepsilon_i \cdot \varepsilon^j} = \frac{\partial x^1}{\partial u^i}\frac{\partial u^j}{\partial x^1}+\frac{\partial x^2}{\partial u^i}\frac{\partial u^j}{\partial x^2}+\frac{\partial x^3}{\partial u^i}\frac{\partial u^j}{\partial x^3} =\bbox[10px, border:2px solid aqua]{\frac{\partial u^j}{\partial u^i}}\tag {10}\]
Notice that the result is \(\frac{\partial u^j}{\partial u^i}\), not \(3\,\frac{\partial u^j}{\partial u^i}\): the three terms are the pieces of a single chain-rule sum. Consider a general differential:
\[dw=\frac{\partial w}{\partial x^1} dx^1 + \frac{\partial w}{\partial x^2} dx^2+\frac{\partial w}{\partial x^3} dx^3\]
Dividing through by \(du^i\) while holding the other curvilinear coordinates fixed gives
\[\frac{\partial w}{\partial u^i}=\frac{\partial w}{\partial x^1}\frac{\partial x^1}{\partial u^i} + \frac{\partial w}{\partial x^2}\frac{\partial x^2}{\partial u^i}+\frac{\partial w}{\partial x^3}\frac{\partial x^3}{\partial u^i}\]
so the partial contributions add up to the whole derivative, not to a multiple of it.
If \(\large j=i\), \(\large \varepsilon_i\cdot \varepsilon^j = 1\)
If \(\large j \neq i\), then since the coordinates \((u^1, u^2, u^3)\) are independent, \(\frac{\partial u^j}{\partial u^i}=0\) and hence \(\large \varepsilon_i\cdot \varepsilon^j = 0.\)
Hence, we can express this as \(\large\bbox[10px, border:2px solid aqua]{\varepsilon_i \cdot \varepsilon^j = \frac{\partial \vec r}{\partial u^i}\cdot\vec \nabla u^j=\delta_i^{\;j}}\tag {11}\)
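A numeric check of equation \((11)\) for spherical coordinates (my own sketch, assuming numpy): the dual vectors \(\varepsilon^j=\vec\nabla u^j\) come out as the rows of the transposed inverse of the Jacobian whose rows are the \(\varepsilon_i\):

```python
# Check of ε_i · ε^j = δ_i^j for spherical coordinates. The dual vectors ε^j = ∇u^j are
# obtained as the rows of the transposed inverse of the matrix whose rows are the tangent ε_i.
import numpy as np

def position(u):                       # x(u): spherical (r, θ, φ) → Cartesian
    r, theta, phi = u
    return np.array([r*np.sin(theta)*np.cos(phi),
                     r*np.sin(theta)*np.sin(phi),
                     r*np.cos(theta)])

def tangent_basis(u, eps=1e-6):        # rows are ε_i = ∂x/∂u^i (finite differences)
    return np.array([(position(u + eps*np.eye(3)[i]) -
                      position(u - eps*np.eye(3)[i])) / (2*eps)
                     for i in range(3)])

u = np.array([1.5, np.pi/3, np.pi/6])
eps_lower = tangent_basis(u)                 # ε_i
eps_upper = np.linalg.inv(eps_lower).T       # rows are ε^j = ∇u^j, since ε_i·ε^j = δ_i^j

print(np.round(eps_lower @ eps_upper.T, 6))  # ≈ identity matrix (Kronecker delta)
```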
These coordinate axes are therefore independent of each other. The \(\large \vec \nabla u^j\) correspond to the vectors orthogonal to the generalized curvilinear coordinate surfaces, symbolized by the \(\large \vec\varepsilon^i\) vectors in the diagram above comparing tangential to orthogonal components.
Importantly, only in special cases such as spherical coordinates are \(\large (\vec \varepsilon_1, \vec \varepsilon_2,\vec \varepsilon_3)\) orthogonal to each other (and the same applies to \(\large (\vec \varepsilon^1, \vec \varepsilon^2, \vec \varepsilon^3)\)). In general, however, \(\varepsilon_i\) and \(\varepsilon^j\) are orthogonal to each other whenever \(i \neq j\).
So a vector \(\vec A\) as in equation \((6)\) above can be expanded in the tangent basis, with contravariant components:
\[\vec A = a^1\, \varepsilon_1 + a^2 \, \varepsilon_2 + a^3 \, \varepsilon_3=a^i\varepsilon_i\]
or in the dual basis, with covariant components:
\[\vec A = a_1\, \varepsilon^1 + a_2 \, \varepsilon^2 + a_3 \, \varepsilon^3=a_i\varepsilon^i\]
Notice, then, that something like \(a^j\varepsilon_j\cdot\varepsilon^i =a^j(\varepsilon_j\cdot\varepsilon^i)=a^i.\) Further, \(a^j\varepsilon_j=\vec A\). So by dotting \(\vec A\) with \(\varepsilon^i\) we get the contravariant components of the vector. Likewise if we start off with \(a_j\varepsilon^j\) and we dot with \(\varepsilon_i\) we’ll get \(a_i\), which are the covariant components.
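A small numeric illustration of extracting components by dotting (my own, using a constant oblique basis in the plane as the simplest non-orthogonal case):

```python
# Extracting components by dotting: a^i = A·ε^i and a_i = A·ε_i, then reconstructing A.
import numpy as np

eps_lower = np.array([[1.0, 0.0],        # ε_1
                      [1.0, 1.0]])       # ε_2 (not orthogonal to ε_1)
eps_upper = np.linalg.inv(eps_lower).T   # dual basis: ε_i · ε^j = δ_i^j

A = np.array([3.0, 2.0])                 # some vector, in Cartesian components

a_contra = eps_upper @ A                 # a^i = A · ε^i  (contravariant components)
a_co     = eps_lower @ A                 # a_i = A · ε_i  (covariant components)

# Reconstruct A both ways: A = a^i ε_i = a_i ε^i
print(np.allclose(a_contra @ eps_lower, A))   # True
print(np.allclose(a_co @ eps_upper, A))       # True
```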
Now if we attempt the same change of coordinates from \(U\) to \(U'\) that led to Eq. \((7)\) with a vector expressed in covariant components, we can use the chain rule to substitute the terms in Eq. \((9)\):
\[\bbox[10px, border:2px solid aqua]{\varepsilon^j =\vec \nabla u^j= \frac{\partial u^j}{\partial x^1}\,\vec {\hat e_1} + \frac{\partial u^j}{\partial x^2}\,\vec {\hat e_2} + \frac{\partial u^j}{\partial x^3}\,\vec {\hat e_3}}\tag 9\]
\[\frac{\partial u^j}{\partial x^1}=\frac{\partial u^j}{\partial u'^i}\frac{\partial u'^i}{\partial x^1}\]
\[\frac{\partial u^j}{\partial x^2}=\frac{\partial u^j}{\partial u'^i}\frac{\partial u'^i}{\partial x^2}\]
\[\frac{\partial u^j}{\partial x^3}=\frac{\partial u^j}{\partial u'^i}\frac{\partial u'^i}{\partial x^3}\]
So Eq. 9 becomes:
\[\varepsilon^j =\vec \nabla u^j=\frac{\partial u^j}{\partial u'^i} \underset{\large {\nabla u'^i=\varepsilon'^i} }{\underbrace{\left[ \frac{\partial u'^i}{\partial x^1}\,\vec {\hat e_1} + \frac{\partial u'^i}{\partial x^2}\,\vec {\hat e_2} + \frac{\partial u'^i}{\partial x^3}\,\vec {\hat e_3}\right]}}\]
so
\[\large \varepsilon^j=\frac{\partial u^j}{\partial u'^i}\varepsilon'^i\]
or
\[\large\varepsilon'^i =\frac{\partial u'^i}{\partial u^j}\varepsilon^j\]
Therefore,
\[\vec A = a'_i \varepsilon'^i= a_j \varepsilon^j= a_j \frac{\partial u^j}{\partial u'^i}\varepsilon'^i\]
which implies that
\[\large a'_i= \frac{\partial u^\color{red}{j}}{\partial u'^i} a_\color{red}{j}\tag {12}\]
which is the covariant counterpart of Eq. \((7)\): the sum over \(j\) is now a covariant transformation.
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.