NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Differential Operator \(\rm d\):


These notes follow Prof. Shifrin’s YouTube explanation and XylyXylyX’s Invitation to \(n\)-forms. Note that we can call them \(k\)-forms as well; see the discussion on forms below.


\(0\)-forms are functions \(f(\vec x).\) In GR, \(f(x^0,x^1,x^2,x^3).\)

The operator \(\text{d}\) is the exterior derivative.

The exterior derivative of a differential form of degree \(k\) is a differential form of degree \(k + 1.\)

The exterior derivative of a zero-form is introduced by definition (according to the XylyXylyX video):

\[\underbrace{\quad\text{d}f\quad}_{\text{differential of a function}} = \underbrace{\quad\frac{\partial f}{\partial x^i}\quad}_{\text{coefficients}}\quad\underbrace{\quad dx^i\quad}_{\text{basis of }V^*}\]

\(\text{d}f\) is a \(1\)-form. If \(f\) is a smooth function (a \(0\)-form), then the exterior derivative of \(f\) is the differential of \(f.\) That is, \(df\) is the unique 1-form such that for every smooth vector field \(X,\) \(df(X) = d_Xf\), where \(d_Xf\) is the directional derivative of \(f\) in the direction of \(X.\)

This is usually interpreted in physics as changing the variables “a little bit” to see how the function changes (the total differential of the function).

\(\text{d}f\) is a covector, and as such is a map, a \((0,1)\)-tensor \(\langle \text{d}f \quad, \quad \rangle,\) that takes in a vector and produces a number:

\[\langle\; \text{d}f \;, \; \vec v\;\rangle=\left\langle \frac{\partial f}{\partial x^i}\,dx^i\;,\;V^j\,\partial _j \right \rangle=\frac{\partial f}{\partial x^i}\, V^j\; \delta^i_j=\frac{\partial f}{\partial x^i}\,V^i=\partial_i f\,V^i\]

The idea is that the operator \(dx^i\) is part of a basis of \(V^*\) (I am using superscripts following the convention for covectors, as in XylyXylyX’s video). Prof. Shifrin introduces them as an alternative notation for the “standard basis” of \(V^*\) (which he refers to as \((\mathbb R^n)^*\)) axiomatically at this point, as \(\phi^i(\vec x) =\vec e_i \cdot \vec x= dx^i\cdot \vec x\). In fact he uses subscripts and writes \(\phi_i = dx_i.\) Further, he says that \(dx_i\) is a “symbol that represents the linear map \(dx_i(\vec v)= \vec e_i\cdot \vec v\).”

These \(dx^i\)-s would select the \(i\)-th coordinate of the matrix of partial derivatives of a function \(f\) at a point \(\vec a\) (the derivative matrix only makes sense at a point):

\[D\,f(\vec a)=\begin{bmatrix}\frac{\partial f}{\partial x^1 }(\vec a)&\cdots &\frac{\partial f}{\partial x^n }(\vec a)\end{bmatrix}\]

and

\[\text{d}f(\vec a) =\sum \frac{\partial f}{\partial x^i}(\vec a) dx^i\]

After \(\vec a\) is input into \((\cdot),\) \(\text{d}f()\) “eats” a vector \(\vec v,\) to produce the partial derivative of \(f\) at point \(\vec a\) in the direction of \(\vec v,\) notated as \(D_{\vec v} f(\vec a)\) - i.e. the directional derivative.

\[\text{d}f(\vec a) (\vec v)= \sum \frac{\partial f}{\partial x^i}(\vec a)\; dx^i\, (\vec v)= D_{\vec v}\,f(\vec a)\]

(The accompanying illustration is omitted here.)
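As a quick numerical sketch (my own example, not from the lectures; the function \(f=\sin(xy)\), the point \(\vec a\) and the vector \(\vec v\) are arbitrary choices), the pairing \(\text df(\vec a)(\vec v)=\partial_i f(\vec a)\,V^i\) can be compared against a finite-difference estimate of the directional derivative:

```python
# Sketch: df(a)(v) = sum_i ∂f/∂x^i(a) v^i should match the directional derivative D_v f(a).
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x*y)                                   # an arbitrary 0-form (smooth function)

grad = [sp.diff(f, var) for var in (x, y)]        # coefficients of df: [∂f/∂x, ∂f/∂y]
a = {x: 1, y: 2}                                  # the point a
v = (3, -1)                                       # the vector v

df_a_v = sum(g.subs(a) * vi for g, vi in zip(grad, v))   # df(a)(v)

h = 1e-6                                          # finite-difference step
fd = (f.subs({x: 1 + h*v[0], y: 2 + h*v[1]}) - f.subs(a)) / h
print(sp.N(df_a_v), sp.N(fd))                     # both ≈ 5*cos(2) ≈ -2.081
```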


What is an \(n\)-form? Just a \((0, n)\) tensor, with an antisymmetry requirement discussed below. For a \(1\)-form, the basis is \(e^\nu\), so that \(\langle e^\nu,e_\mu \rangle=\delta^\nu_\mu\) or, in the coordinate basis, \(\langle dx^\nu\,,\,\partial_\mu \rangle= \delta^\nu_\mu.\) In this expression \(dx^\nu\) is itself a \(1\)-form: instead of \(\text{d}f\) for an arbitrary function \(f\), we take the differential of the coordinate function \(x^\nu.\)

A \(2\)-form will be expressed, as any (0,2)-tensor, as

\[\omega = \underbrace{\omega_{\mu\nu}}_{\text{components}}\;\underbrace{ dx^\mu \otimes dx^\nu }_{\text{basis}}\] which acts on two vectors:

\[\omega_{\mu\nu}\; dx^\mu \otimes dx^\nu \;\left(V^\delta\,\partial_\delta, W^\lambda\,\partial_\lambda \right)=\omega_{\mu\nu}V^\delta W^\lambda \left\langle dx^\mu,\partial_\delta\right\rangle\; \left\langle dx^\nu,\partial_\lambda\right\rangle= \omega_{\mu\nu}V^\delta W^\lambda \; \delta^\mu_\delta\, \delta^\nu_\lambda= \omega_{\delta\lambda}V^\delta W^\lambda\]

What makes these \(n\)-forms different from general tensors is a constraint: antisymmetry. We want the tensor acting on two vectors to satisfy

\[\omega\,(\vec v, \vec w)\quad = \quad- \quad\omega\,(\vec w, \vec v)\]

we pass this constraint to the components: \(\omega_{\mu\nu} = - \omega_{\nu\mu}.\) This implies that

\[\omega_{00}=-\omega_{00}\]

so the diagonal components vanish (\(\omega_{00}=\omega_{11}=\cdots=0\)), and for the off-diagonal components

\[\omega_{01}=-\omega_{10}\]

These constraints imply that a \(2\)-form can be written as

\[\begin{align} \omega &= \omega_{01}\left [ dx^0 \otimes dx^1 - dx^1 \otimes dx^0 \right] \\[2ex] &+ \omega_{02}\left [ dx^0 \otimes dx^2 - dx^2 \otimes dx^0 \right] \\[2ex] &+ \omega_{03}\left [ dx^0 \otimes dx^3 - dx^3 \otimes dx^0 \right] \\[2ex] &+ \omega_{12}\left [ dx^1 \otimes dx^2 - dx^2 \otimes dx^1 \right] \\[2ex] &+ \omega_{13}\left [ dx^1 \otimes dx^3 - dx^3 \otimes dx^1 \right] \\[2ex] &+ \omega_{23}\left [ dx^2 \otimes dx^3 - dx^3 \otimes dx^2 \right] \end{align}\]

Instead of the \(16\) basis tensors of the general tensor product (in four dimensions), we end up with only \(6\), and we can define the wedge product of two one-forms:

\[dx^\mu \wedge dx^\nu= dx^\mu \otimes dx^\nu - dx^\nu \otimes dx^\mu\]

so that the \(2\)-form above results in

\[\omega =\tfrac{1}{2}\; \omega_{\mu\nu}\,dx^\mu\wedge dx^\nu\]

and \(dx^\mu \wedge dx^\nu\) are the wedge products between the basis \(1\)-forms, giving us a basis for the vector space of \(2\)-forms.
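The factor of \(\tfrac12\) and the antisymmetry can be sanity-checked numerically; this is a small sketch of mine with random made-up components and vectors (any antisymmetric \(\omega_{\mu\nu}\) will do):

```python
# Check that ω = (1/2) ω_{μν} dx^μ ∧ dx^ν reproduces ω(v, w) = ω_{μν} v^μ w^ν
# when ω_{μν} is antisymmetric, and that ω(v, w) = -ω(w, v).
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
omega = A - A.T                                   # arbitrary antisymmetric components
v, w = rng.normal(size=4), rng.normal(size=4)

# (dx^μ ∧ dx^ν)(v, w) = v^μ w^ν - v^ν w^μ
wedge = np.einsum('m,n->mn', v, w) - np.einsum('n,m->mn', v, w)

lhs = 0.5 * np.einsum('mn,mn->', omega, wedge)    # (1/2) ω_{μν} (dx^μ ∧ dx^ν)(v, w)
rhs = np.einsum('mn,m,n->', omega, v, w)          # ω_{μν} v^μ w^ν
print(np.isclose(lhs, rhs))                                   # True
print(np.isclose(rhs, -np.einsum('mn,m,n->', omega, w, v)))   # True: ω(v,w) = -ω(w,v)
```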

The vector space of \(2\)-forms (antisymmetric \(T^0{}_2\) tensors) is called the second exterior power, and it is written as \(\Lambda^2\); since it is built from the dual space, the full notation is

\[\large \Lambda^2\; \left( V^*\right)\]

The same can be applied to \(V\), which gives a basis of the form \(\partial_\mu \wedge \partial_\nu=\partial_\mu \otimes \partial_\nu - \partial_\nu \otimes \partial_\mu.\)

This can clearly be generalized to

\[\large \Lambda^k\; \left( V^*\right)= \large \Lambda^k\; (\mathbb R^n)^*\]

which will have \(n\choose k\) basis vectors.
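For example, the \(6\) basis \(2\)-forms written out above are just \(\binom{4}{2}\):

```python
# Dimensions of Λ^k((R^n)^*) are binomial coefficients n choose k.
from math import comb
print(comb(4, 2))                       # 6 basis 2-forms in four dimensions
print([comb(4, k) for k in range(5)])   # [1, 4, 6, 4, 1]
```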


Professor Shifrin describes these forms with functions \(f\in C^\infty\) as coefficients, and calls the basis vectors \(d\vec x_I\), with \(I\) being an increasing multi-index \(i_1<i_2<\cdots<i_k.\) He introduced them when describing the vector space of alternating multilinear maps \(T(\vec v_1,\cdots,\vec v_k)\in \mathbb R,\) given by determinants of combinations of rows from a matrix with more rows than columns. This mapping of determinants is explained in my post here.

He describes for any open set, \(U\subset \mathbb R^n,\) the vector space \(\mathcal A^k(U)\) of the form

\[\sum f_I\,(\vec x)\,d\vec x_I\] where \(f_I: U \to \mathbb R\) and \(I\) runs over the increasing multi-indices of length \(k\) (for a \(k\)-form). He calls them differential \(k\)-forms on \(U.\) As an example, he gives

\[e^{xy}\;dx\wedge dy\]

If \(\omega \in \mathcal A^k(U),\) then \(\omega=\sum f_I(\vec x)\,d\vec x_I\) where \(\vert I \vert=k\) with the rule:

\[d\omega \in \mathcal A^{k+1}(U): d\omega = \sum df_I(\vec x) \wedge d\vec x_I\]

First example:

First the function \(f\in C^\infty:\)

\[f\begin{pmatrix}x\\y\\z\end{pmatrix}= \exp\,(x)\,\sin\,(y\,z)\]

Now \(\text{d}f:\)

\[df = \left( \exp\,(x)\,\sin\,(yz) \right)\; dx+ \left( z\,\exp(x)\,\cos\,(yz) \right)\;dy+ \left(y\,\exp(x)\,\cos\,(yz) \right) dz\]

which is a \(1\)-form
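The coefficients can be checked with sympy (a small sketch of mine, using the \(f\) from the example):

```python
# df for f = exp(x) sin(yz): the coefficients of dx, dy, dz are the partials of f.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.exp(x) * sp.sin(y*z)

df = [sp.diff(f, var) for var in (x, y, z)]
print(df)   # [exp(x)*sin(y*z), z*exp(x)*cos(y*z), y*exp(x)*cos(y*z)]
```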

What if we take \(\text d\) of this \(\text{d}f\) we just took…

\[\small\text{d}\left(\text{d}f\right)=\exp(x)\left[(\cos(yz)) (z\,dy + y\,dz) \wedge dx + \left(z\cos(yz) dx +(\cos(yz) - z y\sin(yz))dz\right)\wedge dy+ (y\cos(yz)dx + (\cos(yz)- yz\sin(yz))dy)\wedge dz\right]\]

Notice that only some of the partials matter, because of the wedging. In fact, collecting terms with \(dx^i\wedge dx^j=-dx^j\wedge dx^i,\) the whole expression collapses to \(0\) (by equality of mixed partials), consistent with the property \(\text d(\text d\omega)=0\) listed below.
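A minimal sympy check of that collapse: the coefficient of \(dx^i\wedge dx^j\) (for \(i<j\)) in \(\text d(\text df)\) is \(\partial_i\partial_j f-\partial_j\partial_i f,\) which vanishes for smooth \(f\):

```python
# d(df) = 0: antisymmetrized second partials of f = exp(x) sin(yz) all vanish.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.exp(x) * sp.sin(y*z)
coords = (x, y, z)

for i in range(3):
    for j in range(i + 1, 3):
        coeff = sp.diff(f, coords[i], coords[j]) - sp.diff(f, coords[j], coords[i])
        print(coords[i], coords[j], sp.simplify(coeff))   # 0 in every case
```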

Second example:

If \(\omega \in \mathcal A^1(\mathbb R^3)\) as

\[\omega = y\,dx + z\,dy + x\,dz\]

\[d\omega = dy\wedge dx + dz\wedge dy + dx\wedge dz\]

which follows the rule of differentiating the coefficient functions and wedging each result with the basis form it sits next to. For instance, for the first term, \(y\,dx\), differentiating \(y\) gets us \(dy = \frac{\partial y}{\partial x}dx+ \frac{\partial y}{\partial y}dy+\frac{\partial y}{\partial z}dz= dy.\) So we end up with \(dy\wedge dx.\)

If we were to take \(\text{d}(d\omega),\) we would get \(0,\) because all the coefficients are constant (equal to \(1\)).

Third example:

\[\omega = (x+y^2) dx + (yz + x) dy + (z^3 + 3xy)dz\]

\[d\omega= (1 dx + 2ydy + 0 dz)\wedge dx +(1 dx + z dy + y dz)\wedge dy + (3ydx+ 3x dy+ 3z^2 dz)\wedge dz\]

which greatly simplifies, given that \(dx\wedge dx=0\) and \(dx\wedge dy=-dy\wedge dx.\) Collecting terms, \(d\omega = (1-2y)\,dx\wedge dy + 3y\,dx\wedge dz + (3x-y)\,dy\wedge dz.\)

Now \(d\omega \in \mathcal A^2(\mathbb R^3).\)
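A small sympy sketch of mine that reads off the simplified coefficients (for a \(1\)-form, the coefficient of \(dx^i\wedge dx^j\) with \(i<j\) in \(d\omega\) is \(\partial_i\omega_j-\partial_j\omega_i\)):

```python
# dω for ω = (x+y²)dx + (yz+x)dy + (z³+3xy)dz, component by component.
import sympy as sp

x, y, z = sp.symbols('x y z')
coords = (x, y, z)
w = (x + y**2, y*z + x, z**3 + 3*x*y)             # (ω_x, ω_y, ω_z)

for i in range(3):
    for j in range(i + 1, 3):
        coeff = sp.diff(w[j], coords[i]) - sp.diff(w[i], coords[j])
        print(f"d{coords[i]} ^ d{coords[j]}:", sp.expand(coeff))
# dx ^ dy: 1 - 2*y,   dx ^ dz: 3*y,   dy ^ dz: 3*x - y
```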

Fourth example:

\[\varphi\in \mathcal A^2(\mathbb R^3)\] \[\varphi=(x e^z) dy\wedge dz + (y\cos x) dz\wedge dx+ (x^3+ y^3+z^3) dx\wedge dy\]

Taking \(\text d\) gives \(d\varphi\in \mathcal A^3(\mathbb R^3).\) For the first term, we have an \(x\) partial \((e^z)\) and a \(z\) partial \((xe^z)\), but we don’t care about the latter because there is already a \(dz\) in \(dy \wedge dz:\)

\[d\varphi=e^z dx\wedge dy\wedge dz+ \cos x\, dy\wedge dz \wedge dx+ 3z^2 dz\wedge dx \wedge dy= (e^z+ \cos x+ 3z^2)dx\wedge dy\wedge dz\]
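Because the basis \(2\)-forms are written in the cyclic order \(dy\wedge dz,\; dz\wedge dx,\; dx\wedge dy,\) the single coefficient of \(dx\wedge dy\wedge dz\) is a divergence of the coefficient functions; a quick sympy check (my sketch):

```python
# For φ = P dy∧dz + Q dz∧dx + R dx∧dy in R³:  dφ = (∂P/∂x + ∂Q/∂y + ∂R/∂z) dx∧dy∧dz.
import sympy as sp

x, y, z = sp.symbols('x y z')
P, Q, R = x*sp.exp(z), y*sp.cos(x), x**3 + y**3 + z**3

print(sp.diff(P, x) + sp.diff(Q, y) + sp.diff(R, z))   # exp(z) + cos(x) + 3*z**2
```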


Properties of \(\text d\):

If \(\omega, \phi \in \mathcal A^k(U),\)

\[\text{d}(\omega + \phi ) = \text d\omega + \text d\phi\]

\[\text d(\text d \omega)=0\]

\[\text d(f \omega) = \text df\wedge \omega + f\text d\omega\]

And, if \(\omega \in \mathcal A^k(U),\) and \(\phi \in \mathcal A^l(U),\)

\[\text d(\omega \wedge \phi)= \text{d}\omega\wedge \phi + (-1)^k\omega \wedge \text d\phi\]
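A componentwise sympy check of the third property, \(\text d(f\omega)=\text df\wedge\omega+f\,\text d\omega\) (the \(k=0\) case of the last rule), for a \(1\)-form in \(\mathbb R^3\); the particular \(f\) and \(\omega\) below are arbitrary choices of mine:

```python
# d(fω) = df∧ω + f dω, checked on the dx^i ∧ dx^j components (i < j) for a 1-form ω.
import sympy as sp

x, y, z = sp.symbols('x y z')
coords = (x, y, z)
f = y * sp.exp(x)             # an arbitrary 0-form
a = (y, z, x)                 # ω = y dx + z dy + x dz (the second example above)

for i in range(3):
    for j in range(i + 1, 3):
        lhs = sp.diff(f*a[j], coords[i]) - sp.diff(f*a[i], coords[j])
        rhs = (sp.diff(f, coords[i])*a[j] - sp.diff(f, coords[j])*a[i]
               + f*(sp.diff(a[j], coords[i]) - sp.diff(a[i], coords[j])))
        print(sp.simplify(lhs - rhs))   # 0, 0, 0
```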


From this post.

With the additional datum of position, vectors become tangent vectors. If \(v\) is a vector and \(p\) is a point, then we will use \(v_p\) to denote the tangent vector corresponding to \(v\) at the point \(p\). The reason they are called tangent vectors comes from their use in differential geometry, where a vast, powerful generalization of lines, planes, and other “Euclidean” spaces, called a “(smooth) manifold,” is used. A circle is an example of such a space, and the tangent line to a point on the circle can be thought of as the set of all tangent vectors to the circle at that point. For the sake of intuition, we will stay within Euclidean space.

Suppose \(f:\mathbb{R}^2\to\mathbb{R}\) is a smooth function from the real plane to the real line. At each point \(p\in\mathbb{R}^2\), we can take the directional derivative of \(f\) in whatever direction we like. For example, if we have coordinates \((x,y)\), then \(\left.\frac{\partial f}{\partial x}\right|_p\) is the directional derivative at \(p\) of \(f\) in the direction of increasing \(x\). If we wanted to take the directional derivative at \(p\) of \(f\) in the direction of decreasing \(x\), then we would just negate to obtain \(-\left.\frac{\partial f}{\partial x}\right|_p\).

In fact, if we pick any tangent vector at \(p\), it will correspond to a directional derivative at \(p\) in the direction and with the corresponding magnitude of that tangent vector. Following the above example, taking the directional derivative at \(p\) in the direction of increasing \(x\) is the same as applying \(\left.\frac{\partial}{\partial x}\right|_p\), and going the opposite direction is the same as applying \(-\left.\frac{\partial}{\partial x}\right|_p\). Taking this a step further, we claim that tangent vectors and directional derivatives are the same thing! This may seem shocking, but all it says is that every tangent vector corresponds to a directional derivative with the same magnitude and direction. If we are working in the plane, then we can therefore identify \(\left.\frac{\partial}{\partial x}\right|_p\) with \(\begin{bmatrix}1 & 0\end{bmatrix}^T_p\) and \(\left.\frac{\partial}{\partial y}\right|_p\) with \(\begin{bmatrix} 0 & 1\end{bmatrix}^T_p\), where \(v^T\) denotes the transpose of a vector \(v\).

We use the transpose to use column vectors for tangent vectors. By doing this, we can let row vectors act on them by left multiplication. Note that left multiplying a column vector by a row vector is the same as taking the dot product between those two vectors. Thus, row vectors can be viewed as linear functions from column vectors to the space of real numbers. Attaching these linear functionals to a point as we did with tangent vectors, we obtain cotangent vectors, or covectors. Hinting at the direction of conversation, in the plane we will denote the covector \(\alpha_p\) that satisfies \(\alpha_p\left(\left.\frac{\partial}{\partial x}\right|_p\right)=1\) and \(\alpha_p\left(\left.\frac{\partial}{\partial y}\right|_p\right)=0\) by \(\mathrm{d}x_p\).
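In matrix terms this is just left multiplication of a column vector by a row vector (a tiny sketch of mine):

```python
# Tangent vectors as column vectors, covectors as row vectors acting by left multiplication.
import numpy as np

d_dx = np.array([[1.0], [0.0]])     # ∂/∂x|_p as a column vector
d_dy = np.array([[0.0], [1.0]])     # ∂/∂y|_p
dx_p = np.array([[1.0, 0.0]])       # the covector dx_p as a row vector

print((dx_p @ d_dx).item())         # 1.0  ->  dx_p(∂/∂x|_p) = 1
print((dx_p @ d_dy).item())         # 0.0  ->  dx_p(∂/∂y|_p) = 0
```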

The concept of a vector field from multivariate calculus is best seen in the context of tangent vectors. Using our terminology, a vector field is simply a function that takes in a point in \(\mathbb{R}^n\) and outputs a tangent vector at that point. Similarly, we define a differential \(1\)-form as a function that takes in a point in \(\mathbb{R}^n\) and outputs a covector at that point. For example, the map \[\mathrm{d}x:p\mapsto\mathrm{d}x_p\] is a differential \(1\)-form. Thus, \(\mathrm{d}x\) is simply the differential \(1\)-form that takes each point \(p\) of the space to the cotangent vector \(\mathrm{d}x_p\) at \(p\) that satisfies \(\mathrm{d}x_p\left(\left.\frac{\partial}{\partial x}\right|_p\right)=1\) and \(\mathrm{d}x_p(v_p)=0\) for all \(v_p\not\in\left\langle\left.\frac{\partial}{\partial x}\right|_p\right\rangle\).

How could these possibly be interpreted as infinitesimals? I think Spivak explains this best:

“Classical differential geometers (and classical analysts) did not hesitate to talk about ‘infinitely small’ changes \(\mathrm{d}x^i\) of the coordinates \(x^i\), just as Leibnitz (sic) had. No one wanted to admit that this was nonsense, because true results were obtained when these infinitely small quantities were divided into each other (provided one did it in the right way).

“Eventually it was realized that the closest one can come to describing an infinitely small change is to describe a direction in which this change is supposed to occur, i.e., a tangent vector. Since \(\mathrm{d}f\) is supposed to be an infinitesimal change of \(f\) under an infinitesimal change of the point, \(\mathrm{d}f\) must be a function of this change, which means that \(\mathrm{d}f\) should be a function on tangent vectors. The \(\mathrm{d}x^i\) themselves then metamorphosed into functions, and it became clear that they must be distinguished from the tangent vectors \(\partial/\partial x^i\).”


And from this answer:

Let \(f(x,y) = x^2y\).

The derivative of a function gives the best linear approximation to the function at a point. This remains true in higher dimensions. The derivative of the function \(f\) above is

\(df = \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y}\end{bmatrix} = \begin{bmatrix} 2xy & x^2\end{bmatrix}\).

The conceptual meaning of the derivative is this:

\[f(x+\Delta x, y+\Delta y) \approx f(x,y) + df(\begin{bmatrix} \Delta x \\ \Delta y\end{bmatrix}) = x^2y + \begin{bmatrix} 2xy & x^2\end{bmatrix}\begin{bmatrix} \Delta x \\ \Delta y\end{bmatrix} = x^2y+2xy\Delta x + x^2\Delta y\]

In other words, at each point \((x,y)\) the derivative is a linear map which takes a small change \(\begin{bmatrix} \Delta x \\ \Delta y\end{bmatrix}\) away from the point \((x,y)\) and returns the approximate change in \(f\) resulting from that.
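A quick numeric check of that approximation for \(f(x,y)=x^2y\) at the point \((1,2)\), with a small made-up step (my own numbers):

```python
# f(x+Δx, y+Δy) ≈ f(x,y) + df([Δx, Δy]) with df = [2xy, x²].
def f(x, y):
    return x**2 * y

x, y = 1.0, 2.0
dx_, dy_ = 0.01, -0.02                            # a small change away from (1, 2)

exact = f(x + dx_, y + dy_)
approx = f(x, y) + (2*x*y)*dx_ + (x**2)*dy_
print(exact, approx)   # 2.019798 vs 2.02: the error is second order in the step
```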

Now say someone told me that a certain function \(g\) with \(g(0,0)=0\) had derivative \(\begin{bmatrix} y\cos(xy) & x\cos(xy)\end{bmatrix}\), and I wanted to figure out what \(g\) was. In this case I could probably just solve the differential equations \(\frac{\partial g}{\partial x} = y\cos(xy)\) and \(\frac{\partial g}{\partial y} = x\cos(xy)\) by inspection, but this would not always be possible.

Let us stick to the somewhat easier problem of approximating \(g(1,1)\). Here is my idea for doing that: I will pick a path from \((0,0)\) (whose value I know) to \((1,1)\). I will split that path up into millions of vector changes. Then I will use what I know about the derivative to approximate the change in \(g\) over each of those small changes and add them up. This should give me a pretty reasonable approximation.

In this case, I can see that I can pick the path \(\gamma:[0,1] \to \mathbb{R}^2\) given by \(\gamma(t) = (t,t)\). Splitting this into \(k\) pieces, I have the following approximations:

\[g(\frac{1}{k},\frac{1}{k}) \approx g(0,0) + dg|_{(0,0)}\left(\begin{bmatrix} \frac{1}{k} \\ \frac{1}{k}\end{bmatrix}\right)\].

So then

\[g(\frac{2}{k},\frac{2}{k}) \approx g(\frac{1}{k},\frac{1}{k}) + dg|_{(\frac{1}{k},\frac{1}{k})}\left(\begin{bmatrix} \frac{1}{k} \\ \frac{1}{k}\end{bmatrix}\right) \approx g(0,0) + dg|_{(0,0)}\left(\begin{bmatrix} \frac{1}{k} \\ \frac{1}{k}\end{bmatrix}\right) + dg|_{(\frac{1}{k},\frac{1}{k})}\left(\begin{bmatrix} \frac{1}{k} \\ \frac{1}{k}\end{bmatrix}\right)\].

Continuing on in this way, we will see that

\[g(1,1) \approx g(0,0) + \sum_{i=0}^{k-1} dg\big|_{\left(\frac{i}{k},\frac{i}{k}\right)}\left(\begin{bmatrix} \frac{1}{k} \\ \frac{1}{k}\end{bmatrix}\right)\]

It makes sense to give some name to this process. We define the limit of the sum above to be the integral of the covector field \(dg\) along the path \(\gamma\). Refer to my other post for the general definition, instead of just a particular example like this.
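Carrying out that sum numerically for the \(g\) above: its derivative \(\begin{bmatrix} y\cos(xy) & x\cos(xy)\end{bmatrix}\) together with \(g(0,0)=0\) pins down \(g=\sin(xy)\), so the sum should approach \(g(1,1)=\sin 1\approx 0.8415\) (a small sketch of mine):

```python
# Approximate g(1,1) - g(0,0) by summing dg along γ(t) = (t, t), split into k pieces.
from math import cos, sin

k = 100_000
total = 0.0
for i in range(k):
    x = y = i / k                                  # the point (i/k, i/k) on the path
    step = 1 / k                                   # each component of the change vector
    total += y*cos(x*y)*step + x*cos(x*y)*step     # dg at the point, applied to the step
print(total, sin(1))                               # ≈ 0.84147 for both (to about 1e-5)
```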

So far we have defined the integral only for derivatives of functions, and we have defined it exactly in such a way that the following fundamental theorem of calculus holds:

\[g(P_1) - g(P_0) = \int_\gamma dg\] for any path \(\gamma\) from \(P_0\) to \(P_1\). But the definition of the integral never used the fact that we were integrating the derivative of a function: it only mattered that we are integrating a covector field (i.e. a gadget which eats change vectors and spits out numbers). So we can use exactly the same definition to give the integral of a general covector field \(\begin{bmatrix} f(x,y) & g(x,y)\end{bmatrix}\), which may or may not be the differential of a function.

(There are certainly covector fields which are not derivatives of functions. For example, \(\begin{bmatrix} x & x\end{bmatrix}\) could not be the differential of a function, for if it were we would have \(\frac{\partial f}{\partial x} = x\) and \(\frac{\partial f}{\partial y} = x\). But then the mixed partials of \(f\) would not be equal, contradicting Clairaut’s theorem.)

\(dx\) is the constant covector field \(\begin{bmatrix} 1 & 0\end{bmatrix}\), and \(dy\) is the constant covector field \(\begin{bmatrix} 0 & 1\end{bmatrix}\). So we can write any covector field \(\begin{bmatrix} f(x,y) & g(x,y)\end{bmatrix}\) as \(f(x,y)dx + g(x,y)dy\). Integrating this thing along a curve is PRECISELY what you defined as a line integral in your first multivariable calculus course.



NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.