Cotangent Spaces
This post explores the concept of the cotangent space at a point on a smooth manifold.

Given an \(n\)-dimensional real vector space \(V\) and a basis \(E_1,\dots,E_n\), each vector \(v = v^iE_i\) can be written as a column vector
\[v = \begin{bmatrix}v^1 \\ \vdots \\ v^n\end{bmatrix}.\]
It can also be written as a row vector:
\[v^{T} = \begin{bmatrix}v^1 & \dots & v^n\end{bmatrix}.\]
The row vector is the transpose of the column vector in the sense that it interchanges the row and column indices, i.e. \(v_{i,1} = v^{T}_{1,i}\), where the first index indicates the row and the second index indicates the column using the typical notation for matrices.
Using the usual notion of the matrix product (or of the dot product), a row vector can be multiplied with a column vector to produce a scalar:
\[\begin{bmatrix}\omega_1 & \dots & \omega_n\end{bmatrix} \begin{bmatrix}v^1 \\ \vdots \\ v^n\end{bmatrix} = \omega_1 \cdot v^1 + \dots + \omega_n \cdot v^n.\]
Thus, a row vector (transposed vector) \(\omega\) can be viewed as an operator that acts by mapping a vector to a scalar, i.e. an operator with signature \(\omega : V \rightarrow \mathbb{R}\).
This operator, like everything in linear algebra, should be linear (over \(\mathbb{R}\)), i.e., for all \(v, v_1, v_2 \in V\) and \(a \in \mathbb{R}\), the following should hold:
- \(\omega(v_1 + v_2) = \omega(v_1) + \omega(v_2),\)
- \(\omega(a \cdot v) = a \cdot \omega(v).\)
Such operators are called linear functionals.
Given a basis \(E_1,\dots,E_n\), row vectors and linear functionals are equivalent. For any linear functional \(\omega : V \rightarrow \mathbb{R}\), a corresponding row vector can be produced as follows:
\[\begin{bmatrix}\omega(E_1) & \dots & \omega(E_n) \end{bmatrix}.\]
For any vector \(v \in V\), since \(v = v^iE_i\) and \(\omega\) is a linear functional, it follows that
\[\omega(v) = \omega(v^iE_i) = v^i \cdot \omega(E_i),\]
which is precisely the same as the product
\[\begin{bmatrix}\omega(E_1) & \dots & \omega(E_n)\end{bmatrix} \begin{bmatrix}v^1 \\ \vdots \\ v^n\end{bmatrix} = \omega(E_1) \cdot v^1 + \dots + \omega(E_n) \cdot v^n.\]
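As a quick numerical sanity check (not part of the formal development), the correspondence between row vectors and linear functionals, and the linearity properties above, can be verified with NumPy; the particular covector and vectors below are arbitrary choices:

```python
import numpy as np

# An arbitrary covector (row vector) and vectors in R^3.
omega = np.array([2.0, -1.0, 3.0])   # components omega_1, ..., omega_n
v = np.array([1.0, 4.0, 0.5])        # components v^1, ..., v^n
w = np.array([0.0, 1.0, -2.0])
a = 7.0

# The action of the covector on a vector is the matrix (dot) product.
print(omega @ v)                     # 2*1 + (-1)*4 + 3*0.5 = -0.5

# Linearity of the functional v -> omega @ v.
assert np.isclose(omega @ (v + w), omega @ v + omega @ w)
assert np.isclose(omega @ (a * v), a * (omega @ v))
```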
Linear functionals also comprise a vector space under the following operations (where \(v \in V\) and \(a \in \mathbb{R}\)):
- \((\omega_1 + \omega_2)(v) = \omega_1(v) + \omega_2(v),\)
- \((a \cdot\omega)(v) = a \cdot \omega(v).\)
For any real vector space \(V\), the dual vector space \(V^*\) is the vector space of linear functionals over \(V\). This vector space can be thought of as the space of "dual" or "transposed" vectors. The elements of a dual vector space are called covectors.
As we demonstrated, given a basis \(E_1,\dots,E_n\) for \(V\), for each \(v \in V\) and \(\omega \in V^*\), we have
\[\omega(v) = \omega(v^i \cdot E_i) = v^i \cdot \omega(E_i).\]
Let us write \(\varepsilon^i\) for the linear functional representing the \(i\)-th projection, i.e. \(\varepsilon^i(v) = v^i\); since this must be a linear functional, it follows that
\begin{align}\varepsilon^i(v) &= \varepsilon^i(v^j \cdot E_j) \\&= v^j \cdot \varepsilon^i(E_j).\end{align}
Thus, if we define \(\varepsilon^i(E_j) = \delta^i_j\), then the equation \(\varepsilon^i(v) = v^i\) is satisfied. Recall that the Kronecker delta is defined such that
\[\delta^i_j = \begin{cases}1, & \text{if \(i = j\)}\\ 0, & \text{otherwise}.\end{cases}\]
Thus, we have that
\begin{align}\varepsilon^i(v) &= v^j \cdot \varepsilon^i(E_j) \\&= v^j \cdot \delta^i_j \\&= v^i.\end{align}
This means that we can write every covector \(\omega\) as follows:
\[\omega = \omega(E_i) \cdot \varepsilon^i\]
since
\begin{align}\omega(v) &= \omega(E_i) \cdot v^i \\&= \omega(E_i) \cdot \varepsilon^i(v)\\&= (\omega(E_i) \cdot \varepsilon^i)(v).\end{align}
This means that the covectors \(\varepsilon^i\) comprise a basis for the dual space \(V^*\), called the dual basis. Each dual basis is specific to a choice of basis for \(V\). Thus \(\textrm{dim}(V) = \textrm{dim}(V^*)\): the two spaces have the same dimension since they have the same number of basis vectors.
Note that the notation \(\omega_i = \omega(E_i)\) is often used, so that, for each covector \(\omega\), \(\omega = \omega_i \varepsilon^i\).
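To make the dual basis concrete, here is a small NumPy sketch (the basis below is an arbitrary invertible choice): if the basis vectors \(E_i\) are taken as the columns of a matrix, the dual basis covectors \(\varepsilon^i\) are exactly the rows of its inverse, since row \(i\) of the inverse applied to column \(j\) gives \(\delta^i_j\).

```python
import numpy as np

# An arbitrary basis for R^3: the columns of B are E_1, E_2, E_3.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

# The dual basis covectors epsilon^i are the rows of B^{-1}:
# row i of B^{-1} applied to column j of B gives delta^i_j.
eps = np.linalg.inv(B)
assert np.allclose(eps @ B, np.eye(3))

# A covector omega expands as omega = omega_i epsilon^i with omega_i = omega(E_i).
omega = np.array([3.0, -2.0, 5.0])        # omega as a row vector in standard coordinates
omega_i = omega @ B                        # omega_i = omega(E_i)
assert np.allclose(omega_i @ eps, omega)   # omega = sum_i omega_i * epsilon^i
```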
The vector space \(\mathbb{R}^n\) has a standard, orthonormal basis \(e_1,\dots,e_n\), where
\[e_1 = \begin{bmatrix}1 \\ \vdots \\ 0\end{bmatrix},\]
\[e_2 = \begin{bmatrix}0\\ 1\\ \vdots \\ 0\end{bmatrix},\]
\[e_3 = \begin{bmatrix}0\\ 0 \\ 1\\ \vdots \\ 0\end{bmatrix},\]
and so forth.
The dual basis of the standard basis is called the standard dual basis and is denoted \(e^1,\dots,e^n\) (with superscript indexes).
Since, as we previously established,
\[e^i = \begin{bmatrix}e^i(e_1) & \dots & e^i(e_n) \end{bmatrix}\]
and \(e^i(e_j) = \delta^i_j\), it follows that
\[e^1 = \begin{bmatrix}1 & \dots & 0 \end{bmatrix},\]
\[e^2 = \begin{bmatrix}0 & 1 & \dots & 0 \end{bmatrix},\]
\[e^3 = \begin{bmatrix}0 & 0 & 1 & \dots & 0 \end{bmatrix},\]
and so forth. Thus, the standard dual basis covectors are the transposes of the standard basis vectors, just as we require.
Transpose of a Linear Map
Next, let's consider what the transpose of a linear map must be. Recall that, for any linear map \(A : V \rightarrow W\) between finite-dimensional real vector spaces \(V\) and \(W\), given a basis \((E^V_1,\dots,E^V_n)\) for \(V\) and a basis \((E^W_1,\dots,E^W_m)\) for \(W\), for each vector \(v \in V\), we can represent \(Av\) in terms of these bases as follows (where \(A^i\) is the \(i\)-th coordinate function in the basis \((E^W_1,\dots,E^W_m)\), i.e. \(Av = A^i(v) \cdot E^W_i\)):
\[Av = A^i(v^j \cdot E^V_j) \cdot E^W_i = v^j \cdot A^i(E^V_j) \cdot E^W_i.\]
Thus, \(Av\) can be written as the following matrix product, in which row \(i\) of the matrix corresponds to the component function \(A^i\) and column \(j\) multiplies the component \(v^j\):
\[Av = \begin{bmatrix}A^1(E^V_1) & A^1(E^V_2) & \dots & A^1(E^V_n) \\ \vdots & \vdots &\ddots & \vdots \\ A^m(E^V_1) & A^m(E^V_2) & \dots & A^m(E^V_n)\end{bmatrix} \begin{bmatrix}v^1 \\ v^2 \\ \vdots \\ v^n\end{bmatrix}.\]
Thus, the components \(A_{i,j}\) of this matrix are calculated as follows:
\[A_{i,j} = A^i(E^V_j).\]
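As an illustrative sketch, the components \(A_{i,j} = A^i(E^V_j)\) can be assembled numerically by applying the map to each basis vector in turn; the map below is an arbitrary choice, and the standard bases of \(\mathbb{R}^3\) and \(\mathbb{R}^2\) are used.

```python
import numpy as np

# An arbitrary linear map A : R^3 -> R^2, given abstractly as a function.
def A(v):
    return np.array([2 * v[0] - v[1] + 3 * v[2],
                     v[0] + 4 * v[2]])

# Assemble the matrix column by column: column j holds A applied to E_j,
# so the entry in row i, column j is A^i(E_j).
E = np.eye(3)                                   # standard basis of R^3 (as columns)
M = np.column_stack([A(E[:, j]) for j in range(3)])

v = np.array([1.0, -2.0, 0.5])
assert np.allclose(M @ v, A(v))                 # the matrix reproduces the map
```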
Our goal is to define the transpose \(A^* : W^* \rightarrow V^*\) of a linear map \(A : V \rightarrow W\). We intend for the matrix representation of \(A^*\) to be the transpose of the matrix representation of \(A\), that is, \(A^*_{i,j} = A_{j,i}\). Note that, due to the transposition, the \(m \times n\) matrix becomes an \(n \times m \) matrix which therefore maps vectors with \(m\) elements to vectors with \(n\) elements, i.e. it goes in the opposite direction, and so the signature is \(A^* : W^* \rightarrow V^*\).
Now, we want to define the map \(A^* : W^* \rightarrow V^*\). There is just one map that makes immediate sense; for each \(\omega \in W^*\) and \(v \in V\) we define the dual map or transpose map \(A^*\) as follows:
\[A^*(\omega)(v) = \omega(Av).\]
Note that this is equivalent to the following "point-free" definition:
\[A^*(\omega) = \omega \circ A.\]
Our goal is to confirm that the matrix representation of this map is indeed the transpose of the matrix representation of the map \(A\). We again apply the same analysis used to decompose a linear map into its constituent elements. Given a basis \((E^V_1,\dots,E^V_n)\) for \(V\) with dual basis \((\varepsilon^1_V,\dots,\varepsilon^n_V)\) and a basis \((E^W_1,\dots,E^W_m)\) for \(W\) with dual basis \((\varepsilon^1_W,\dots,\varepsilon^m_W)\), the linear map \(A^*\) can be decomposed in the same manner as any linear map (where the component functions \((A^*)_i\) are defined so that \(A^*(\omega) = (A^*)_i(\omega) \cdot \varepsilon^i_V\) and thus \((A^*)_i(\omega) = A^*(\omega)(E^V_i)\)):
\begin{align}A^*\omega &= (A^*)_i(\omega_j \cdot \varepsilon^j_W) \cdot \varepsilon^i_V\\&= A^*(\omega_j \cdot \varepsilon^j_W)(E^V_i) \cdot \varepsilon^i_V\\&= (\omega_j \cdot \varepsilon^j_W)(AE^V_i) \cdot \varepsilon^i_V\\&= (\omega_j \cdot \varepsilon^j_W)(A^k(E^V_i) \cdot E^W_k) \cdot \varepsilon^i_V \\&= (\omega_j \cdot \varepsilon^j_W(A^k(E^V_i) \cdot E^W_k)) \cdot \varepsilon^i_V \\&= (\omega_j \cdot A^k(E^V_i) \cdot \varepsilon^j_W(E^W_k)) \cdot \varepsilon^i_V\\&= (\omega_j \cdot A^k(E^V_i) \cdot \delta^j_k) \cdot \varepsilon^i_V\\&= (\omega_j \cdot A^j(E^V_i)) \cdot \varepsilon^i_V. \end{align}
As before, this implies that \(A^*\omega\) can be written as the following matrix product, in which row \(i\) of the matrix corresponds to the component function \((A^*)_i\) and column \(j\) multiplies the component \(\omega_j\):
\[A^*\omega = \begin{bmatrix}A^1(E^V_1) & A^2(E^V_1) & \dots & A^m(E^V_1) \\ \vdots & \vdots &\ddots & \vdots \\ A^1(E^V_n) & A^2(E^V_n) & \dots & A^m(E^V_n)\end{bmatrix} \begin{bmatrix}\omega_1 \\ \omega_2 \\ \vdots \\ \omega_m\end{bmatrix}.\]
Thus, the matrix representation of \(A^*\) is
\[A^*_{i,j} = A^j(E^V_i) = A_{j,i}.\]
The matrix representing the transpose map \(A^*\) is indeed the transpose of the matrix representation of \(A\).
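Here is a quick numerical sketch of this fact, using an arbitrary matrix and the standard bases, with covectors represented as row vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))    # an arbitrary linear map R^3 -> R^4 (standard bases)
omega = rng.normal(size=4)     # an arbitrary covector on R^4, as a row vector
v = rng.normal(size=3)         # an arbitrary vector in R^3

# Definition of the dual map: A^*(omega)(v) = omega(Av).
lhs = omega @ (A @ v)

# Matrix claim: A^*(omega) is obtained by applying the transposed matrix to omega.
rhs = (A.T @ omega) @ v

assert np.isclose(lhs, rhs)
```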
Functoriality
The assignment of a vector space to its dual space and a linear map to its dual map is a contravariant functor from the category of vector spaces to itself. To see this, we must verify the following requirements for a contravariant functor:
- \((A \circ B)^* = B^* \circ A^*\) for all composable linear maps \(A,B\)
- \(\mathrm{Id}_V^* = \mathrm{Id}_{V^*}\) for all vector spaces \(V\)
First, we compute, for any linear maps \(B : V_1 \rightarrow V_2\) and \(A : V_2 \rightarrow V_3\) (so that \(A \circ B : V_1 \rightarrow V_3\)) and covector \(\omega \in V_3^*\),
\begin{align}(A \circ B)^*(\omega) &= \omega \circ (A \circ B)\\&= (\omega \circ A) \circ B\\&= A^*(\omega) \circ B\\&= B^*(A^*(\omega)) \\&= (B^* \circ A^*)(\omega).\end{align}
Next, we compute for any \(\omega \in V^*\),
\begin{align}\mathrm{Id}_V^*(\omega) &= \omega \circ \mathrm{Id}_V\\&= \omega\\&= \mathrm{Id}_{V^*}(\omega).\end{align}
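In matrix terms (with standard bases and covectors as row vectors), these two properties amount to the familiar identities \((AB)^T = B^T A^T\) and \(I^T = I\); a quick numerical sketch with arbitrary matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(3, 2))    # B : V_1 -> V_2
A = rng.normal(size=(4, 3))    # A : V_2 -> V_3

# (A o B)^* = B^* o A^*  corresponds to  (A B)^T = B^T A^T.
assert np.allclose((A @ B).T, B.T @ A.T)

# Id^* = Id  corresponds to  I^T = I.
assert np.allclose(np.eye(3).T, np.eye(3))
```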
Isomorphism
As previously indicated, there is an isomorphism between every finite-dimensional vector space and its dual space. This follows from the fact that both vector spaces have the same dimension. However, this isomorphism is non-canonical in the sense that it requires an arbitrary choice of basis. Given a basis \(E_1,\dots,E_n\) for a vector space \(V\), each vector \(v = v^iE_i\) is mapped to the following covector, which can be expressed as
\[w \mapsto v \cdot w,\]
using the dot product, or as
\[w^iE_i \mapsto \sum_{i=1}^n w^i \cdot v^i,\]
or as
\[\sum_{i=1}^n v^i \cdot \varepsilon^i.\]
Conversely, each covector \(\omega \in V^*\) is mapped to the following vector:
\[\sum_{i=1}^n\omega_iE_i.\]
These maps are mutual inverses, since
\begin{align}v & \mapsto \sum_{i=1}^n v^i \cdot \varepsilon^i\\& \mapsto \sum_{j=1}^n\left(\sum_{i=1}^n v^i \cdot \varepsilon^i\right)(E_j)\cdot E_j\\&= \sum_{j=1}^n\left(\sum_{i=1}^n v^i \cdot \varepsilon^i(E_j)\right)\cdot E_j\\&= \sum_{j=1}^n v^j\cdot E_j\\&= v,\end{align}
and
\begin{align}\omega & \mapsto \sum_{i=1}^n \omega_iE_i\\& \mapsto \sum_{i=1}^n \omega_i \varepsilon^i\\&= \omega.\end{align}
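The basis dependence can be seen numerically: applying the recipe \(v \mapsto \sum_i v^i \varepsilon^i\) to the same vector with two different bases yields two different covectors. The data below are arbitrary choices.

```python
import numpy as np

def iso_to_dual(v, B):
    # Map v = v^i E_i to the covector sum_i v^i epsilon^i (as a row vector),
    # where the columns of B are the basis vectors E_i and the rows of inv(B)
    # are the dual basis covectors epsilon^i.
    coords = np.linalg.inv(B) @ v       # the components v^i of v in the basis
    return coords @ np.linalg.inv(B)    # sum_i v^i * epsilon^i

v = np.array([1.0, 1.0])

# With the standard basis the isomorphism sends v to its transpose...
print(iso_to_dual(v, np.eye(2)))        # [1. 1.]

# ...but with the rescaled basis (2e_1, 2e_2) it yields a different covector.
print(iso_to_dual(v, 2 * np.eye(2)))    # [0.25 0.25]
```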
Since, for every vector space \(V\), its dual space \(V^*\) is a vector space, we can also form its dual space \(V^{**}\), which is called the second dual space. There is a canonical isomorphism between every vector space and its second dual space.
We can exhibit the isomorphism \(\xi : V \rightarrow V^{**}\) as follows for every \(v \in V\) and \(\omega \in V^*\):
\[\xi(v)(\omega) = \omega(v)\]
and
\[\xi^{-1}(\psi) = \psi(\varepsilon^i) E_i,\]
where \(\psi : V^* \rightarrow \mathbb{R}\) is a linear functional on the dual space, \(E_1,\dots,E_n\) is a basis for \(V\), and \(\varepsilon^1,\dots,\varepsilon^n\) is the corresponding dual basis for \(V^*\).
Observe that
\begin{align}\xi^{-1}(\xi(v)) &= \xi(v)(\varepsilon^i)E_i \\&= \varepsilon^i(v)E_i \\&= v^iE_i\\&= v\end{align}
and
\begin{align}\xi(\xi^{-1}(\psi))(\omega) &= \xi(\psi(\varepsilon^i)E_i)(\omega) \\&= \omega(\psi(\varepsilon^i)E_i)\\&= \psi(\varepsilon^i)\omega(E_i)\\&= \psi(\varepsilon^i)\omega_i\\&= \psi(\omega_i \varepsilon^i) \\&= \psi(\omega).\end{align}
The final equation follows from the expression for the action of a covector on a vector in a given basis, as demonstrated above.
Note that, although the proof used a choice of basis to demonstrate that the map \(\xi\) has an inverse, the map \(\xi\) itself requires no choice of basis and is canonical. Another way to make this precise is to say that the isomorphism is natural, which means that the following diagram commutes for all vector spaces \(V\) and \(W\), every linear map \(A : V \rightarrow W\), and the second dual isomorphisms \(\xi_V : V \rightarrow V^{**}\) and \(\xi_W : W \rightarrow W^{**}\):
\begin{CD} V @>A>> W\\ @V\xi_{V}VV @VV\xi_{W}V\\ V^{**} @>>A^{**}> W^{**} \end{CD}
In category theory, the family of maps \(\xi\) is called a natural isomorphism between the identity functor \(\mathrm{Id}\) on the category of finite-dimensional vector spaces and the functor that maps each vector space to its second dual and each linear map \(A\) to its double dual map \(A^{**}\).
Note that, for any \(v \in V\) and \(\omega \in W^*\),
\begin{align}A^{**}(\xi_V(v))(\omega) &= (\xi_V(v) \circ A^*)(\omega)\\&= \xi_V(v)(A^*(\omega))\\&= \xi_V(v)(\omega \circ A)\\&= (\omega \circ A)(v)\\&= \omega(A(v))\\&= \xi_W(A(v))(\omega)\\&= (\xi_W \circ A)(v)(\omega).\end{align}
Thus, not only is each space isomorphic to its second dual, but maps between spaces have corresponding isomorphic maps between the respective second dual spaces.
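By contrast, \(\xi\) needs no basis: it is just evaluation. The following sketch (with arbitrary data) represents an element of \(V^{**}\) as a Python function on row vectors and checks that the reconstruction \(\xi^{-1}(\psi) = \psi(\varepsilon^i)E_i\) returns the same vector no matter which basis is used.

```python
import numpy as np

def xi(v):
    # xi(v) is the element of V** that evaluates a covector (row vector) at v.
    return lambda omega: omega @ v

def xi_inverse(psi, B):
    # Reconstruct a vector from psi in V** using the basis given by the columns of B:
    # xi^{-1}(psi) = psi(epsilon^i) E_i, where the epsilon^i are the rows of inv(B).
    eps = np.linalg.inv(B)
    coeffs = np.array([psi(eps[i]) for i in range(B.shape[1])])
    return B @ coeffs

v = np.array([3.0, -1.0])
psi = xi(v)

# The reconstruction does not depend on which basis is used.
print(xi_inverse(psi, np.eye(2)))                # [ 3. -1.]
print(xi_inverse(psi, np.array([[2.0, 1.0],
                                [0.0, 1.0]])))   # [ 3. -1.]
```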
Cotangent Vectors
Now we can apply the constructions of dual spaces to the tangent spaces of a smooth manifold. For any smooth manifold \(M\), the cotangent space at a point \(p \in M\) is the vector space denoted \(T^*_pM\) which is the dual space of the tangent space \(T_pM\) at the point \(p\), i.e.
\[T^*_pM = (T_pM)^*.\]
The elements of this vector space are called tangent covectors.
Tangent Covectors in Coordinates
Given a smooth chart \((U,x^i)\) on an open subset \(U \subseteq M\) of a smooth manifold \(M\), for each point \(p \in U\), the coordinate basis \((\partial/\partial x^i\rvert_p)\) for \(T_pM\) induces a dual basis for \(T^*_pM\), which is denoted \((dx^i\rvert_p)\); we will examine this notation in greater detail later, once we have introduced the notion of the differential of a real-valued function. Applying the generic theory of dual spaces, each tangent covector \(\omega \in T^*_pM\) can thus be expressed as follows:
\[\omega = \omega_i dx^i\rvert_p = \omega\left(\frac{\partial}{\partial x^i}\bigg\rvert_p\right) dx^i\rvert_p.\]
Variance
Consider the vector space \(\mathbb{R}^2\) as an example with the standard basis
\[e_1 = \begin{bmatrix}1 \\ 0\end{bmatrix}, e_2 = \begin{bmatrix}0 \\ 1\end{bmatrix}.\]
The vector
\(v = \begin{bmatrix}1 \\ 1\end{bmatrix}\)
has coordinates \((1,1)\) in the standard basis. Now, suppose that we transform the basis by scaling it by a factor of \(2\) to obtain the basis
\[E_1 = \begin{bmatrix}2 \\ 0\end{bmatrix}, E_2 = \begin{bmatrix}0 \\ 2\end{bmatrix}.\]
The vector \(v\) now has components \((1/2,1/2)\) in this basis:
\[v = \frac{1}{2} \cdot \begin{bmatrix}2 \\ 0\end{bmatrix} + \frac{1}{2} \cdot \begin{bmatrix}0 \\ 2\end{bmatrix}.\]
Thus, the basis vectors were scaled by a factor of \(2\) whereas the coordinates were scaled by an inverse factor of \(1/2\).
The matrix that represents this scaling is
\[A = \begin{bmatrix}2 & 0 \\ 0 & 2\end{bmatrix}\]
and its inverse matrix is
\[A^{-1} = \begin{bmatrix}\frac{1}{2} & 0\\ 0 & \frac{1}{2}\end{bmatrix}.\]
To compute the "new" coordinates from the "old" coordinates, the inverse matrix is used:
\[\begin{bmatrix}\frac{1}{2} & 0\\ 0 & \frac{1}{2}\end{bmatrix}\begin{bmatrix}1 \\ 1\end{bmatrix} = \begin{bmatrix}\frac{1}{2} \\ \frac{1}{2}\end{bmatrix}.\]
To compute the "old" coordinates from the "new" coordinates, the matrix \(A\) is used:
\[\begin{bmatrix}2 & 0\\ 0 & 2\end{bmatrix}\begin{bmatrix}\frac{1}{2} \\ \frac{1}{2}\end{bmatrix} = \begin{bmatrix}1 \\ 1\end{bmatrix}.\]
If the components of a vector transform in the same manner as the basis vectors, then the transformation is called covariant. If the components of a vector transform in an inverse manner to the basis vectors, then the transformation is called contravariant. In the example above, when computing the "old" coordinates from the "new" coordinates, the transformation was covariant, and when computing the "new" coordinates from the "old" coordinates, the transformation was contravariant.
Often, when such a transformation is applied, the "old" coordinates are computed from the "new" coordinates, since this is tantamount to transforming the original vector in coordinates; for instance, when scaling by a factor of \(2\), the coordinates \((1,1)\) scale to \((2,2)\). Thus, it is this transformation which is often considered the "change of coordinates" or "change of basis".
In the post about tangent spaces, it was demonstrated that the components of tangent vectors transform under a change of coordinates as follows (where \((x^i)\) is a set of smooth coordinate functions, \((\tilde{x}^j)\) are the coordinate functions of a transition map to another smooth chart, \((\tilde{v}^j)\) are the "new" coordinates of a tangent vector \(v \in T_pM\) after the transition to an overlapping coordinate chart, and \((v^i)\) are the "old" coordinates of \(v\), \(M\) is a smooth manifold, \(p \in M\), and \(\hat{p}\) is the coordinate representation of \(p\)):
\[\tilde{v}^j = \frac{\partial \tilde{x}^j}{\partial x^i}(\hat{p}) \cdot v^i.\]
This corresponds to the equation
\[\frac{\partial}{\partial x^i}\bigg\rvert_p = \frac{\partial \tilde{x}^j}{\partial x^i}(\hat{p}) \frac{\partial}{\partial \tilde{x}^j}\bigg\rvert_p.\]
The matrix
\[\left(\frac{\partial \tilde{x}^j}{\partial x^i}(\hat{p})\right)\]
with rows \(j\) and columns \(i\) is the Jacobian matrix which represents the transition map. The "new" coordinates \((\tilde{v}^j)\) are computed from the "old" coordinates \((v^i)\) by multiplying by the Jacobian matrix, whereas the "new" basis vectors are obtained from the "old" basis vectors by its inverse (equivalently, as the equation above shows, the "old" basis vectors are expressed in terms of the "new" ones via the Jacobian). Since the components transform inversely to the basis vectors, the change of coordinates is contravariant.
Now, consider how covectors \(\omega \in T^*_pM\) transform under a change of coordinates (writing \(\tilde{\omega}_j = \omega(\partial/\partial \tilde{x}^j\rvert_p)\) for the "new" components):
\begin{align}\omega_i &= \omega\left(\frac{\partial}{\partial x^i}\bigg\rvert_p\right)\\&= \omega\left(\frac{\partial \tilde{x}^j}{\partial x^i}(\hat{p}) \frac{\partial}{\partial \tilde{x}^j}\bigg\rvert_p\right)\\ &= \frac{\partial \tilde{x}^j}{\partial x^i}(\hat{p}) \omega\left(\frac{\partial}{\partial \tilde{x}^j}\bigg\rvert_p\right)\\&= \frac{\partial \tilde{x}^j}{\partial x^i}(\hat{p}) \tilde{\omega}_j.\end{align}
Since the "old" coordinates \((\omega_i)\) are computed from the "new" coordinates \((\omega_j)\) according to the Jacobian matrix (i.e. it is multiplied against the "new" coordinates), the change of coordinates is covariant.
Thus, tangent vectors are sometimes called contravariant vectors and covectors are called covariant vectors (although the vectors themselves are invariant; it is really the components that vary).
The notion of variance is mostly a heuristic that is used in certain applications in engineering and physics.
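Before moving on, here is a numerical sketch of the two transformation laws just derived, using the change from Cartesian to polar coordinates on the plane; the point, tangent-vector components, and covector components are arbitrary choices. It also checks that the pairing \(\omega_i v^i\) is independent of the coordinates used.

```python
import numpy as np

# Change of coordinates on (part of) the plane: Cartesian (x, y) to polar (r, theta).
p = np.array([1.0, 1.0])                  # the point, in Cartesian coordinates
x, y = p
r2 = x**2 + y**2

# Jacobian of the transition map (r, theta) = (sqrt(x^2 + y^2), atan2(y, x)),
# with rows indexing the "new" coordinates and columns the "old" ones.
J = np.array([[x / np.sqrt(r2), y / np.sqrt(r2)],
              [-y / r2,         x / r2       ]])

v_old = np.array([2.0, -3.0])             # components of a tangent vector at p
w_old = np.array([0.5,  4.0])             # components of a covector at p

v_new = J @ v_old                         # contravariant: new = Jacobian * old
w_new = np.linalg.inv(J).T @ w_old        # covariant: old = Jacobian^T * new

# The pairing omega_i v^i is the same in either coordinate system.
assert np.isclose(w_old @ v_old, w_new @ v_new)
```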
The Cotangent Bundle
The cotangent bundle is the disjoint union of each cotangent space at every point \(p\) of a smooth manifold \(M\):
\[T^*M = \bigsqcup_{p \in M} T^*_pM.\]
Typically, the set \(\{(p,\omega) : p \in M, \omega \in T^*_pM\}\) is taken to represent this disjoint union.
There is a natural projection \(\pi : T^*M \rightarrow M\) defined as \(\pi(p,\omega) = p\).
For any open subset \(U \subseteq M\) and smooth coordinates \((x^i)\) on \(U\), given the coordinate basis \((\partial/\partial x^i\rvert_p)\) for \(T_pM\) at each \(p \in U\), the corresponding dual basis for \(T^*_pM\) is denoted \((dx^i\rvert_p)\). This defines maps \(dx^i : U \rightarrow T^*M\) called the coordinate covector fields.
In general, a local or global section of the cotangent bundle is called a covector field, i.e. a covector field is a map \(\omega : U \rightarrow T^*M\) for an open subset \(U \subseteq M\) such that \(\pi \circ \omega = \mathrm{Id}_U\). In other words, a covector field maps each point \(p\) to a covector \(\omega_p \in T^*_pM\). A covector field is smooth if it is smooth as a map \(\omega : U \rightarrow T^*M\).
The set of all smooth covector fields over a smooth manifold \(M\) is denoted \(\mathfrak{X}^*(M)\). This set forms a real vector space under the following operations:
- \((\omega^1 + \omega^2)_p = \omega^1_p + \omega^2_p\)
- \((a \cdot \omega)_p = a \cdot \omega_p\) for \(a \in \mathbb{R}\)
It is a module over the ring \(C^{\infty}(M)\) using the following notion of multiplication for each \(f \in C^{\infty}(M)\):
\[(f\omega)_p = f(p)\omega_p.\]
On the coordinate domain \(U\), every covector field \(\omega\) can be written in terms of the coordinate covector fields as
\[\omega = \omega_i dx^i,\]
where \(\omega_i : U \rightarrow \mathbb{R}\) is the \(i\)-th component function of \(\omega\), defined as
\[\omega_i(p) = \omega_p\left(\frac{\partial}{\partial x^i}\bigg\rvert_p\right).\]
Just as we can apply a covector to a vector to yield a scalar, we can apply a covector field \(\omega\) to a vector field \(X\) to yield a function as follows:
\[\omega(X)(p) = \omega_p(X_p).\]
In terms of local coordinates, if \(\omega = \omega_i dx^i\) and \(X = X^j \partial/\partial x^j\), then
\begin{align}\omega(X) &= \omega_i dx^i\left(X^j \frac{\partial}{\partial x^j }\right)\\&= \omega_i X^j dx^i\left(\frac{\partial}{\partial x^j }\right)\\&= \omega_i X^j \delta^i_j \\&= \omega_i X^i.\end{align}
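A small sketch of this pointwise pairing on \(\mathbb{R}^2\) with its global coordinates; the component functions below are arbitrary choices.

```python
import numpy as np

# Arbitrary component functions of a covector field omega = omega_i dx^i on R^2 ...
def omega_components(p):
    x, y = p
    return np.array([y, x**2])

# ... and of a vector field X = X^j d/dx^j on R^2.
def X_components(p):
    x, y = p
    return np.array([np.sin(y), x + 1.0])

# omega(X) is the real-valued function p -> omega_i(p) X^i(p).
def omega_of_X(p):
    return omega_components(p) @ X_components(p)

print(omega_of_X(np.array([2.0, 0.5])))   # 0.5*sin(0.5) + 4*3 ~= 12.24
```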
Differential of a Function
For any smooth, real-valued function \(f : M \rightarrow \mathbb{R}\) on a smooth manifold \(M\), we can define a smooth covector field denoted \(df\) called the differential of \(f\) as follows:
\[df\rvert_p(v) = v(f) \text{, \(v \in T_pM\)}.\]
This is indeed a covector field, since \(df_p\) is a linear functional for each \(p \in M\):
- \(df_p(v + w) = (v + w)(f) = v(f) + w(f) = df_p(v) + df_p(w),\)
- \(df_p(a \cdot v) = (a \cdot v)(f) = a \cdot v(f) = a \cdot df_p(v).\)
Let's consider the coordinate expression for \(df\) for some smooth coordinates \((x^i)\). We will temporarily write the coordinate covectors as \((\bar{d}x^i)\) and we will denote the component functions of \(df\) as \((df)_i\). Then, for each point \(p\):
\begin{align}df_p &= (df)_i(p) \bar{d}x^i\rvert_p\\&= df_p\left(\frac{\partial}{\partial x^i}\bigg\rvert_p\right) \bar{d}x^i\rvert_p\\&= \frac{\partial}{\partial x^i}\bigg\rvert_p(f) \bar{d}x^i\rvert_p\\&= \frac{\partial f}{\partial x^i}(p) \bar{d}x^i\rvert_p.\end{align}
Applying this to the special case of one of the coordinate functions \(x^j\), we obtain:
\begin{align}dx^j\rvert_p &= \frac{\partial x^j}{\partial x^i}(p) \bar{d}x^i\rvert_p \\&=\delta^i_j \bar{d}x^i\rvert_p \\&= \bar{d}x^j\rvert_p.\end{align}
Thus, the \(i\)-th coordinate covector field \(dx^i\) coincides with the differential of the \(i\)-th coordinate function \(x^i\), which explains the choice of notation and justifies using the same symbol for both. Thus, we may write
\[df_p = \frac{\partial f}{\partial x^i}(p)dx^i\rvert_p,\]
or, in "point-free" form, we may write
\[df = \frac{\partial f}{\partial x^i} dx^i.\]
The latter form reproduces the expression for the differential of a function in classical calculus. For instance, in classical calculus, the differential of a function \(f : \mathbb{R}^3 \rightarrow \mathbb{R}\) with coordinate functions denoted \(x,y,z\) is written as
\[df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy + \frac{\partial f}{\partial z}dz.\]
Thus, this notation is suggestive, since it coincides with the classical notation of calculus. Furthermore, it furnishes the notation with a semantics; symbols like \(dx\) are not merely notation, but signify a proper mathematical object, namely, in this example, a coordinate covector field.
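As an illustrative numerical sketch (with an arbitrarily chosen function on \(\mathbb{R}^3\)), the components of \(df_p\) are the partial derivatives of \(f\) at \(p\), and \(df_p(v)\) agrees with the directional derivative \(v(f)\), here approximated by a symmetric finite difference.

```python
import numpy as np

def f(p):
    x, y, z = p
    return x**2 * y + np.sin(z)

p = np.array([1.0, 2.0, 0.5])
v = np.array([0.3, -1.0, 2.0])

# Components of df_p in the coordinate covector basis: the partial derivatives at p.
df_p = np.array([2 * p[0] * p[1],    # df/dx = 2xy
                 p[0]**2,            # df/dy = x^2
                 np.cos(p[2])])      # df/dz = cos(z)

# df_p(v) = (df/dx^i)(p) v^i should match the directional derivative of f along v.
h = 1e-6
directional = (f(p + h * v) - f(p - h * v)) / (2 * h)
print(df_p @ v, directional)         # both are approximately 1.955
```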
In the post about tangent spaces, the differential \(df_p\) was defined as a certain linear map from \(T_pM\) to \(T_{f(p)}\mathbb{R}\). In this post, it was defined as a covector, that is, as a map from \(T_pM\) to \(\mathbb{R}\). These two definitions coincide if we consider the natural isomorphism between \(T_{f(p)}\mathbb{R}\) and \(\mathbb{R}\). Recall that, for any tangent vector (that is, derivation) \(w \in T_a\mathbb{R}\) at a point \(a \in \mathbb{R}\), this isomorphism is witnessed (in one direction) by the map \(w \mapsto w(x^i) \cdot e_i\) for smooth coordinate functions \((x^i)\). Since there is only one coordinate function for \(\mathbb{R}\), and this unique coordinate function is just the identity map when using the standard coordinate, we can compute the following (where we refer to the differential of a smooth map as \(\bar{d}f_p\) and the differential of a function as \(df_p\)):
\[\bar{d}f_p(v)(\mathrm{Id}_{\mathbb{R}}) = v(\mathrm{Id}_{\mathbb{R}} \circ f) = v(f) = df_p(v).\]
Thus, the two definitions are naturally equivalent, and may be used interchangeably.
In the post about total derivatives, the total derivative of a function \(f : \mathbb{R}^n \rightarrow \mathbb{R}\) was defined as the unique linear map \(df_p\) satisfying the following:
\[f(p + v) = f(p) + df_p(v) + \varepsilon(v).\]
This means that
\[f(p + v) - f(p) \approx df_p(v)\]
and the error \(\varepsilon(v)\) decreases rapidly as \(v\) tends to \(0\), in the sense that \(\varepsilon(v)/\lVert v\rVert \rightarrow 0\) as \(v \rightarrow 0\); in particular, the error is negligible when \(v\) is small.
Writing \(\Delta f\) for the increment, i.e. the difference
\[\Delta f = f(p + v) - f(p),\]
we then obtain
\[\Delta f \approx df_p(v).\]
Thus, the differential \(df_p\) (viewed as a covector) may be thought of as an approximation for the increment \(\Delta f\) whenever \(v\) is sufficiently small. This explains where the term "differential" comes from.
Pullbacks of Covector Fields
Given any smooth map \(F : M \rightarrow N\) between smooth manifolds \(M\) and \(N\), we can form the dual linear map \(dF^*_p : T^*_{F(p)}N \rightarrow T^*_pM\) of the differential \(dF_p : T_pM \rightarrow T_{F(p)}N\) as follows for any \(\omega \in T^*_{F(p)}N\) and \(v \in T_pM\):
\[dF^*_p(\omega)(v) = \omega(dF_p(v)).\]
Note that, since we demonstrated previously that the matrix representing the dual of a linear map is the transpose of the matrix representing the linear map, it follows that the matrix representing \(dF^*_p\) is the transpose of the matrix representing \(dF_p\) (namely, the Jacobian matrix), and hence is the transpose of the Jacobian matrix.
This enables us to define the pullback \(F^*\omega : M \rightarrow T^*M\) of a covector field \(\omega : N \rightarrow T^*N\) along the smooth map \(F\):
\[(F^*\omega)_p = dF^*_p(\omega_{F(p)}) \text{, \(p \in M\)}.\]
In detail, this covector acts on a tangent vector \(v \in T_pM\) as follows:
\[(F^*\omega)_p(v) = \omega_{F(p)}(dF_p(v)).\]
Unlike vector fields, where the pushforward operation is not always defined, the pullback is always defined for any smooth map whatsoever.
Next we will consider the pullback in coordinates. Let \(p\) be any point of a smooth manifold \(M\), let \((y^j)\) be smooth coordinate functions on an open subset \(V \subseteq N\), and consider the action of the pullback along \(F\rvert_U\), where \(U = F^{-1}(V)\), on a covector field \(\omega \in \mathfrak{X}^*(N)\), where \(\omega = \omega_j dy^j\) for component functions \(\omega_j\):
\[F^*\omega = F^*(\omega_j dy^j).\]
Next, we will demonstrate a technical lemma that will allow us to further expand the right-hand side of this expression. Let \(u\) be any continuous, real-valued function on \(N\) and \(\omega\) be any covector field on \(N\). Then, for each \(p \in M\), we compute
\begin{align}(F^*(u\omega))_p &= dF^*_p((u\omega)_{F(p)}) \\&= dF^*_p(u(F(p))\omega_{F(p)}) \\&= u(F(p)) dF^*_p(\omega_{F(p)}) \\&= u(F(p))(F^*\omega)_p \\&= ((u \circ F)F^*\omega)_p,\end{align}
i.e. \(F^*(u\omega) = (u \circ F)F^*\omega\).
Applying this to the expression for the action of the pullback in coordinates, we obtain
\begin{align}F^*\omega &= F^*(\omega_j dy^j) \\&= (\omega_j \circ F)F^* dy^j. \end{align}
We next demonstrate another technical lemma. For any smooth, real-valued function \(u\) on \(N\), it follows that
\begin{align}(F^*du)_p(v) &= dF^*_p(du_{F(p)})(v) \\&= du_{F(p)}(dF_p(v)) \\&= dF_p(v)(u) \\&= v(u \circ F) \\&= d(u \circ F)_p(v),\end{align}
so that \(F^*du = d(u \circ F)\).
Applying this to the expression for the pullback in coordinates, we obtain:
\begin{align}F^*\omega &= F^*(\omega_j dy^j) \\&= (\omega_j \circ F)F^* dy^j \\&= (\omega_j \circ F) d(y^j \circ F) \\&= (\omega_j \circ F) dF^j, \end{align}
where \(F^j = y^j \circ F\) is the \(j\)-th component function of \(F\) in the coordinates \((y^j)\).
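Finally, here is a numerical sketch (with an arbitrarily chosen smooth map and covector field on \(\mathbb{R}^2\)) checking that the coordinate formula \(F^*\omega = (\omega_j \circ F)\, dF^j\) agrees with the pointwise definition \((F^*\omega)_p(v) = \omega_{F(p)}(dF_p(v))\); the Jacobian matrix plays the role of \(dF_p\), and its transpose produces the components of the pulled-back covector.

```python
import numpy as np

# An arbitrary smooth map F : R^2 -> R^2 and its Jacobian (the matrix of dF_p).
def F(p):
    x, y = p
    return np.array([x * y, x + y**2])

def jacobian_F(p):
    x, y = p
    return np.array([[y,   x      ],
                     [1.0, 2.0 * y]])

# Arbitrary component functions omega_j of a covector field on the codomain.
def omega_components(q):
    u, w = q
    return np.array([w, u * w])

p = np.array([1.5, -0.5])
v = np.array([2.0, 1.0])

# Pointwise definition: (F^* omega)_p(v) = omega_{F(p)}(dF_p(v)).
lhs = omega_components(F(p)) @ (jacobian_F(p) @ v)

# Coordinate formula: the components of F^* omega at p are (omega_j o F)(p) (dF^j/dx^i)(p),
# i.e. the transposed Jacobian applied to the components of omega at F(p).
pullback_components = jacobian_F(p).T @ omega_components(F(p))
rhs = pullback_components @ v

assert np.isclose(lhs, rhs)
```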