Tangent Spaces
This post contains a detailed explanation of the concept of the tangent space at a point on a smooth manifold.

In the post about total derivatives, a geometric version of the tangent space was defined in order to define the total derivative as a linear approximation.
However, this definition is an extrinsic definition; it relies on the fact that the spaces \(\mathbb{R}^n\) and \(\mathbb{R}^m\) have the global structure of a vector space in order to define the tangent spaces \(T_a\mathbb{R}^n\) and \(T_{f(a)}\mathbb{R}^m\). What if we want to define the tangent space on an arbitrary smooth manifold, where the manifold may not have the global structure of a vector space? We could embed the manifold in an ambient space, but, again, this would be an extrinsic definition since it relies on some external apparatus that may not be generally available. We seek an intrinsic definition.
We think of the elements of the vector space \(T_a\mathbb{R}^n\) as tangent vectors. The key observation is this: don't ask what tangent vectors are, instead ask what they do. In mathematical language, we ask how tangent vectors act.
For any differentiable map \(f : \mathbb{R}^n \rightarrow \mathbb{R}^m\) and any vector \(v \in \mathbb{R}^n\), the image \(df_a(v) \in \mathbb{R}^m\) is the directional derivative of \(f\) at \(a\), \(D_vf(a)\), and, if we think of the total derivative as a linear map between tangent spaces, then this fact suggests a relationship between tangent vectors and directional derivatives. Given the point \(a\) and the vector \(v\), the operation \(D_v(-)(a)\) maps a smooth function to its directional derivative in the direction of \(v\) at the point \(a\). Since directional derivatives are computed component-wise, it is sufficient to define this operation only for functions \(f : \mathbb{R}^n \rightarrow \mathbb{R}\), i.e. functions \(f \in C^\infty(\mathbb{R}^n)\). The action that we seek then, is this: tangent vectors \(v\) act upon smooth functions \(f \in C^\infty(\mathbb{R}^n)\) by determining the directional derivative of \(f\) at the point \(a\), i.e. they are operators \(D_v(-)(a) : C^\infty(\mathbb{R}^n) \rightarrow \mathbb{R}\).
We may think of this characterization of tangent vectors as a sort of representation theorem: we have represented the abstract, complex directional derivative operators in terms of concrete, simple vectors.
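As a small illustration (the function, point, and vector below are chosen purely for the example and are not part of the general development), take \(n = 2\), \(f(x^1,x^2) = (x^1)^2x^2\), \(a = (1,1)\), and \(v = (1,2)\). Using the formula \(D_vf(a) = v^i\,\partial f/\partial x^i(a)\), the tangent vector \(v\) acts on \(f\) by
\[D_vf(a) = 1 \cdot (2x^1x^2)\big\rvert_{(1,1)} + 2 \cdot ((x^1)^2)\big\rvert_{(1,1)} = 2 + 2 = 4.\]
The same vector acts on a different function, say the coordinate function \(x^1\), to give \(D_v(x^1)(a) = 1\). The vector is fixed, and the number produced depends on the function it is fed.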
We have succeeded in characterizing the action of tangent vectors. This characterization still relies upon the existence of vectors \(v \in \mathbb{R}^n\), so it may seem that we haven't made any progress. However, we can abstract this definition by specifying the essential properties of such operators without making any reference to the vectors \(v \in \mathbb{R}^n\).
Properties of Directional Derivatives
Toward this end, let's establish a few important properties of directional derivatives.
Linearity
The smooth functions \(C^\infty(\mathbb{R}^n)\) form a vector space over the field \(\mathbb{R}\) under point-wise addition and scalar multiplication of smooth functions:
- \((f + g)(p) = f(p) + g(p)\)
- \((sf)(p) = s(f(p))\)
The operators \(D_v(-)(a)\) are linear over \(\mathbb{R}\), which means that, for any differentiable functions \(f\) and \(g\) and scalar \(s \in \mathbb{R}\),
- \(D_v(f+g)(a) = D_vf(a) + D_vg(a)\)
- \(D_v(sf)(a) = sD_vf(a).\)
The former property can be verified as follows:
\begin{align}D_v(f+g)(a) &= \lim_{t \to 0}\frac{(f+g)(a + tv) - (f+g)(a)}{t}\\&= \lim_{t \to 0}\frac{f(a+tv)+g(a+tv)-(f(a)+g(a))}{t}\\&= \lim_{t \to 0}\frac{(f(a+tv)-f(a))+(g(a+tv)-g(a))}{t}\\&=\lim_{t \to 0}\left[\frac{f(a+tv)-f(a)}{t}+\frac{g(a+tv)-g(a)}{t}\right]\\&=\lim_{t \to 0}\frac{f(a+tv)-f(a)}{t}+\lim_{t \to 0}\frac{g(a+tv)-g(a)}{t}\\&=D_vf(a)+D_vg(a),\end{align}
while the latter property can be verified as follows:
\begin{align}D_v(sf)(a) &= \lim_{t \to 0}\frac{(sf)(a+tv) - (sf)(a)}{t}\\&= \lim_{t \to 0}\frac{s(f(a+tv)) - s(f(a))}{t}\\&= \lim_{t \to 0}\frac{s(f(a+tv) - f(a))}{t}\\&= s \cdot \lim_{t \to 0}\frac{f(a+tv) - f(a)}{t}\\&= sD_vf(a).\end{align}
Product Rule
Directional derivatives also obey the product rule for any differentiable functions \(f\) and \(g\)
\[D_v(fg)(a) = f(a)D_vg(a) + g(a)D_vf(a)\]
as can be verified as follows:
\begin{align}D_v(fg)(a) &= \lim_{t \to 0}\frac{(fg)(a+tv)-(fg)(a)}{t}\\&= \lim_{t \to 0}\frac{f(a+tv)g(a+tv)-f(a)g(a)}{t}\\&= \lim_{t \to 0}\frac{f(a+tv)g(a+tv)-f(a)g(a) + (f(a+tv)g(a) - f(a+tv)g(a))}{t}\\&=\lim_{t \to 0}\frac{f(a+tv)(g(a+tv)-g(a)) + g(a)(f(a+tv)-f(a))}{t}\\&= \left[\lim_{t \to 0}f(a+tv)\lim_{t \to 0}\frac{g(a+tv)-g(a)}{t}\right] + \left[g(a)\lim_{t \to 0}\frac{f(a+tv)-f(a)}{t}\right]\\&= f(a)D_vg(a) + g(a)D_vf(a)\end{align}
This uses the fact that \(\lim_{t \to 0}f(a+tv) = f(a)\) since \(f\) is differentiable and therefore continuous.
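To see the product rule in action, here is a quick check with illustrative choices (assumed only for this example): \(f(x^1,x^2) = (x^1)^2\), \(g(x^1,x^2) = x^2\), \(a = (1,1)\), and \(v = (1,2)\). Computing each piece with \(D_vh(a) = v^i\,\partial h/\partial x^i(a)\) gives \(D_vf(a) = 1 \cdot 2 + 2 \cdot 0 = 2\) and \(D_vg(a) = 1 \cdot 0 + 2 \cdot 1 = 2\), so
\[f(a)D_vg(a) + g(a)D_vf(a) = 1 \cdot 2 + 1 \cdot 2 = 4,\]
which agrees with the direct computation on the product \((fg)(x^1,x^2) = (x^1)^2x^2\):
\[D_v(fg)(a) = 1 \cdot (2x^1x^2)\big\rvert_{(1,1)} + 2 \cdot ((x^1)^2)\big\rvert_{(1,1)} = 2 + 2 = 4.\]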
Derivations at a Point
These properties of the directional derivative operators are enough to completely characterize them. We arrive at the following definition:
Definition (Derivation at a Point on \(\mathbb{R}^n\)) A derivation on \(\mathbb{R}^n\) at a point \(a \in \mathbb{R}^n\) is a linear map \(w : C^\infty(\mathbb{R}^n) \rightarrow \mathbb{R}\) over \(\mathbb{R}\) that obeys the product rule for any two smooth functions \(f,g \in C^\infty(\mathbb{R}^n)\)
\[w(fg) = f(a)w(g) + g(a)w(f).\]
The set of derivations at a point \(a \in \mathbb{R}^n\) on \(\mathbb{R}^n\) is denoted \(\mathcal{D}_a(\mathbb{R}^n)\) and is a vector space over \(\mathbb{R}\) under the following operations:
- \((w_1 + w_2)(f) = w_1(f) + w_2(f)\)
- \((sw)(f) = s(w(f))\).
Thus, every directional derivative operator \(D_v(-)(a)\) satisfies these properties and is therefore a derivation at the point \(a\). In fact, the converse is also true: every derivation at a point \(w\) is equivalent to a directional derivative operator \(D_{w^ie_i}(-)(a)\) for a particular vector \(w^ie_i\) as we will now demonstrate. Thus, we have fully characterized the directional derivative operators without explicit reference to any auxiliary vector \(v\). This achieves our goal of an intrinsic definition of tangent vectors, as long as we can establish an isomorphism.
The Tangent Space Isomorphism
We want to establish that \(\mathcal{D}_a(\mathbb{R}^n) \cong T_a(\mathbb{R}^n)\). We will first show that \(\mathcal{D}_a(\mathbb{R}^n) \cong \mathbb{R}^n\).
Thus, we need to define a mapping \(T : \mathcal{D}_a\mathbb{R}^n \rightarrow \mathbb{R}^n\) that maps a derivation \(w\) at a point \(a\) to a vector in \(\mathbb{R}^n\). Recall that
\[D_v(x^j)(a)e_j = v^i\frac{\partial x^j}{\partial x^i}(a)e_j = v^i\delta_i^je_j = v^je_j = v.\]
Thus, we will define the mapping \(T\) as
\[T(w) = w(x^j)e_j.\]
We often denote \(w(x^i)\) by \(w^i\). We also need to define the inverse mapping \(T^{-1} : \mathbb{R}^n\rightarrow \mathcal{D}_a\mathbb{R}^n\). Note that, since partial derivatives are directional derivatives, the operators
\[\frac{\partial}{\partial x^i}\bigg\rvert_a\]
defined so that
\[\frac{\partial}{\partial x^i}\bigg\rvert_a(f) = \frac{\partial f}{\partial x^i}(a)\]
are also derivations. Since, for any directional derivative we have
\[D_v(f)(a) = v^i \frac{\partial f}{\partial x^i}(a),\]
we therefore define the mapping as
\[T^{-1}(v) = D_v(-)(a) = v^i \frac{\partial}{\partial x^i}\bigg\rvert_a.\]
Our goal is to demonstrate that these mappings are mutually inverse, i.e. that \(T(T^{-1}(v)) = v\) and \(T^{-1}(T(w)) = w\). This means that we want to demonstrate that
\[v = D_v(x^j)(a)e_j = v^i\frac{\partial}{\partial x^i}\bigg\rvert_a(x^j)e_j\]
and
\[w = D_{w^ie_i}(-)(a) = w^i \frac{\partial}{\partial x^i}\bigg\rvert_a.\]
We have already demonstrated the first equality. The second is more difficult to establish.
First, we demonstrate a technical lemma.
Lemma 1 For any derivation \(w \in \mathcal{D}_a(\mathbb{R}^n)\) at a point \(a \in \mathbb{R}^n\) and constant function \(k : \mathbb{R}^n \rightarrow \mathbb{R}\), \(w(k) = 0\).
Proof. First, note that, for the constant function \(1 : \mathbb{R}^n \rightarrow \mathbb{R}\) defined such that \(1(v) = 1\) for all \(v \in \mathbb{R}^n\),
\[w(1) = w(1 \cdot 1) = 1(a)w(1) + 1(a)w(1) = 2w(1)\]
which implies that \(w(1) = 0\). Then, for an arbitrary constant function \(k : \mathbb{R}^n \rightarrow \mathbb{R}\) defined such that \(k(v) = c\) for all \(v \in \mathbb{R}^n\),
\[w(k) = w(k \cdot 1) = w(c \cdot 1) = c \cdot w(1) = c \cdot 0 = 0.\]
There is very little that we can say about the action \(w(f)\) of an arbitrary derivation \(w\) on \(\mathbb{R}^n\) at a point \(a \in \mathbb{R}^n\) upon a smooth function \(f \in C^\infty(\mathbb{R}^n)\). However, if we can express \(f\) in terms of linear combinations and products of smooth functions, then we can analyze the action of \(w\) on \(f\).
Most of the operations in calculus are linear operations. For any smooth function \(F : \mathbb{R} \rightarrow \mathbb{R}\), by the Fundamental Theorem of Calculus, we have that
\[F(1) = F(0) + \int_0^1 \frac{d}{dt}F(t)dt.\]
A smooth function \(f \in C^\infty(\mathbb{R}^n)\) can be parameterized by a smooth function \(p : \mathbb{R} \rightarrow \mathbb{R}^n\) to obtain a composite function \(F = f \circ p : \mathbb{R} \rightarrow \mathbb{R}\). We want \(F(0) = f(a)\) and \(F(1) = f(x)\) so that
\[f(x) = f(a) + \int_0^1 \frac{d}{dt}F(t)dt.\]
The parameterization
\[p(t) = a + t(x - a)\]
is a smooth function that parameterizes the path from \(a\) to \(x\), i.e. \(p(0) = a\) and \(p(1) = x\).
We thus have
\[f(x) = f(a) + \int_0^1 \frac{d}{dt}f(a + t(x-a))dt.\]
By the chain rule,
\[\frac{d}{dt}f(a + t(x-a)) = \sum_{i=1}^n\left[\frac{\partial f}{\partial x^i}(a + t(x-a)) \cdot (x^i - a^i)\right].\]
Thus, we have
\[f(x) = f(a) + \int_0^1 \sum_{i=1}^n\left[\frac{\partial f}{\partial x^i}(a + t(x-a)) \cdot (x^i - a^i)\right]dt.\]
Since the integral operator is linear, it follows that
\[f(x) = f(a) + \sum_{i=1}^n \int_0^1 \left[\frac{\partial f}{\partial x^i}(a + t(x-a)) \cdot (x^i - a^i)\right]dt.\]
Since the expression \(x^i - a^i\) is not a function of \(t\), it follows that
\[f(x) = f(a) + \sum_{i=1}^n (x^i - a^i) \int_0^1 \left[\frac{\partial f}{\partial x^i}(a + t(x-a)) \right]dt.\]
Let's write
\[g^i(x) = \int_0^1 \left[\frac{\partial f}{\partial x^i}(a + t(x-a))\right]dt,\]
\[h^i(x) = x^i - a^i,\]
and the constant function
\[k(x) = f(a).\]
Then, we have that
\[f(x) = k(x) + \sum_{i=1}^n h^i(x)g^i(x)\]
which means
\[f = k + \sum_{i=1}^n h^ig^i.\]
Thus, we have finally found the expression of the smooth function \(f\) in terms of linear combinations and products of smooth functions. We can now analyze the action of \(w\) on \(f\).
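Before doing so, a one-dimensional sanity check of the decomposition may help (the choice \(n = 1\) and \(f(x) = x^2\) is an assumption made only for illustration). In this case \(f'(x) = 2x\), so
\[g(x) = \int_0^1 2(a + t(x-a))\,dt = 2a + (x - a) = x + a,\]
and the decomposition reads
\[f(x) = f(a) + (x-a)g(x) = a^2 + (x-a)(x+a) = x^2,\]
as expected. Note also that \(g(a) = 2a = f'(a)\), which is the fact that drives the general computation: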
\begin{align}w(f) &= w(k + \sum_{i=1}^n h^ig^i)\\&= w(k) + \sum_{i=1}^n w(h^ig^i)\\&= w(k) + \sum_{i=1}^n (h^i(a)w(g^i)+g^i(a)w(h^i))\\&= \sum_{i=1}^n \left[(a^i-a^i)w(g^i) + \left(\int_0^1 \frac{\partial f}{\partial x^i}(a + t(a - a))dt\right)w(x^i-a^i)\right]\\&= \sum_{i=1}^n \left(\int_0^1 \frac{\partial f}{\partial x^i}(a) dt\right)(w(x^i)-w(a^i))\\&= \sum_{i=1}^n w(x^i)\frac{\partial f}{\partial x^i}(a) \end{align}
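Here \(w(k) = 0\) and \(w(a^i) = 0\) by Lemma 1, since \(k\) and \(a^i\) are constant functions, and the integrand \(\partial f/\partial x^i(a)\) does not depend on \(t\), so the integral evaluates to \(\partial f/\partial x^i(a)\). Thus, it follows that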
\[w = w^i\frac{\partial}{\partial x^i}\bigg\rvert_a = D_{w^ie_i}(-)(a)\]
where \(w^i = w(x^i)\) are the components of \(w\). The derivations \(\partial/\partial x^i\rvert_a\) therefore comprise a spanning set for \(\mathcal{D}_a\mathbb{R}^n\). They also form a basis, since if
\[w^i\frac{\partial}{\partial x^i}\bigg\rvert_a = 0,\]
where \(0\) is the zero derivation at the point \(a\) defined such that \(0(f) = 0\) for all \(f \in C^\infty(\mathbb{R}^n)\) then, applying this to any coordinate function \(x^j\), we obtain
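\[w^i\frac{\partial}{\partial x^i}\bigg\rvert_a(x^j) = 0(x^j) = 0,\]
which further implies
\begin{align}0 &= w^i\frac{\partial}{\partial x^i}\bigg\rvert_a(x^j)\\&= w^i\frac{\partial x^j}{\partial x^i}(a)\\&= w^i\delta^j_i\\&= w^j.\end{align}
Since \(j\) was arbitrary, every coefficient vanishes, so the derivations \(\partial/\partial x^i\rvert_a\) are linearly independent.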
Another way to confirm that these tangent vectors form a basis is to note that they are the images of the basis vectors \((e_i)\) under the mapping of the isomorphism, which sends \(e_i \mapsto D_{e_i}(-)(a) = \partial/\partial x^i\rvert_a\).
It follows that \(\mathrm{dim}(\mathcal{D}_a\mathbb{R}^n) =\mathrm{dim}(\mathbb{R}^n) = n\).
Thus, we have established that every derivation \(w\) at a point is equivalent to a directional derivative operator.
It follows that \(\mathcal{D}_a\mathbb{R}^n \cong \mathbb{R}^n\).
Now, composing this isomorphism with the isomorphism \(\mathbb{R}^n \cong T_a\mathbb{R}^n\), we obtain a final isomorphism \(\mathcal{D}_a\mathbb{R}^n \cong T_a\mathbb{R}^n\).
This means that we can now define the tangent space \(T_a\mathbb{R}^n\) to be the vector space of derivations on \(\mathbb{R}^n\) at the point \(a \in \mathbb{R}^n\).
The mapping \(v \mapsto D_v(-)(a)\) is thus a canonical isomorphism. We used a choice of basis (and hence of coordinate functions) in the proof, but the isomorphism itself is defined without any reference to a basis. This means that we can identify the two spaces. From now on, the tangent spaces will refer to the space of derivations at a point.
The Differential on Euclidean Spaces
Now that we've defined the tangent spaces, we can define the differential \(\bar{d}F_a : T_a\mathbb{R}^n \rightarrow T_{F(a)}\mathbb{R}^m\) of a smooth function \(F : \mathbb{R}^n \rightarrow \mathbb{R}^m\) as follows for any tangent vector \(v \in T_a\mathbb{R}^n\) and smooth function \(f \in C^\infty(\mathbb{R}^m)\):
\[\bar{d}F_a(v)(f) = v(f \circ F).\]
We want to verify that the map \(\bar{d}F_a\) is isomorphic to the total derivative \(dF_a\), i.e. that the diagram
\begin{CD} T_a\mathbb{R}^n @>\bar{d}F_a>> T_{F(a)}\mathbb{R}^m\\ @AAT_a^{-1}A @VVT_{F(a)}V \\ \mathbb{R}^n @>dF_a>> \mathbb{R}^m \end{CD}
commutes, where \(T_a^{-1} : \mathbb{R}^n \rightarrow T_a\mathbb{R}^n\) is the map \(T_a^{-1}(v) = v^i \partial/\partial x^i\rvert_a\) and \(T_{F(a)} : T_{F(a)}\mathbb{R}^m \rightarrow \mathbb{R}^m\) is the map \(T_{F(a)}(w) = w(y^j)e_j\). For any \(v \in \mathbb{R}^n\), using the standard basis vectors \((e_i)\) for \(\mathbb{R}^n\) with coordinate projection functions \(x^i\) and the standard basis vectors \((e_j)\) for \(\mathbb{R}^m\) with coordinate projection functions \(y^j\), we compute
\begin{align}T_{F(a)}(\bar{d}F_a(T_a^{-1}(v))) &= \bar{d}F_a(v^i \frac{\partial}{\partial x^i}\bigg\rvert_a)(y^j)e_j\\&= v^i \frac{\partial}{\partial x^i}\bigg\rvert_a(y^j \circ F)e_j\\&= v^i \frac{\partial F^j}{\partial x^i}(a)e_j\\&= dF_a(v).\end{align}
Thus, the total derivative and the differential are canonically isomorphic and we can use the differential in place of the total derivative. We thus write \(dF_a\) to indicate the differential.
The above calculation also indicates that the differential is represented by the Jacobian matrix
\[\left(\frac{\partial F^j}{\partial x^i}(a)\right)\]
with row \(j\) and column \(i\).
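For a concrete instance (the map, point, and vector are assumptions chosen for illustration), let \(F : \mathbb{R}^2 \rightarrow \mathbb{R}^2\) be \(F(x^1,x^2) = (x^1x^2, x^1 + x^2)\), let \(a = (1,2)\), and let \(v = (3,4)\). The Jacobian matrix at \(a\) is
\[\begin{pmatrix}x^2 & x^1\\ 1 & 1\end{pmatrix}\bigg\rvert_{(1,2)} = \begin{pmatrix}2 & 1\\ 1 & 1\end{pmatrix},\]
so \(dF_a(v) = (2 \cdot 3 + 1 \cdot 4,\ 1 \cdot 3 + 1 \cdot 4) = (10, 7)\). The same components come out of the definition of the differential applied to the derivation \(w = 3\,\partial/\partial x^1\rvert_a + 4\,\partial/\partial x^2\rvert_a\):
\begin{align}w(y^1 \circ F) &= w(x^1x^2) = 3 \cdot x^2\big\rvert_{(1,2)} + 4 \cdot x^1\big\rvert_{(1,2)} = 6 + 4 = 10,\\ w(y^2 \circ F) &= w(x^1 + x^2) = 3 + 4 = 7.\end{align}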
Tangent Spaces on Manifolds
Now that we have a definition of tangent spaces that doesn't rely on the vector space structure of \(\mathbb{R}^n\), we can generalize the tangent space to an arbitrary manifold.
Definition (Derivation at a Point \(p\) on a Manifold) For any smooth manifold \(M\), a derivation on \(M\) at the point \(p \in M\) is a linear map \(v : C^\infty(M) \rightarrow \mathbb{R}\) over \(\mathbb{R}\) satisfying the product rule
\[v(fg) = f(p)v(g) + g(p)v(f)\]
for all \(f,g \in C^\infty(M)\).
Each such derivation on \(M\) at a point \(p \in M\) is called a tangent vector at \(p\). The set of all tangent vectors at a point \(p\) is called the tangent space at \(p\) and is denoted \(T_pM\).
Likewise, given a smooth map \(F : M \rightarrow N\) between smooth manifolds \(M\) and \(N\), for each \(p \in M\), the differential of \(F\) at \(p\) is the map \(dF_p : T_pM \rightarrow T_{F(p)}N\) defined, for any \(v \in T_pM\) and \(f \in C^\infty(N)\), as
\[dF_p(v)(f) = v(f \circ F).\]
Thus, it is straightforward to generalize the notions of tangent space and differential to smooth manifolds.
Properties of the Differential
Let's establish a few useful properties about the differential.
Property 1 (Linearity) \(dF_p\) is linear.
Proof. For any \(v,w \in T_pM\) and \(f \in C^\infty(N)\),
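\[dF_p(v+w)(f) = (v+w)(f \circ F) = v(f \circ F)+w(f \circ F) = dF_p(v)(f)+dF_p(w)(f),\]
and similarly, for any scalar \(s \in \mathbb{R}\),
\[dF_p(sv)(f) = (sv)(f \circ F) = s \cdot v(f \circ F) = s \cdot dF_p(v)(f).\]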
Property 2 (Chain Rule) For any smooth maps \(F : M \rightarrow N\) and \(G : N \rightarrow P\) among smooth manifolds \(M,N,P\), \(d(G \circ F)_p = dG_{F(p)} \circ dF_p\).
Proof. For any tangent vector \(v \in T_pM\) and smooth function \(f \in C^\infty(P)\), we compute
\begin{align}d(G \circ F)_p(v)(f) &= v(f \circ G \circ F)\\ &= dF_p(v)(f \circ G)\\&= dG_{F(p)}(dF_p(v))(f)\\&= (dG_{F(p)} \circ dF_p)(v)(f).\end{align}
Property 3 For the identity function \(\mathrm{Id}_M\) on a smooth manifold \(M\), \(d(\mathrm{Id}_M)_p = \mathrm{Id}_{T_pM}\).
Proof. For any smooth function \(f \in C^\infty(M)\) and tangent vector \(v \in T_pM\),
\[d(\mathrm{Id}_M)_p(v)(f) = v(f \circ \mathrm{Id}_M) = v(f),\]
which implies that \(d(\mathrm{Id}_M)_p(v) = v = \mathrm{Id}_{T_pM}(v)\).
Property 4 If a smooth map \(F : M \rightarrow N\) between smooth manifolds is a diffeomorphism, then \(dF_p : T_pM \rightarrow T_{F(p)}N\) is an isomorphism, and \((dF_p)^{-1} = d(F^{-1})_{F(p)}.\)
Proof. We compute
\[dF_p \circ d(F^{-1})_{F(p)} = d(F \circ F^{-1})_{F(p)} = d(\mathrm{Id}_N)_{F(p)} = \mathrm{Id}_{T_{F(p)}N}\]
and
\[d(F^{-1})_{F(p)} \circ dF_p = d(F^{-1} \circ F)_p = d(\mathrm{Id}_M)_p = \mathrm{Id}_{T_pM}.\]
Tangent Spaces on Open Subsets
When working with manifolds, we use coordinate charts, which are typically only defined on some open subset \(U \subseteq M\). We need a way to relate the tangent space \(T_pU\) to \(T_pM\). Our goal is to demonstrate that these two spaces are naturally isomorphic. First, we introduce a few technical lemmas.
Lemma 2 For any smooth manifold \(M\) and point \(p \in M\), \(v(0) = 0\) for all \(v \in T_pM\).
Proof. For any \(f \in C^\infty(M)\), \(v(f) = v(0 + f) = v(f) + v(0)\), and thus \(v(0) = 0\).
Lemma 3 If \(f\rvert_U = 0\) for some smooth function \(f \in C^\infty(M)\) and open neighborhood \(U\) of a point \(p \in M\), then \(v(f) = 0\) for all \(v \in T_pM\).
Proof. Let \(A\) be any neighborhood of \(p\) such that \(\bar{A} \subseteq U\). Let \(\beta\) be any smooth bump function for \(\bar{A}\) supported in \(U\) (bump functions will be discussed in another post). Then \(f\beta = 0\), and so \(v(f\beta) = 0\) by Lemma 2. By the product rule, \(0 = v(f\beta) = f(p)v(\beta) + \beta(p)v(f)\), and so \(\beta(p)v(f) = v(f) = 0\) since \(f(p) = 0\) and \(\beta(p) = 1\).
Note that manifolds are locally compact Hausdorff spaces, and in such spaces we can always find a pre-compact neighborhood \(A\) of \(p\) such that \(\bar{A} \subseteq U\).
Lemma 4 For any smooth manifold \(M\), point \(p \in M\), and tangent vector \(v \in T_pM\), if \(f,g \in C^\infty(M)\) agree on a neighborhood \(U\) of \(p\) (i.e. \(f\rvert_U = g\rvert_U\)), then \(v(f) = v(g)\).
Proof. Since \((f-g)\rvert_U=0\), \(v(f-g) = 0\) by Lemma 3. By linearity, \(0 = v(f-g) = v(f) - v(g)\) and thus \(v(f) = v(g)\).
Thus, the action of a tangent vector at a point on a smooth function depends only on the values of the smooth function in an arbitrarily small neighborhood of the point.
Theorem 1 For any smooth manifold \(M\) and open subset \(U \subseteq M\), \(T_pU \cong T_pM\) for every point \(p \in U\).
Proof. We can construct a mapping \(w \mapsto \tilde{w} : T_pU \rightarrow T_pM\), where for any \(w \in T_pU\) and \(f \in C^\infty(M)\),
\[\tilde{w}(f) = w(f\rvert_U).\]
In fact, observe that this is just the differential \(d\iota_p\), where \(\iota : U \rightarrow M\) is the inclusion map, since \(d\iota_p(w)(f) = w(f \circ \iota) = w(f\rvert_U) = \tilde{w}(f)\).
We first show that the mapping \(w \mapsto \tilde{w}\) is injective. Suppose \(\tilde{w} = 0\) and let \(f \in C^\infty(U)\). Let \(A\) be any neighborhood of \(p\) such that \(\bar{A} \subseteq U\). Then \(f\) extends to a smooth function \(f^+ \in C^\infty(M)\) such that \(f^+\rvert_{\bar{A}} = f\rvert_{\bar{A}}\) by the Extension Lemma (which will be discussed in another post). Then \(f^+\rvert_U\) and \(f\) agree on the neighborhood \(A\) of \(p\), and so \(w(f) = w(f^+\rvert_U) = \tilde{w}(f^+) = 0\) by Lemma 4. Since \(f\) was arbitrary, \(w = 0\), and the mapping \(w \mapsto \tilde{w}\) is injective.
Next, we show that the mapping \(w \mapsto \tilde{w}\) is surjective. For any \(v \in T_pM\), we want to exhibit a \(w \in T_pU\) such that \(\tilde{w} = v\). We define \(w\) so that \(w(f) = v(f^+)\) as above for an appropriate extension \(f^+\) (note that, for any two extensions, by Lemma 4, \(w(f)\) has the same value, so \(w\) is well-defined). Then, for any \(f \in C^\infty(M)\), \(\tilde{w}(f) = w(f\rvert_U) = v(f\rvert_U^+) = v(f)\) since \(v(f\rvert_U^+) = v(f)\) by Lemma 4. Thus, since \(f\) was arbitrary, \(\tilde{w} = v\) and the mapping \(w \mapsto \tilde{w}\) is surjective.
Since the isomorphism \(d\iota_p\) is canonical (it requires no arbitrary choice of basis), we may identify the tangent spaces \(T_pU\) and \(T_pM\).
Dimensionality of the Tangent Space
We can also establish that the tangent space is finite-dimensional and has the same dimension as the underlying manifold. For any smooth coordinate chart \((U,\varphi)\), since \(\varphi : U \rightarrow \widehat{U}\) is a diffeomorphism from \(U\) to some open subset \(\widehat{U} \subseteq \mathbb{R}^n\), by Property 4, it follows that \(d\varphi_p\) is an isomorphism from \(T_pU\) to \(T_{\varphi(p)}\widehat{U}\). By Theorem 1, \(T_pU \cong T_pM\) and \(T_{\varphi(p)}\widehat{U} \cong T_{\varphi(p)}\mathbb{R}^n\). Composing each of these isomorphisms, we conclude that \(T_pM \cong T_{\varphi(p)}\mathbb{R}^n\). Thus, \(\mathrm{dim}(T_pM) = \mathrm{dim}(T_{\varphi(p)}\mathbb{R}^n) = n\).
The Coordinate Basis
For the manifold \(\mathbb{R}^n\), we have shown how to represent tangent vectors in the tangent space \(T_p\mathbb{R}^n\) at a point \(p \in \mathbb{R}^n\) in terms of the basis vectors \(\partial/\partial x^i\rvert_p\) for the standard coordinate functions \(x^i\) on \(\mathbb{R}^n\), namely, for any tangent vector \(v \in T_p\mathbb{R}^n\) with components \(v^i = v(x^i)\),
\[v = v^i\frac{\partial}{\partial x^i}\bigg\rvert_p.\]
Our goal is to represent tangent vectors \(v \in T_pM\) on an arbitrary smooth manifold \(M\) in a similar manner.
For a smooth manifold \(M\) of dimension \(n\) with coordinate chart \((U,\varphi)\), \(\varphi\) is a diffeomorphism from \(U \subseteq M\) to \(\widehat{U} \subseteq \mathbb{R}^n\). Then, by Property 4, the differential \(d\varphi_p : T_pU \rightarrow T_{\varphi(p)}\widehat{U}\) is an isomorphism. By Theorem 1, the differential \(d\iota_p : T_pU \rightarrow T_pM\) is an isomorphism where \(\iota : U \hookrightarrow M\) is the inclusion map, and thus \((d\iota_p)^{-1} : T_pM \rightarrow T_pU\) is also an isomorphism. We write \(d(\iota^{-1})_p\) for \((d\iota_p)^{-1}\), a slight abuse of notation since \(\iota\) is invertible only onto its image \(U\). By Theorem 1, \(d\hat{\iota}_{\varphi(p)} : T_{\varphi(p)}\widehat{U} \rightarrow T_{\varphi(p)}\mathbb{R}^n\) is an isomorphism, where \(\hat{\iota} : \widehat{U} \hookrightarrow \mathbb{R}^n\) is the inclusion map. Putting these isomorphisms together, we conclude that \(d\hat{\iota}_{\varphi(p)} \circ d\varphi_p \circ d(\iota^{-1})_p\) is an isomorphism from \(T_pM\) to \(T_{\varphi(p)}\mathbb{R}^n\).
We write \(d\varphi_p : T_pM \rightarrow T_{\varphi(p)}\mathbb{R}^n\) for this composite isomorphism, conflating notation with the map \(d\varphi_p : T_pU \rightarrow T_{\varphi(p)}\widehat{U}\).
For any linear isomorphism between vector spaces, the image of a basis is a basis. Thus, we can define basis vectors for \(T_pM\) as the images under the isomorphism \((d\varphi_p)^{-1}\) of the basis vectors in \(T_{\varphi(p)}\mathbb{R}^n\) as follows:
\[\frac{\partial}{\partial x^i}\bigg\rvert_p = (d\varphi_p)^{-1}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\varphi(p)}\right).\]
Note that we use a similar notation for these coordinate basis vectors. Consider the action of a coordinate basis vector \(\partial/\partial x^i\rvert_p \in T_pM\) on a smooth function \(f \in C^\infty(U)\) (which makes sense, even though it is only defined on \(U \subseteq M\), by Theorem 1):
\begin{align}\frac{\partial}{\partial x^i}\bigg\rvert_p(f) &= (d\varphi_p)^{-1}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\varphi(p)}\right)(f)\\&= d(\varphi^{-1})_{\varphi(p)}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\varphi(p)}\right)(f)\\&= \frac{\partial}{\partial x^i}\bigg\rvert_{\varphi(p)}(f \circ \varphi^{-1})\\&= \frac{\partial \hat{f}}{\partial x^i}(\hat{p}).\end{align}
We use the notation \(\hat{f} = f \circ \varphi^{-1} : \widehat{U} \rightarrow \mathbb{R}\) for the coordinate representation of \(f\) and \(\hat{p} = \varphi(p)\) for the coordinate representation of \(p\).
Thus, the coordinate basis vectors act by computing the \(i\)-th partial derivative of the coordinate representation of a function at the coordinate representation of \(p\).
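As an illustration, consider polar coordinates on the plane (this chart and the function below are assumptions chosen for the example). Let \(M = \mathbb{R}^2\), let \(U\) be the plane with the non-positive horizontal axis removed, and let \(\varphi = (r,\theta) : U \rightarrow (0,\infty) \times (-\pi,\pi)\), so that \(\varphi^{-1}(r,\theta) = (r\cos\theta, r\sin\theta)\). For the function \(f(x,y) = x\), the coordinate representation is \(\hat{f}(r,\theta) = r\cos\theta\), and the coordinate basis vectors at a point \(p\) with \(\varphi(p) = (r_0,\theta_0)\) act as
\[\frac{\partial}{\partial r}\bigg\rvert_p(f) = \frac{\partial \hat{f}}{\partial r}(r_0,\theta_0) = \cos\theta_0, \qquad \frac{\partial}{\partial \theta}\bigg\rvert_p(f) = \frac{\partial \hat{f}}{\partial \theta}(r_0,\theta_0) = -r_0\sin\theta_0.\]
At the point \(p = (0,2)\), where \((r_0,\theta_0) = (2,\pi/2)\), these evaluate to \(0\) and \(-2\) respectively.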
The Differential in Coordinates
Earlier, we demonstrated that the differential \(dF_p : T_p\mathbb{R}^n \rightarrow T_{F(p)}\mathbb{R}^m\) and total derivative \(dF_p : \mathbb{R}^n \rightarrow \mathbb{R}^m\) are isomorphic for smooth maps \(F: \mathbb{R}^n \rightarrow \mathbb{R}^m\), and so we can unambiguously denote both by \(dF_p\).
Now we consider the case of a smooth map \(F : U \rightarrow V\) between open subsets \(U \subseteq \mathbb{R}^n\) and \(V \subseteq \mathbb{R}^m\) and we determine the action of the differential \(dF_p\) for any point \(p \in U\) in terms of the standard coordinate bases. We denote the coordinates in the domain by \(x^i\) and the coordinates in the codomain by \(y^j\).
We previously computed
\[dF_p(v) = v^i\frac{\partial F^j}{\partial x^i}(p)e_j\]
when considering \(dF_p\) as a map from \(\mathbb{R}^n\) to \(\mathbb{R}^m\). Applying the isomorphism \(w \mapsto w^j\partial/\partial y^j\rvert_{F(p)}\) from \(\mathbb{R}^m\) to \(T_{F(p)}\mathbb{R}^m\), we obtain
\[dF_p(v) = v^i\frac{\partial F^j}{\partial x^i}(p)\frac{\partial}{\partial y^j}\bigg\rvert_{F(p)}\]
when considering \(dF_p\) as a map from \(T_p\mathbb{R}^n\) to \(T_{F(p)}\mathbb{R}^m\).
Another way to see this is to apply \(dF_p : T_p\mathbb{R}^n \rightarrow T_{F(p)}\mathbb{R}^m\) to an arbitrary vector \(v\) expressed in terms of the coordinate basis \(\partial/\partial x^i\rvert_p\) and compute using the chain rule:
\begin{align}dF_p\left(v^i\frac{\partial}{\partial x^i}\bigg\rvert_p\right)(f) &= v^i\frac{\partial}{\partial x^i}\bigg\rvert_p(f \circ F)\\&= v^i\frac{\partial f}{\partial y^j}(F(p))\frac{\partial F^j}{\partial x^i}(p)\\&= \left(v^i\frac{\partial F^j}{\partial x^i}(p)\frac{\partial}{\partial y^j}\bigg\rvert_{F(p)}\right)(f).\end{align}
Next, we consider the general case of a smooth map \(F : M \rightarrow N \) between smooth manifolds \(M\) and \(N\) with coordinate chart \((U,\varphi)\) in \(M\) and \((V,\psi)\) in \(N\), where \(U\) is a neighborhood of a point \(p \in M\) and \(V\) is a neighborhood of \(F(p)\). The coordinate representation of \(F\) is the map \(\widehat{F} : \varphi(U \cap F^{-1}(V)) \rightarrow \psi(V)\) defined as \(\widehat{F} = \psi \circ F \circ \varphi^{-1}\). We then compute the action of \(dF_p\) on any basis vector \(\partial/\partial x^i\rvert_p\) as follows:
\begin{align}dF_p\left(\frac{\partial}{\partial x^i}\bigg\rvert_p\right) &= dF_p\left(d(\varphi^{-1})_{\hat{p}}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\hat{p}}\right)\right)\\&= d(F \circ \varphi^{-1})_{\hat{p}}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\hat{p}}\right)\\&= d(\psi^{-1} \circ \widehat{F})_{\hat{p}}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\hat{p}}\right)\\&= d(\psi^{-1})_{\widehat{F}(\hat{p})}\left(d\widehat{F}_{\hat{p}}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\hat{p}}\right)\right)\\&= d(\psi^{-1})_{\widehat{F}(\hat{p})}\left(\frac{\partial \widehat{F}^j}{\partial x^i}(\hat{p})\frac{\partial}{\partial y^j}\bigg\rvert_{\widehat{F}(\hat{p})}\right)\\&= \frac{\partial \widehat{F}^j}{\partial x^i}(\hat{p}) d(\psi^{-1})_{\widehat{F}(\hat{p})}\left(\frac{\partial}{\partial y^j}\bigg\rvert_{\widehat{F}(\hat{p})}\right)\\&= \frac{\partial \widehat{F}^j}{\partial x^i}(\hat{p}) \frac{\partial}{\partial y^j}\bigg\rvert_{F(p)}.\end{align}
Thus, the action of a smooth map in coordinates is represented by the Jacobian matrix of the coordinate representation of the smooth map, i.e. by the matrix
\[\left(\frac{\partial \widehat{F}^j}{\partial x^i}(\hat{p}) \right)\]
with row \(j\) and column \(i\).
Change of Coordinates
For a smooth manifold \(M\) with two charts \((U,\varphi)\) and \((V,\psi)\), a tangent vector \(v \in T_pM\) can be represented in the basis \((\partial/\partial x^i\rvert_p)\) or the basis \((\partial/\partial y^j\rvert_p)\) for any point \(p \in U \cap V\), where the coordinate functions corresponding to \(\varphi\) are \((x^i)\) and the coordinate functions corresponding to \(\psi\) are \((y^j)\).
Our goal is to transform the coordinates from one basis to another. First, recall the expression for the differential in coordinates:
\[dF_p\left(\frac{\partial}{\partial x^i}\bigg\rvert_p\right) = \frac{\partial F^j}{\partial x^i}(p)\frac{\partial}{\partial y^j}\bigg\rvert_{F(p)}.\]
Writing \((\tilde{x}^j)\) for the components of the transition map \(\psi \circ \varphi^{-1} : \varphi(U \cap V) \rightarrow \psi(U \cap V)\), the expression for the differential of the transition map in coordinates is
\[d(\psi \circ \varphi^{-1})_{\varphi(p)}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\varphi(p)}\right) = \frac{\partial \tilde{x}^j}{\partial x^i}(\varphi(p))\frac{\partial}{\partial y^j}\bigg\rvert_{\psi(p)}.\]
Next, we compute an alternative expression for the basis vector \(\partial/\partial x^i\rvert_p\):
\begin{align}\frac{\partial}{\partial x^i}\bigg\rvert_p &= d(\varphi^{-1})_{\varphi(p)}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\varphi(p)}\right)\\&= d(\psi^{-1} \circ \psi \circ \varphi^{-1})_{\varphi(p)}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\varphi(p)}\right)\\&= d(\psi^{-1})_{\psi(p)}\left(d(\psi \circ \varphi^{-1})_{\varphi(p)}\left(\frac{\partial}{\partial x^i}\bigg\rvert_{\varphi(p)}\right)\right)\\&= d(\psi^{-1})_{\psi(p)}\left(\frac{\partial \tilde{x}^j}{\partial x^i}(\varphi(p))\frac{\partial}{\partial y^j}\bigg\rvert_{\psi(p)}\right)\\&= \frac{\partial \tilde{x}^j}{\partial x^i}(\hat{p})\frac{\partial}{\partial y^j}\bigg\rvert_p.\end{align}
Finally, for any tangent vector \(v \in T_pM\) represented as
\[v = v^i \frac{\partial}{\partial x^i}\bigg\rvert_p,\]
applying the previous equation yields
\[v = v^i \frac{\partial \tilde{x}^j}{\partial x^i}(\hat{p})\frac{\partial}{\partial y^j}\bigg\rvert_p,\]
which expresses \(v\) in the other basis.
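For example, take the polar chart as \(\varphi\) and the standard Cartesian chart as \(\psi\) on a suitable open subset of \(\mathbb{R}^2\) (an illustrative choice, writing \((x^1,x^2) = (r,\theta)\) and \((y^1,y^2) = (x,y)\)). The transition map is \(\psi \circ \varphi^{-1}(r,\theta) = (r\cos\theta, r\sin\theta)\), so its components are \(\tilde{x}^1 = r\cos\theta\) and \(\tilde{x}^2 = r\sin\theta\), and the change-of-basis formula gives
\begin{align}\frac{\partial}{\partial r}\bigg\rvert_p &= \cos\theta\,\frac{\partial}{\partial x}\bigg\rvert_p + \sin\theta\,\frac{\partial}{\partial y}\bigg\rvert_p,\\ \frac{\partial}{\partial \theta}\bigg\rvert_p &= -r\sin\theta\,\frac{\partial}{\partial x}\bigg\rvert_p + r\cos\theta\,\frac{\partial}{\partial y}\bigg\rvert_p,\end{align}
where \((r,\theta) = \varphi(p)\). A tangent vector with polar components \((v^1, v^2)\) therefore has Cartesian components \((v^1\cos\theta - v^2 r\sin\theta,\ v^1\sin\theta + v^2 r\cos\theta)\).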
Examples
Example 1 We can recover all of the chain rules from scalar calculus, multivariable calculus, and vector calculus using the chain rule for the differential.
First, consider the case of functions \(f, g: \mathbb{R} \rightarrow \mathbb{R}\).
The basis vector in \(T_p\mathbb{R}\) is denoted
\[\frac{d}{dx}\bigg\rvert_p.\]
The coordinate function in \(\mathbb{R}\) is simply the identity \(\mathrm{Id}_{\mathbb{R}}\).
We then compute
\begin{align}d(g \circ f)_p\left(\frac{d}{dx}\bigg\rvert_p\right)(\mathrm{Id}_{\mathbb{R}}) &= (dg_{f(p)} \circ df_p)\left(\frac{d}{dx}\bigg\rvert_p\right)(\mathrm{Id}_{\mathbb{R}})\\&= dg_{f(p)}\left(df_p\left(\frac{d}{dx}\bigg\rvert_p\right)\right)(\mathrm{Id}_{\mathbb{R}})\\&= df_p\left(\frac{d}{dx}\bigg\rvert_p\right)(\mathrm{Id}_{\mathbb{R}} \circ g)\\&= df_p\left(\frac{d}{dx}\bigg\rvert_p\right)(\mathrm{Id}_{\mathbb{R}})\frac{d}{dx}\bigg\rvert_{f(p)}(g)\\&= \frac{d}{dx}\bigg\rvert_p(\mathrm{Id}_{\mathbb{R}} \circ f)g'(f(p))\\&= f'(p)g'(f(p)).\end{align}
Example 2 For another example of the chain rule, consider the chain rule for partial derivatives for a smooth map \(F : \mathbb{R}^n \rightarrow \mathbb{R}^m\) and a smooth function \(f : \mathbb{R}^m \rightarrow \mathbb{R}\):
\begin{align}\frac{\partial}{\partial x^i}\bigg\rvert_p(f \circ F) &= dF_p\left(\frac{\partial}{\partial x^i}\bigg\rvert_p\right)(f)\\&= dF_p\left(\frac{\partial}{\partial x^i}\bigg\rvert_p\right)(y^j)\frac{\partial}{\partial y^j}\bigg\rvert_{F(p)}(f)\\ &= \frac{\partial}{\partial x^i}\bigg\rvert_p(y^j \circ F)\frac{\partial f}{\partial y^j}(F(p))\\&= \frac{\partial F^j}{\partial x^i}(p) \frac{\partial f}{\partial y^j}(F(p)).\end{align}
Example 3 Consider the chain rule for a smooth map \(G : \mathbb{R}^n \rightarrow \mathbb{R}\) and a smooth function \(f : \mathbb{R} \rightarrow \mathbb{R}^n\), writing \(y^j\) for the coordinate functions on \(\mathbb{R}^n\):
\begin{align}d(G \circ f)_p\left(\frac{d}{dx}\bigg\rvert_p\right)(\mathrm{Id}_\mathbb{R}) &= dG_{f(p)}\left(df_p\left(\frac{d}{dx}\bigg\rvert_p\right)\right)(\mathrm{Id}_{\mathbb{R}}) \\&= df_p\left(\frac{d}{dx}\bigg\rvert_p\right)(\mathrm{Id}_{\mathbb{R}} \circ G)\\&= df_p\left(\frac{d}{dx}\bigg\rvert_p\right)(y^j)\frac{\partial}{\partial y^j}\bigg\rvert_{f(p)}(G)\\&= (f^j)'(p)\frac{\partial G}{\partial y^j}(f(p)).\end{align}