The Jacobian Matrix and the Differential
Kaplan §2.7 - the matrix of the differential, the chain rule as matrix multiplication.
Prerequisite: §2.6 Total Differential.
Here is a nonlinear map $\mathbf{f}(u, v) = (u^2 - v^2,\, 2uv)$ that takes the $(u,v)$-plane on the left to the $(x,y)$-plane on the right. We have placed a small purple square at a base point $(u_0, v_0)$, and we ask: what does its image look like?
The orange blob is the actual image of the square - curved, because $\mathbf{f}$ is nonlinear. The teal parallelogram is what we get if we replace $\mathbf{f}$ by its linear approximation at $(u_0, v_0)$. As you shrink the square's side length $h$, the two collapse onto each other.
The teal parallelogram is spanned by two vectors. Those vectors are the columns of a matrix - the Jacobian. That is what this section is about.
From the differential to the matrix
The teal parallelogram in the hook is a linear map of a small square. Linear maps in two dimensions are $2 \times 2$ matrices. So before we touch the general case, let's just write down that matrix for the map in the hook.
For $\mathbf{f}(u, v) = (u^2 - v^2,\, 2uv)$, the differential at $(u_0, v_0)$ takes the input increment $(du, dv)$ to

$$\begin{pmatrix} dx \\ dy \end{pmatrix} = \begin{pmatrix} 2u_0 & -2v_0 \\ 2v_0 & 2u_0 \end{pmatrix} \begin{pmatrix} du \\ dv \end{pmatrix}.$$
That is the entire idea. Two output components, two input variables, four partial derivatives in a $2 \times 2$ grid - rows are outputs, columns are inputs. The teal parallelogram in the hook is what you get when this matrix acts on the small square.
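As a sanity check on that rows-are-outputs, columns-are-inputs grid, here is a minimal sketch (plain Python; the helper names are ours) comparing the analytic $2 \times 2$ Jacobian of the hook map against central finite differences:

```python
# Sketch: check the analytic Jacobian of f(u, v) = (u^2 - v^2, 2uv)
# against a central-difference approximation.

def f(u, v):
    return (u * u - v * v, 2.0 * u * v)

def jacobian_analytic(u, v):
    # rows = outputs, columns = inputs
    return [[2.0 * u, -2.0 * v],
            [2.0 * v,  2.0 * u]]

def jacobian_numeric(u, v, h=1e-6):
    # column j = (f(x + h e_j) - f(x - h e_j)) / (2h): how both outputs
    # respond to a wiggle in input j
    col_u = [(a - b) / (2 * h) for a, b in zip(f(u + h, v), f(u - h, v))]
    col_v = [(a - b) / (2 * h) for a, b in zip(f(u, v + h), f(u, v - h))]
    return [[col_u[0], col_v[0]],
            [col_u[1], col_v[1]]]

u0, v0 = 1.3, -0.7
J_exact = jacobian_analytic(u0, v0)
J_approx = jacobian_numeric(u0, v0)
```

The two matrices agree entry by entry, which is exactly the statement that the four partials in the grid are the right linearization.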
It turns out that the same recipe works for $\mathbf{f} \colon \mathbb{R}^n \to \mathbb{R}^m$ - one row per output, one column per input. For a map with components

$$y_i = f^i(x_1, \dots, x_n), \qquad i = 1, \dots, m,$$
the differential at $\mathbf{x}_0$ is a linear map sending an input increment $d\mathbf{x} = (dx_1, \dots, dx_n)$ to an output increment $d\mathbf{y} = (dy_1, \dots, dy_m)$.
Every linear map between finite-dimensional spaces has a matrix representation - and ours already comes with one written on its forehead. Each output component $f^i$ has its own differential

$$df^i = \frac{\partial f^i}{\partial x_1}\, dx_1 + \cdots + \frac{\partial f^i}{\partial x_n}\, dx_n, \qquad i = 1, \dots, m.$$
Stack the $m$ equations and you have a matrix-vector product. Define the Jacobian matrix of $\mathbf{f}$ at $\mathbf{x}_0$:

$$J(\mathbf{x}_0) = \begin{pmatrix} \dfrac{\partial f^1}{\partial x_1} & \cdots & \dfrac{\partial f^1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial f^m}{\partial x_1} & \cdots & \dfrac{\partial f^m}{\partial x_n} \end{pmatrix}, \qquad \text{all partials evaluated at } \mathbf{x}_0.$$
Rows index outputs. Row $i$ holds the gradient of $f^i$ - the differential of the $i$th component. Columns index inputs. Column $j$ tells you how every output responds to a wiggle in $x_j$.
The differential collapses to a single matrix-vector product:

$$d\mathbf{y} = J(\mathbf{x}_0)\, d\mathbf{x}.$$
Kaplan also writes this as $d\mathbf{y} = \mathbf{f}_\mathbf{x}\, d\mathbf{x}$ or $d\mathbf{y} = \mathbf{y}_\mathbf{x}\, d\mathbf{x}$ - the same object, different shorthand. The shape is $(m \times n) \cdot (n \times 1) = (m \times 1)$, exactly what you need.
A 2×2 worked example
Let's pin down the geometric reading of the Jacobian by reading off two specific points. Take the map $\mathbf{f}(u, v) = (u^2 - v^2,\; 2uv)$ - the same one in the hook. (If you know complex numbers: this is $z \mapsto z^2$ with $z = u + iv$, written in real coordinates.) Two inputs, two outputs, so the Jacobian is $2 \times 2$.
Compute partials column by column. Set $f^1 = u^2 - v^2$ and $f^2 = 2uv$. Then

$$\frac{\partial f^1}{\partial u} = 2u, \qquad \frac{\partial f^1}{\partial v} = -2v, \qquad \frac{\partial f^2}{\partial u} = 2v, \qquad \frac{\partial f^2}{\partial v} = 2u.$$
Assemble. Rows are outputs, columns are inputs:

$$J(u, v) = \begin{pmatrix} 2u & -2v \\ 2v & 2u \end{pmatrix}.$$
Now read off geometric content at specific points.
At $(u, v) = (1, 0)$:

$$J(1, 0) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = 2I.$$
Pure scaling by $2$. A small square near $(1, 0)$ maps (to first order) to a square at $(1, 0)$ but twice as large.
At $(u, v) = (0, 1)$: before we crank the partials, let's predict. At $z = i$, squaring sends $i \mapsto i^2 = -1$. Squaring doubles arguments - the input $i$ has argument $\pi/2$, the output $-1$ has argument $\pi$. Locally, then, the map should rotate by $90^\circ$ and stretch by a factor of $|2z| = 2$. Let's see if the Jacobian agrees:

$$J(0, 1) = \begin{pmatrix} 0 & -2 \\ 2 & 0 \end{pmatrix} = 2 \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$
Yep - rotation by $90^\circ$ and scaling by $2$, exactly what we predicted. (Drag the hook viz to $(0, 1)$ and watch the parallelogram tilt.)
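Those two readings can be checked mechanically. A small sketch (plain Python; `jac` and `apply` are our names) applies each Jacobian to a unit step pointing east:

```python
# Sketch: the Jacobian of f(u, v) = (u^2 - v^2, 2uv) at two base points,
# read as a local linear map acting on a small displacement.

def jac(u, v):
    # rows = outputs, columns = inputs
    return [[2 * u, -2 * v],
            [2 * v,  2 * u]]

def apply(J, x):
    return (J[0][0] * x[0] + J[0][1] * x[1],
            J[1][0] * x[0] + J[1][1] * x[1])

# At (1, 0): pure scaling by 2 -- a unit step east maps to a doubled step east.
east_image = apply(jac(1, 0), (1, 0))          # (2, 0)

# At (0, 1): rotation by 90 degrees plus scaling by 2 -- east maps to north, doubled.
east_image_rotated = apply(jac(0, 1), (1, 0))  # (0, 2)
```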
Trap: row-vs-column convention. Some texts (especially in machine learning and differential geometry) write the Jacobian transposed, with inputs as rows. Both conventions are alive in the wild. Always verify by computing one entry: in our convention, $J_{ij} = \partial f^i / \partial x_j$ - output index first, input index second. If you are reading a paper whose chain rule looks like it composes in the wrong order, this is usually the culprit.
The Jacobian captures local stretching, rotation, and shearing all at once - whatever a linear map can do, the Jacobian encodes pointwise. That is the whole reason the matrix is useful: complicated nonlinear behavior, neighborhood by neighborhood, is just a tour through a family of $2 \times 2$ matrices.
The chain rule is a matrix product
Composition of maps becomes multiplication of their Jacobians - that one sentence is the entire payoff of this section, and it is why every multivariable text builds the Jacobian before doing any serious computation. Suppose we have two maps composed:

$$\mathbf{g} \colon \mathbb{R}^p \to \mathbb{R}^n, \qquad \mathbf{f} \colon \mathbb{R}^n \to \mathbb{R}^m,$$
so $\mathbf{f} \circ \mathbf{g} \colon \mathbb{R}^p \to \mathbb{R}^m$. Then the Jacobians compose by matrix multiplication:

$$J_{\mathbf{f} \circ \mathbf{g}}(\mathbf{x}) = J_{\mathbf{f}}\big(\mathbf{g}(\mathbf{x})\big)\, J_{\mathbf{g}}(\mathbf{x}).$$
Outer-then-inner. Outer $\mathbf{f}$ first because it acts last, evaluated at the intermediate point $\mathbf{g}(\mathbf{x})$; inner $\mathbf{g}$ second, evaluated at $\mathbf{x}$.
The matrix shapes line up exactly the way they must for composition:

$$(m \times n) \cdot (n \times p) = (m \times p).$$
The "middle dimension" $n$ - the dimension of the intermediate space where $\mathbf{g}$ lands and $\mathbf{f}$ takes off - is exactly the dimension that gets contracted by matrix multiplication. The chain rule isn't a list of partial-derivative formulas to memorize; it's one matrix product.
The component-by-component form $\frac{\partial y_i}{\partial x_k} = \sum_j \frac{\partial y_i}{\partial u_j}\frac{\partial u_j}{\partial x_k}$ is the same equation - it is just the $(i, k)$-entry of the matrix product on the right. See §2.9 for the full development with multi-step examples.
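Here is a numerical sketch of the matrix form (plain Python; the inner map $\mathbf{g}(s, t) = (s + t,\, st)$ is our own choice, with the hook map as the outer $\mathbf{f}$): the product of the two analytic Jacobians should match a finite-difference Jacobian of the composite.

```python
# Sketch: the chain rule as a matrix product, checked by central differences.
# g maps (s, t) -> (u, v); f maps (u, v) -> (x, y).

def g(s, t):
    return (s + t, s * t)

def f(u, v):
    return (u * u - v * v, 2 * u * v)

def J_g(s, t):
    return [[1.0, 1.0],
            [t,   s]]

def J_f(u, v):
    return [[2 * u, -2 * v],
            [2 * v,  2 * u]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def J_numeric(F, s, t, h=1e-6):
    # Central-difference Jacobian of F at (s, t), one column per input.
    cs = [(a - b) / (2 * h) for a, b in zip(F(s + h, t), F(s - h, t))]
    ct = [(a - b) / (2 * h) for a, b in zip(F(s, t + h), F(s, t - h))]
    return [[cs[0], ct[0]], [cs[1], ct[1]]]

def composite(s, t):
    return f(*g(s, t))

s0, t0 = 0.4, 1.1
product = matmul(J_f(*g(s0, t0)), J_g(s0, t0))  # outer at g(x), inner at x
numeric = J_numeric(composite, s0, t0)
```

Note where each factor is evaluated: the outer Jacobian at the intermediate point $\mathbf{g}(\mathbf{x})$, the inner one at $\mathbf{x}$ - evaluate either in the wrong place and the check fails.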
The square case: the Jacobian determinant
When inputs and outputs live in the same dimension, a single number captures the local volume change. When $m = n$, the matrix is square, and we can take its determinant. Kaplan calls this the Jacobian determinant (or just "the Jacobian", overloading the term):

$$\det J = \frac{\partial(y_1, \dots, y_n)}{\partial(x_1, \dots, x_n)}.$$
It is a single scalar, computed from the matrix. And it has a clean geometric meaning. From linear algebra: $|\det A|$ is the volume scaling factor of the linear map $A$. So if a small region of input volume $dV_x$ maps to output volume $dV_y$, then locally

$$dV_y = |\det J|\, dV_x.$$
Look back at the hook: the orange image and the teal parallelogram have the same area to leading order, and that area is $|\det J|$ times the area of the input square.
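That area claim is easy to test directly. A sketch (plain Python; shoelace formula for polygon area) pushes a tiny square through the hook map and compares areas:

```python
# Sketch: |det J| as the local area-scaling factor. Push a tiny square
# through f(u, v) = (u^2 - v^2, 2uv) and compare its image area with
# |det J| times the square's area.

def f(u, v):
    return (u * u - v * v, 2 * u * v)

def det_J(u, v):
    # J = [[2u, -2v], [2v, 2u]], so det J = 4(u^2 + v^2)
    return 4 * (u * u + v * v)

def shoelace(pts):
    # signed-area (shoelace) formula, absolute value taken at the end
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2

u0, v0, h = 1.0, 0.5, 1e-3
corners = [(u0, v0), (u0 + h, v0), (u0 + h, v0 + h), (u0, v0 + h)]
image_area = shoelace([f(u, v) for (u, v) in corners])
predicted = abs(det_J(u0, v0)) * h * h
```

The agreement is only to leading order - shrink `h` and the relative error shrinks with it, which is the "as you shrink $h$, the two collapse" statement from the hook in numerical form.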
Sign tracks orientation. The sign of $\det J$ carries genuine physical meaning - it is parity. If $\det J > 0$, the local linear map preserves orientation (counterclockwise stays counterclockwise, right-handed frames stay right-handed). If $\det J < 0$, it flips: run a tiny counterclockwise loop on the input side, and on the output side it traces clockwise. This is exactly why the order of $dx\,dy\,dz$ matters under a coordinate change in physics: swap two coordinates and you negate the Jacobian determinant, which flips the orientation of the volume element - and physical quantities like flux and circulation are sensitive to that sign.
This is the same determinant that shows up in the change-of-variables formula for multiple integrals: $\iint_R f\,dA = \iint_{R'} f\big(\mathbf{x}(u, v)\big)\,|\det J|\,du\,dv$. See §4.6 for the integration story.
Inverse function theorem (preview)
If you can stretch, you can unstretch. That is roughly the entire content of this card.
Picture it physically: stretching local area by a factor of $2$ and then unstretching multiplies area by $2$ and then by $1/2$. Net factor $1$. Area-scalings compose by multiplication, so undoing means dividing - the reciprocal rule for determinants is just that fact written down.
Now the formal statement. Suppose $J(\mathbf{x}_0)$ is invertible as a matrix - equivalently, $\det J(\mathbf{x}_0) \neq 0$. Then $\mathbf{f}$ has a local inverse $\mathbf{f}^{-1}$ defined on a neighborhood of $\mathbf{f}(\mathbf{x}_0)$, and its Jacobian is the inverse of $\mathbf{f}$'s Jacobian:

$$J_{\mathbf{f}^{-1}}\big(\mathbf{f}(\mathbf{x}_0)\big) = \big[J_{\mathbf{f}}(\mathbf{x}_0)\big]^{-1}.$$
The matrix-level derivation is one line. Apply the chain rule to $\mathbf{f}^{-1} \circ \mathbf{f} = \text{id}$; the Jacobian of the identity is $I$, so

$$J_{\mathbf{f}^{-1}}\big(\mathbf{f}(\mathbf{x}_0)\big)\, J_{\mathbf{f}}(\mathbf{x}_0) = I,$$
and matrix inverses are forced. Taking determinants gives the scalar reciprocal rule we already saw geometrically:

$$\det J_{\mathbf{f}^{-1}} = \frac{1}{\det J_{\mathbf{f}}}.$$
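A numerical sketch of the matrix identity (plain Python; the example pair - $\mathbf{f}(x, y) = (e^x \cos y,\, e^x \sin y)$ and its local inverse built from $\log$ and $\operatorname{atan2}$ - is our choice, not Kaplan's):

```python
# Sketch: J_g(f(x0)) * J_f(x0) = I for an invertible pair, where
# f(x, y) = (e^x cos y, e^x sin y) and the local inverse is
# g(u, v) = ((1/2) log(u^2 + v^2), atan2(v, u)).
import math

def J_f(x, y):
    return [[math.exp(x) * math.cos(y), -math.exp(x) * math.sin(y)],
            [math.exp(x) * math.sin(y),  math.exp(x) * math.cos(y)]]

def J_g(u, v):
    r2 = u * u + v * v
    # gradients of (1/2) log(u^2 + v^2) and of atan2(v, u)
    return [[u / r2,  v / r2],
            [-v / r2, u / r2]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

x0, y0 = 0.3, 1.2
u0, v0 = math.exp(x0) * math.cos(y0), math.exp(x0) * math.sin(y0)
product = matmul(J_g(u0, v0), J_f(x0, y0))  # should be the identity matrix
```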
Trap: local vs global invertibility. The word "local" is doing real work here. $\det J(\mathbf{x}_0) \neq 0$ guarantees an inverse on some neighborhood of $\mathbf{x}_0$ - not on the whole domain. The hook map $z \mapsto z^2$ is the canonical counterexample: $\det J = 4|z|^2 \neq 0$ everywhere except the origin, so the map is locally invertible at every nonzero point. Yet globally it is $2$-to-$1$ - both $z$ and $-z$ map to the same output. Locally invertible everywhere, globally not. Don't conflate the two.
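The 2-to-1 collision is concrete enough to check directly (plain Python sketch):

```python
# Sketch: z -> z^2 is locally invertible away from the origin yet globally
# 2-to-1. Antipodal inputs (u, v) and (-u, -v) land on the same output,
# even though det J = 4(u^2 + v^2) is nonzero at both.

def f(u, v):
    return (u * u - v * v, 2 * u * v)

def det_J(u, v):
    return 4 * (u * u + v * v)

u0, v0 = 1.2, 0.5
same_image = f(u0, v0) == f(-u0, -v0)                        # the two inputs collide
both_regular = det_J(u0, v0) != 0 and det_J(-u0, -v0) != 0   # yet both are regular points
```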
This is just a glimpse. §2.12 develops the full Inverse Function Theorem with conditions, examples, and the polar-coordinate worked case.
The reference table: standard Jacobians
Three coordinate maps show up everywhere, so their Jacobians are worth committing to memory. But before we dump them in a table, let's derive one by hand - polar - so the others don't feel like magic.
Polar coordinates: $x = r\cos\theta$, $y = r\sin\theta$. Two inputs $(r, \theta)$, two outputs $(x, y)$, so $J$ is $2 \times 2$. Compute partials:

$$\frac{\partial x}{\partial r} = \cos\theta, \qquad \frac{\partial x}{\partial \theta} = -r\sin\theta, \qquad \frac{\partial y}{\partial r} = \sin\theta, \qquad \frac{\partial y}{\partial \theta} = r\cos\theta.$$
Assemble (rows are outputs, columns are inputs):

$$J = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}.$$
Determinant:

$$\det J = (\cos\theta)(r\cos\theta) - (-r\sin\theta)(\sin\theta) = r\cos^2\theta + r\sin^2\theta = r.$$
Numerical check. At $(r, \theta) = (2, \pi/4)$, $\det J = 2$. Sanity: in polar coordinates, the area element is $dA = r\,dr\,d\theta$ - that factor of $r$ is the local area-scaling factor between $(r, \theta)$-rectangles and $(x, y)$-pieces. They match. Every time you've seen a stray $r$ in a polar integral, that was the determinant of this matrix saying hello.
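That check can be run by machine too - a finite-difference sketch (plain Python):

```python
# Sketch: central-difference check that det J = r for the polar map,
# evaluated at (r, theta) = (2, pi/4).
import math

def polar(r, th):
    return (r * math.cos(th), r * math.sin(th))

def jac_det_numeric(r, th, h=1e-6):
    # columns: response to r and to theta, by central differences
    dr = [(a - b) / (2 * h) for a, b in zip(polar(r + h, th), polar(r - h, th))]
    dth = [(a - b) / (2 * h) for a, b in zip(polar(r, th + h), polar(r, th - h))]
    return dr[0] * dth[1] - dth[0] * dr[1]

approx = jac_det_numeric(2.0, math.pi / 4)  # should be close to r = 2
```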
With that sample done, here is the full table. Cylindrical is just polar with a passive $z$ coordinate tacked on; spherical we leave for §4.6, which works it out via cofactor expansion.
| Map | Jacobian matrix | $\det J$ |
|---|---|---|
| Polar $(r, \theta) \mapsto (x, y)$: $x = r\cos\theta$, $y = r\sin\theta$ | $\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}$ | $r$ |
| Cylindrical $(r, \theta, z) \mapsto (x, y, z)$: $x = r\cos\theta$, $y = r\sin\theta$, $z = z$ | $\begin{pmatrix} \cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$ | $r$ |
| Spherical $(\rho, \varphi, \theta) \mapsto (x, y, z)$: $x = \rho\sin\varphi\cos\theta$, $y = \rho\sin\varphi\sin\theta$, $z = \rho\cos\varphi$ | $\begin{pmatrix} \sin\varphi\cos\theta & \rho\cos\varphi\cos\theta & -\rho\sin\varphi\sin\theta \\ \sin\varphi\sin\theta & \rho\cos\varphi\sin\theta & \rho\sin\varphi\cos\theta \\ \cos\varphi & -\rho\sin\varphi & 0 \end{pmatrix}$ | $\rho^2 \sin\varphi$ |
These are the workhorses of multivariable integration. The polar $r$ explains why $dA = r\,dr\,d\theta$. The spherical $\rho^2 \sin\varphi$ is why volume integrals in 3-space carry that exact factor.
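The spherical entry is the easiest one to get wrong when copying, so here is a finite-difference sketch (plain Python) checking $\det J = \rho^2 \sin\varphi$ at an arbitrary point:

```python
# Sketch: central-difference check of det J = rho^2 sin(phi) for the
# spherical map (rho, phi, theta) -> (x, y, z).
import math

def spherical(rho, phi, th):
    return (rho * math.sin(phi) * math.cos(th),
            rho * math.sin(phi) * math.sin(th),
            rho * math.cos(phi))

def det3(M):
    # cofactor expansion along the first row
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def jac_numeric(rho, phi, th, h=1e-6):
    # column j = central difference in input j; rows are (x, y, z)
    cols = []
    for j in range(3):
        p_plus, p_minus = [rho, phi, th], [rho, phi, th]
        p_plus[j] += h
        p_minus[j] -= h
        cols.append([(a - b) / (2 * h)
                     for a, b in zip(spherical(*p_plus), spherical(*p_minus))])
    return [[cols[j][i] for j in range(3)] for i in range(3)]

rho0, phi0, th0 = 1.5, 0.8, 2.1
approx = det3(jac_numeric(rho0, phi0, th0))
exact = rho0 ** 2 * math.sin(phi0)
```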
See §4.6 for the full derivation of the spherical $\rho^2\sin\varphi$ via cofactor expansion.
Practice Problems - §2.7
From Kaplan, problems after §2.7.
Problem 1. Find the Jacobian matrix of the map $y_1 = 2x_1^2 + x_2^2$, $y_2 = 3x_1 x_2$.

Row 1 (output $y_1 = 2x_1^2 + x_2^2$): $\dfrac{\partial y_1}{\partial x_1} = 4x_1$, $\dfrac{\partial y_1}{\partial x_2} = 2x_2$.
Row 2 (output $y_2 = 3x_1 x_2$): $\dfrac{\partial y_2}{\partial x_1} = 3x_2$, $\dfrac{\partial y_2}{\partial x_2} = 3x_1$.
$J(x_1, x_2) = \begin{pmatrix} 4x_1 & 2x_2 \\ 3x_2 & 3x_1 \end{pmatrix}.$
Two inputs and two outputs, so the matrix is $2 \times 2$ - shape check passes.
Problem 2. For $\mathbf{f}(x_1, x_2) = (x_1^2 + x_2^2,\; x_1 x_2)$, use the differential at $(2, 1)$ to estimate $\mathbf{f}(2.04, 1.01)$.

$J(x_1, x_2) = \begin{pmatrix} 2x_1 & 2x_2 \\ x_2 & x_1 \end{pmatrix}.$ At $(2, 1)$: $J(2, 1) = \begin{pmatrix} 4 & 2 \\ 1 & 2 \end{pmatrix}.$
$\mathbf{f}(2, 1) = (4 + 1,\; 2) = (5, 2)$.
$d\mathbf{x} = (0.04, 0.01)$. Then
$d\mathbf{y} = \begin{pmatrix} 4 & 2 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 0.04 \\ 0.01 \end{pmatrix} = \begin{pmatrix} 0.16 + 0.02 \\ 0.04 + 0.02 \end{pmatrix} = \begin{pmatrix} 0.18 \\ 0.06 \end{pmatrix}.$
So $\mathbf{f}(2.04, 1.01) \approx (5 + 0.18,\; 2 + 0.06) = (5.18,\; 2.06).$
Exact: $(2.04^2 + 1.01^2,\; 2.04 \cdot 1.01) = (4.1616 + 1.0201,\; 2.0604) = (5.1817,\; 2.0604)$. The linear estimate is off by less than $0.002$ in each component.
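The arithmetic above, reproduced as a sketch (plain Python):

```python
# Sketch: the differential estimate for f(x1, x2) = (x1^2 + x2^2, x1*x2)
# at base point (2, 1) with increment (0.04, 0.01).

def f(x1, x2):
    return (x1 * x1 + x2 * x2, x1 * x2)

J = [[4.0, 2.0],   # J(2, 1): rows = outputs, columns = inputs
     [1.0, 2.0]]
dx = (0.04, 0.01)
dy = (J[0][0] * dx[0] + J[0][1] * dx[1],
      J[1][0] * dx[0] + J[1][1] * dx[1])       # approximately (0.18, 0.06)

base = f(2.0, 1.0)                             # (5.0, 2.0)
estimate = (base[0] + dy[0], base[1] + dy[1])  # approximately (5.18, 2.06)
exact = f(2.04, 1.01)                          # approximately (5.1817, 2.0604)
error = (exact[0] - estimate[0], exact[1] - estimate[1])
```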
Problem 3. For $u = x^3 - 3xy^2$, $v = 3x^2 y - y^3$, compute the Jacobian determinant $\dfrac{\partial(u, v)}{\partial(x, y)}$.

$u_x = 3x^2 - 3y^2$, $u_y = -6xy$, $v_x = 6xy$, $v_y = 3x^2 - 3y^2$.
$J = \begin{pmatrix} 3x^2 - 3y^2 & -6xy \\ 6xy & 3x^2 - 3y^2 \end{pmatrix}.$
$\det J = (3x^2 - 3y^2)^2 - (-6xy)(6xy) = (3x^2 - 3y^2)^2 + 36x^2 y^2.$
$(3x^2 - 3y^2)^2 + 36x^2 y^2 = 9(x^2 - y^2)^2 + 36 x^2 y^2 = 9\big[(x^2 - y^2)^2 + 4x^2 y^2\big] = 9(x^2 + y^2)^2.$
So $\dfrac{\partial(u, v)}{\partial(x, y)} = 9(x^2 + y^2)^2.$
This is the map $z \mapsto z^3$ in real coordinates. Its complex derivative is $3z^2$, and a holomorphic map scales local area by the squared modulus of its derivative: $|3z^2|^2 = 9|z|^4 = 9(x^2 + y^2)^2$. Check.
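A finite-difference check of that determinant (plain Python sketch):

```python
# Sketch: check det J = 9(x^2 + y^2)^2 for u = x^3 - 3xy^2, v = 3x^2 y - y^3
# (the map z -> z^3) by central differences.

def f(x, y):
    return (x ** 3 - 3 * x * y * y, 3 * x * x * y - y ** 3)

def det_numeric(x, y, h=1e-5):
    du = [(a - b) / (2 * h) for a, b in zip(f(x + h, y), f(x - h, y))]  # d/dx column
    dv = [(a - b) / (2 * h) for a, b in zip(f(x, y + h), f(x, y - h))]  # d/dy column
    return du[0] * dv[1] - dv[0] * du[1]

x0, y0 = 1.1, -0.6
exact = 9 * (x0 ** 2 + y0 ** 2) ** 2
approx = det_numeric(x0, y0)
```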
Problem 4. For $u = e^x \cos y$, $v = e^x \sin y$, compute $\det J$ and evaluate it at $(x, y) = (1, 0)$.

$u_x = e^x \cos y$, $u_y = -e^x \sin y$, $v_x = e^x \sin y$, $v_y = e^x \cos y$.
$\det J = (e^x \cos y)(e^x \cos y) - (-e^x \sin y)(e^x \sin y) = e^{2x}(\cos^2 y + \sin^2 y) = e^{2x}.$
$\det J(1, 0) = e^2 \approx 7.389.$
This is the map $z \mapsto e^z$ in real form. Its derivative is $e^z$, so local area at $(x, y)$ is scaled by $|e^z|^2 = e^{2x}$ - which depends only on $x$, never $y$, exactly as the determinant says.
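And the same style of check here (plain Python sketch):

```python
# Sketch: check det J = e^{2x} for u = e^x cos y, v = e^x sin y
# by central differences at (x, y) = (1, 0).
import math

def f(x, y):
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def det_numeric(x, y, h=1e-6):
    dx_col = [(a - b) / (2 * h) for a, b in zip(f(x + h, y), f(x - h, y))]
    dy_col = [(a - b) / (2 * h) for a, b in zip(f(x, y + h), f(x, y - h))]
    return dx_col[0] * dy_col[1] - dy_col[0] * dx_col[1]

approx = det_numeric(1.0, 0.0)  # should be close to e^2, about 7.389
```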