The General Chain Rule
Jacobian matrices multiply - the chain rule as matrix multiplication - Kaplan §2.9
Prereq: §2.8 Chain Rules
Here's a grid of points in the $(x_1, x_2)$-plane. We're going to feed it through two transformations, one after the other, and watch what happens to the grid at each stage.
[Interactive figure: a grid of points shown in three panels - $(x_1, x_2)$-space, $(u_1, u_2)$-space after $\mathbf{g}$, and $(y_1, y_2)$-space after $\mathbf{f} \circ \mathbf{g}$.]
When one transformation feeds into another, the chain rule says their Jacobian matrices multiply. But what does that look like geometrically - and why should matrix multiplication capture it?
From single to multi-variable chain rules
Let's start with something we know cold. One input, one output, one intermediate variable. If $y = f(u)$ and $u = g(x)$, single-variable calculus gives us:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$
Multiply two derivatives. Done. Now here's the question that matters: what happens when everything becomes vector-valued?
Say $y$ depends on two intermediate variables $(u_1, u_2)$, and each of those depends on two inputs $(x_1, x_2)$. If we nudge $x_1$ by a tiny amount, how does $y$ respond? Well, that nudge ripples through both intermediate variables simultaneously - it changes $u_1$ and $u_2$, and each of those changes affects $y$. Adding up both pathways:

$$\frac{\partial y}{\partial x_1} = \frac{\partial y}{\partial u_1}\frac{\partial u_1}{\partial x_1} + \frac{\partial y}{\partial u_2}\frac{\partial u_2}{\partial x_1}$$
Stare at that for a moment. The right side is a dot product - the row $\bigl[\frac{\partial y}{\partial u_1},\; \frac{\partial y}{\partial u_2}\bigr]$ dotted with the column $\bigl[\frac{\partial u_1}{\partial x_1},\; \frac{\partial u_2}{\partial x_1}\bigr]^T$.
And if we have multiple outputs $y_1, y_2, \ldots, y_m$ each depending on multiple inputs $x_1, x_2, \ldots, x_n$ through intermediates $u_1, u_2, \ldots, u_p$? Then every single entry of the resulting Jacobian is one of these dot products. Row $i$ of $\mathbf{Y}_u$ dotted with column $j$ of $\mathbf{U}_x$ gives us entry $(i,j)$ of the answer. That's the definition of matrix multiplication:

$$\mathbf{Y}_x = \mathbf{Y}_u \cdot \mathbf{U}_x, \qquad \frac{\partial y_i}{\partial x_j} = \sum_{k=1}^{p} \frac{\partial y_i}{\partial u_k}\,\frac{\partial u_k}{\partial x_j}$$
Each entry $(i,j)$ of $\mathbf{Y}_x$ sums up all the indirect pathways from input $x_j$ to output $y_i$ through the intermediate variables. The single-variable chain rule multiplies two numbers; the general chain rule multiplies two matrices. Same idea, bigger playground.
Pick a pair of mappings below and watch how the dot products assemble the product matrix entry by entry.
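If you prefer code to widgets, here's a minimal sketch of the same bookkeeping in numpy. The two Jacobians are made-up numbers, not from any particular mapping: the point is that assembling the product entry by entry - summing pathways through the intermediates - reproduces matrix multiplication exactly.

```python
import numpy as np

# Two made-up Jacobians: Y_u is outputs x intermediates,
# U_x is intermediates x inputs. Any shapes with a matching
# inner dimension work the same way.
Y_u = np.array([[1.0, 2.0],
                [3.0, 4.0]])
U_x = np.array([[5.0, 6.0],
                [7.0, 8.0]])

# Entry (i, j) sums all pathways x_j -> u_k -> y_i over intermediates k.
Y_x = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        Y_x[i, j] = sum(Y_u[i, k] * U_x[k, j] for k in range(2))

# The double loop is exactly ordinary matrix multiplication.
assert np.allclose(Y_x, Y_u @ U_x)
print(Y_x)
```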
The Jacobian product in action
Enough abstraction - let's compute. Consider a composition $\mathbf{y} = \mathbf{f}(\mathbf{u})$, $\mathbf{u} = \mathbf{g}(\mathbf{x})$, whose inner stage is

$$u_1 = 2x_1 + x_2, \qquad u_2 = x_1 - x_2$$
We want the Jacobian $\mathbf{Y}_x$ at the point $\mathbf{x} = (1, 0)$. We could substitute everything, expand, and differentiate the mess. But the chain rule says: just compute two simple Jacobians and multiply.
Step 1: The inner Jacobian $\mathbf{U}_x$
The inner mapping $\mathbf{g}$ is linear, so its Jacobian is just the coefficient matrix - the same everywhere:

$$\mathbf{U}_x = \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix}$$
Step 2: Find the intermediate point
Before we can compute $\mathbf{Y}_u$, we need to know where in $u$-space we are. At $\mathbf{x} = (1,0)$:

$$u_1 = 2(1) + 0 = 2, \qquad u_2 = 1 - 0 = 1$$

so the outer Jacobian must be evaluated at $\mathbf{u} = (2,1)$.
Step 3: The outer Jacobian $\mathbf{Y}_u$ at $\mathbf{u} = (2,1)$
Step 4: Multiply
We never had to compose the functions and differentiate the resulting polynomial. We just multiplied two matrices at the right point. For complicated nested mappings with many variables, this modularity is a lifesaver.
Verification - the hard way
Can we trust this? Let's verify by brute force. Substitute $u_1 = 2x_1 + x_2$ and $u_2 = x_1 - x_2$ directly:
Differentiating these and plugging in $(1,0)$:
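The outer mapping's explicit formula isn't reproduced above, so here's the same brute-force check done numerically instead - a sketch that keeps the actual inner map and base point from the example but swaps in a hypothetical outer map `f` of my own choosing. The finite-difference Jacobian of the composite should match the product of the two stage Jacobians:

```python
import numpy as np

def g(x):
    # The inner (linear) map from the example above.
    return np.array([2*x[0] + x[1], x[0] - x[1]])

def f(u):
    # Hypothetical outer map -- a stand-in, since the example's
    # explicit formula for f isn't reproduced here.
    return np.array([u[0]**2 * u[1], u[0] + u[1]**3])

def jacobian(func, p, h=1e-6):
    # Central-difference Jacobian of func at point p, one column per input.
    p = np.asarray(p, dtype=float)
    cols = []
    for j in range(len(p)):
        e = np.zeros_like(p); e[j] = h
        cols.append((func(p + e) - func(p - e)) / (2*h))
    return np.column_stack(cols)

x = np.array([1.0, 0.0])
u = g(x)                                   # the intermediate point (2, 1)

chain = jacobian(f, u) @ jacobian(g, x)    # Y_u at u, times U_x at x
direct = jacobian(lambda x_: f(g(x_)), x)  # differentiate the composite

assert np.allclose(chain, direct, atol=1e-4)
print(chain)
```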
Determinants multiply - area distortion
Here's where things get geometrically beautiful. When the Jacobian matrices are square - same number of inputs, intermediates, and outputs - we can take determinants of both sides of $\mathbf{Y}_x = \mathbf{Y}_u \cdot \mathbf{U}_x$:

$$\det \mathbf{Y}_x = \det \mathbf{Y}_u \cdot \det \mathbf{U}_x$$
That's just the standard linear algebra fact $\det(AB) = \det(A)\det(B)$. But in the language of Jacobians, it says something vivid:
What does this mean? Each Jacobian determinant measures how much the mapping locally stretches or compresses area (in 2D) or volume (in 3D). So the rule says:
Area distortion factors compose by multiplication.
Think of it like currency exchange rates. If 1 dollar buys 2 euros and 1 euro buys 3 yen, then 1 dollar buys $2 \times 3 = 6$ yen. Same logic: if $\mathbf{g}$ doubles local area and $\mathbf{f}$ triples it, the composite $\mathbf{f} \circ \mathbf{g}$ multiplies area by 6.
This fact becomes the engine behind change of variables in multiple integrals (Chapter 4). When you switch from Cartesian to polar coordinates, the area element picks up that factor of $r$ - that's exactly one of these Jacobian determinants at work.
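Here's a quick numerical check of the multiplication rule, assuming nothing beyond numpy: the inner stage is the polar-coordinate map (whose Jacobian determinant is $r$), and the outer stage is a made-up stretch with determinant $6$.

```python
import numpy as np

def jacobian(func, p, h=1e-6):
    # Central-difference Jacobian of func at point p.
    p = np.asarray(p, dtype=float)
    cols = []
    for j in range(len(p)):
        e = np.zeros_like(p); e[j] = h
        cols.append((func(p + e) - func(p - e)) / (2*h))
    return np.column_stack(cols)

def polar(p):
    # (r, theta) -> (x, y); its Jacobian determinant is r.
    r, th = p
    return np.array([r*np.cos(th), r*np.sin(th)])

def stretch(q):
    # A simple outer map: doubles x, triples y, so det = 6.
    return np.array([2*q[0], 3*q[1]])

p = np.array([1.5, 0.7])                              # some point (r, theta)
inner = np.linalg.det(jacobian(polar, p))             # ~ r = 1.5
outer = np.linalg.det(jacobian(stretch, polar(p)))    # ~ 6
total = np.linalg.det(jacobian(lambda p_: stretch(polar(p_)), p))

print(inner, outer, total)                            # 1.5, 6.0, 9.0
assert np.isclose(total, inner * outer, atol=1e-4)
```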
Try different mapping pairs below. Watch the colored square deform through each stage, and check that the area ratios multiply.
[Interactive figure: a colored square deforming through three panels - $x$-space, $u$-space, and $y$-space.]
The differential form
There's a lovely way to see the chain rule that makes it feel almost inevitable. The Jacobian matrix $\mathbf{Y}_u$ tells us how small changes in $\mathbf{u}$ produce small changes in $\mathbf{y}$:

$$d\mathbf{y} = \mathbf{Y}_u \, d\mathbf{u}$$

And the inner mapping relates $d\mathbf{u}$ to $d\mathbf{x}$:

$$d\mathbf{u} = \mathbf{U}_x \, d\mathbf{x}$$

Now do what any calculus student would do - substitute. Replace $d\mathbf{u}$:

$$d\mathbf{y} = \mathbf{Y}_u \, (\mathbf{U}_x \, d\mathbf{x}) = (\mathbf{Y}_u \, \mathbf{U}_x) \, d\mathbf{x} = \mathbf{Y}_x \, d\mathbf{x}$$
That's it. The chain rule is substitution. In single-variable calculus we write $dy = f'(u)\,du = f'(g(x))\,g'(x)\,dx$, replacing $du$ with $g'(x)\,dx$. The matrix version does exactly the same thing, just with matrices instead of numbers.
Linear approximations compose
This leads to the deepest way to understand what's happening. The matrix $\mathbf{Y}_u$ is the best linear approximation to $\mathbf{f}$ near a point. The matrix $\mathbf{U}_x$ is the best linear approximation to $\mathbf{g}$ near a point. Their product? The best linear approximation to the composite $\mathbf{f} \circ \mathbf{g}$ - because composing linear maps is exactly what matrix multiplication computes.
When $\mathbf{f}$ and $\mathbf{g}$ are actually linear - say $\mathbf{y} = A\mathbf{u}$ and $\mathbf{u} = B\mathbf{x}$ - the composite is $\mathbf{y} = AB\mathbf{x}$, and the chain rule reduces to ordinary matrix multiplication with no approximation at all. The general chain rule says: even when the maps are nonlinear, the same holds for their linear approximations at each point.
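A tiny sketch of that exactly-linear case (arbitrary random matrices, nothing assumed from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))   # outer: y = A u
B = rng.standard_normal((2, 2))   # inner: u = B x
x = rng.standard_normal(2)

# Composing the maps is multiplying the matrices: y = A (B x) = (A B) x.
assert np.allclose(A @ (B @ x), (A @ B) @ x)

# For a linear map the Jacobian IS the matrix, at every point, so
# Y_x = Y_u @ U_x = A @ B holds with no approximation at all.
print(A @ B)
```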
Chains of any length
The same logic extends effortlessly. If $\mathbf{y} = \mathbf{f}(\mathbf{u})$, $\mathbf{u} = \mathbf{g}(\mathbf{v})$, $\mathbf{v} = \mathbf{h}(\mathbf{x})$, then we can apply the chain rule twice:

$$\mathbf{Y}_x = \mathbf{Y}_u \cdot \mathbf{U}_v \cdot \mathbf{V}_x$$
Just keep multiplying Jacobian matrices, one for each link in the chain, outer to inner. Each factor is the derivative of that stage, evaluated at the point it actually receives as input. We'll prove this cleanly in Problem 3(a) below.
Practice Problems - §2.9
From Kaplan, problems after §2.9
$y_1 = u_1 u_2 - 3u_1$, $y_2 = u_1^2 + 2u_1 u_2 + 2u_1 - u_2$; $u_1 = x_1 \cos 3x_2$, $u_2 = x_1 \sin 3x_2$.
Find the Jacobian matrix in the form of a product of two matrices and evaluate for $x_1 = 0,\; x_2 = 0$.
Differentiate $(y_1, y_2)$ with respect to $(u_1, u_2)$:
$$\mathbf{Y}_u = \begin{pmatrix} u_2 - 3 & u_1 \\ 2u_1 + 2u_2 + 2 & 2u_1 - 1 \end{pmatrix}$$

Differentiate $(u_1, u_2)$ with respect to $(x_1, x_2)$:
$$\mathbf{U}_x = \begin{pmatrix} \cos(3x_2) & -3x_1\sin(3x_2) \\ \sin(3x_2) & 3x_1\cos(3x_2) \end{pmatrix}$$

First find the intermediate point: $u_1 = 0 \cdot \cos(0) = 0$, $u_2 = 0 \cdot \sin(0) = 0$.
$$\mathbf{Y}_u\big|_{(0,0)} = \begin{pmatrix} -3 & 0 \\ 2 & -1 \end{pmatrix}, \quad \mathbf{U}_x\big|_{(0,0)} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$$

Multiplying:

$$\mathbf{Y}_x = \mathbf{Y}_u \cdot \mathbf{U}_x = \begin{pmatrix} -3 & 0 \\ 2 & 0 \end{pmatrix}$$

The entire second column is zero. Why? At $x_1 = 0$, changing $x_2$ has no effect on either $u_1$ or $u_2$ - both are $x_1$ times a trig function, so when $x_1 = 0$, the trig part is irrelevant. The $\mathbf{U}_x$ matrix already told us this with its zero column, and that zero propagated through the multiplication. The Jacobian determinant is $0$: the mapping crushes a 2D region into a line at this point.
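As a sanity check, here's a short numerical sketch (plain numpy, with a small central-difference helper) that differentiates the composite directly at $(0,0)$ and recovers the same product:

```python
import numpy as np

def g(x):
    # u1 = x1 cos(3 x2), u2 = x1 sin(3 x2)
    return np.array([x[0]*np.cos(3*x[1]), x[0]*np.sin(3*x[1])])

def f(u):
    # y1 = u1 u2 - 3 u1, y2 = u1^2 + 2 u1 u2 + 2 u1 - u2
    return np.array([u[0]*u[1] - 3*u[0],
                     u[0]**2 + 2*u[0]*u[1] + 2*u[0] - u[1]])

def jacobian(func, p, h=1e-6):
    # Central-difference Jacobian of func at point p.
    p = np.asarray(p, dtype=float)
    cols = []
    for j in range(len(p)):
        e = np.zeros_like(p); e[j] = h
        cols.append((func(p + e) - func(p - e)) / (2*h))
    return np.column_stack(cols)

x0 = np.array([0.0, 0.0])
Yx = jacobian(lambda x: f(g(x)), x0)
print(Yx)   # ~ [[-3, 0], [2, 0]] -- matching the product above
```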
Given: $f(x_0,y_0) = u_0$, $g(x_0,y_0) = v_0$. At these points:
$f_x = 2,\; f_y = 3,\; g_x = -1,\; g_y = 5$
$p_u = 7,\; p_v = 1,\; q_u = -3,\; q_v = 2$
Let $z = p(f(x,y),\, g(x,y))$ and $w = q(f(x,y),\, g(x,y))$. Find the Jacobian matrix of $(z, w)$ with respect to $(x, y)$ at $(x_0, y_0)$.
This is a pure chain rule problem. We don't need explicit formulas for $f, g, p, q$ - just their derivatives at the right points. The outer mapping $(u,v) \mapsto (z,w)$ composes with the inner mapping $(x,y) \mapsto (u,v)$, and the general chain rule says their Jacobians multiply:

$$\begin{pmatrix} \frac{\partial z}{\partial x} & \frac{\partial z}{\partial y} \\[2pt] \frac{\partial w}{\partial x} & \frac{\partial w}{\partial y} \end{pmatrix} = \begin{pmatrix} p_u & p_v \\ q_u & q_v \end{pmatrix} \begin{pmatrix} f_x & f_y \\ g_x & g_y \end{pmatrix} = \begin{pmatrix} 7 & 1 \\ -3 & 2 \end{pmatrix} \begin{pmatrix} 2 & 3 \\ -1 & 5 \end{pmatrix} = \begin{pmatrix} 13 & 26 \\ -8 & 1 \end{pmatrix}$$
We computed the full $2 \times 2$ Jacobian of the composite without ever knowing an explicit formula for any of the four functions. Just the partial derivatives at one point, plus the chain rule. This is the power of the general chain rule: it reduces calculus to linear algebra.
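The final multiplication is a one-liner to check numerically - a minimal sketch with just the given numbers:

```python
import numpy as np

# Outer Jacobian of (z, w) w.r.t. (u, v) at (u0, v0).
P = np.array([[7, 1],
              [-3, 2]])
# Inner Jacobian of (u, v) = (f, g) w.r.t. (x, y) at (x0, y0).
F = np.array([[2, 3],
              [-1, 5]])

print(P @ F)   # [[13 26]
               #  [-8  1]]
```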
Let $\mathbf{y} = \mathbf{f}(\mathbf{u})$, $\mathbf{u} = \mathbf{g}(\mathbf{v})$, $\mathbf{v} = \mathbf{h}(\mathbf{x})$. Show that $\mathbf{y}_x = \mathbf{y}_u \, \mathbf{u}_v \, \mathbf{v}_x$.
The composition $\mathbf{u} = \mathbf{g}(\mathbf{h}(\mathbf{x}))$ has Jacobian:
$$\mathbf{u}_x = \mathbf{u}_v \cdot \mathbf{v}_x$$

Now $\mathbf{y} = \mathbf{f}(\mathbf{u})$ where $\mathbf{u}$ depends on $\mathbf{x}$ (through $\mathbf{v}$). The chain rule gives:
$$\mathbf{y}_x = \mathbf{y}_u \cdot \mathbf{u}_x = \mathbf{y}_u \cdot (\mathbf{u}_v \cdot \mathbf{v}_x) = \mathbf{y}_u \, \mathbf{u}_v \, \mathbf{v}_x$$

The last equality uses associativity of matrix multiplication.
The chain rule extends to any number of stages. With $k$ links in the chain, $\mathbf{y}_x$ is a product of $k$ Jacobian matrices, always written outer to inner, each evaluated at the point its mapping actually receives as input. The same logic (apply the two-stage rule and substitute) works for 4, 5, or 100 stages.
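And a closing sketch of a three-stage chain (the three maps are illustrative choices of mine, not from Kaplan): multiply the three Jacobians outer to inner, each evaluated at the point its stage actually receives, and compare against differentiating the full composite.

```python
import numpy as np

def jacobian(func, p, h=1e-6):
    # Central-difference Jacobian of func at point p.
    p = np.asarray(p, dtype=float)
    cols = []
    for j in range(len(p)):
        e = np.zeros_like(p); e[j] = h
        cols.append((func(p + e) - func(p - e)) / (2*h))
    return np.column_stack(cols)

# Three made-up smooth stages.
h_ = lambda x: np.array([x[0] + x[1]**2, np.sin(x[0])])   # v = h(x)
g_ = lambda v: np.array([v[0]*v[1], v[0] - v[1]])         # u = g(v)
f_ = lambda u: np.array([np.exp(u[0]), u[0]*u[1]])        # y = f(u)

x = np.array([0.3, -0.2])
v = h_(x)          # point the middle stage receives
u = g_(v)          # point the outer stage receives

# One Jacobian per link, outer to inner, each at its own point.
chain = jacobian(f_, u) @ jacobian(g_, v) @ jacobian(h_, x)
direct = jacobian(lambda x_: f_(g_(h_(x_))), x)

assert np.allclose(chain, direct, atol=1e-4)
print(chain)
```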