Why This Matters
In single-variable calculus, the chain rule says that the derivative of f(g(x)) is f′(g(x)) · g′(x). But what happens when f and g each take multiple inputs and produce multiple outputs? The multivariable chain rule answers this question, and the answer involves matrix multiplication.
The Jacobian matrix is the multivariable generalization of the derivative. It collects all partial derivatives of a vector-valued function into a single matrix. Composing two functions means multiplying their Jacobians, just like the single-variable chain rule multiplies derivatives. The total derivative captures all the ways that intermediate variables carry changes through a computation. This machinery is exactly what automatic differentiation frameworks like PyTorch and TensorFlow use to compute gradients of deep neural networks through hundreds of composed layers.
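The idea that composing functions means multiplying Jacobians can be checked numerically. The sketch below (the helpers `jacobian` and `matmul` and the sample maps f and g are illustrative, not from the tutorial) compares the Jacobian of a composition computed directly against the product of the two individual Jacobians:

```javascript
// Sketch: composing Jacobians numerically. Helper names are illustrative.
// For g: R^2 -> R^2 and f: R^2 -> R^2, the chain rule says
// J_{f∘g}(p) = J_f(g(p)) · J_g(p).

// Central-difference Jacobian of a map R^2 -> R^2
function jacobian(fn, [a, b], h = 1e-6) {
  const col1 = fn([a + h, b]).map((v, i) => (v - fn([a - h, b])[i]) / (2 * h));
  const col2 = fn([a, b + h]).map((v, i) => (v - fn([a, b - h])[i]) / (2 * h));
  return [[col1[0], col2[0]], [col1[1], col2[1]]]; // rows = outputs, cols = inputs
}

// 2x2 matrix product
function matmul(A, B) {
  return A.map((row) =>
    B[0].map((_, j) => row[0] * B[0][j] + row[1] * B[1][j]));
}

const g = ([t, s]) => [t + s, t * s];       // inner map
const f = ([x, y]) => [x * x * y, x + y];   // outer map
const p = [2, 3];

const direct = jacobian((u) => f(g(u)), p);           // differentiate f∘g directly
const chained = matmul(jacobian(f, g(p)), jacobian(g, p)); // multiply Jacobians
console.log(direct, chained); // the two 2x2 matrices agree up to ~1e-6
```

The two results match entry by entry, which is the matrix form of the chain rule in action.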
Define Terms
Visual Model
The full process at a glance.
The multivariable chain rule multiplies the gradient of f by the Jacobian of the intermediate mapping. Each total derivative sums contributions from all paths.
Code Example
// Multivariable chain rule: numerical verification
// f(x, y) = x^2 * y, where x(t, s) = t + s, y(t, s) = t * s
function f(x, y) { return x * x * y; }
function xFn(t, s) { return t + s; }
function yFn(t, s) { return t * s; }
// Composed function: g(t, s) = f(x(t,s), y(t,s))
function g(t, s) {
  return f(xFn(t, s), yFn(t, s));
}
// Numerical partial derivatives
function partials(fn, a, b, h = 1e-6) {
  const da = (fn(a + h, b) - fn(a - h, b)) / (2 * h);
  const db = (fn(a, b + h) - fn(a, b - h)) / (2 * h);
  return [da, db];
}
// Compute via chain rule at t=2, s=3
const t = 2, s = 3;
const x = xFn(t, s); // 5
const y = yFn(t, s); // 6
// Gradient of f at (x, y)
const [dfdx, dfdy] = partials(f, x, y);
console.log("df/dx:", dfdx.toFixed(4), "df/dy:", dfdy.toFixed(4)); // 60, 25
// Jacobian of (x, y) w.r.t. (t, s)
const [dxdt, dxds] = partials(xFn, t, s); // 1, 1
const [dydt, dyds] = partials(yFn, t, s); // 3, 2
console.log("Jacobian row 1:", dxdt.toFixed(2), dxds.toFixed(2));
console.log("Jacobian row 2:", dydt.toFixed(2), dyds.toFixed(2));
// Chain rule: dg/dt = df/dx * dx/dt + df/dy * dy/dt
const dgdt = dfdx * dxdt + dfdy * dydt;
const dgds = dfdx * dxds + dfdy * dyds;
console.log("dg/dt (chain rule):", dgdt.toFixed(4)); // 60*1 + 25*3 = 135
console.log("dg/ds (chain rule):", dgds.toFixed(4)); // 60*1 + 25*2 = 110
// Verify by direct numerical differentiation of g
const [dgdt_direct, dgds_direct] = partials(g, t, s);
console.log("dg/dt (direct):", dgdt_direct.toFixed(4));
console.log("dg/ds (direct):", dgds_direct.toFixed(4));
Interactive Experiment
Try these exercises:
- Change the inner functions to x(t,s) = t^2 and y(t,s) = s^2. Verify the chain rule still holds by comparing the chain-rule result to direct differentiation.
- Add a third intermediate variable z(t,s) = t - s. Now f(x,y,z) = x*y + z. What shape is the Jacobian?
- Try computing the Jacobian of a function that maps R^2 to R^3, for example (t,s) -> (t·cos(s), t·sin(s), t). What is the shape of the Jacobian matrix?
- Build a two-layer composition: h(t) = f(g(t)) where g: R -> R^2 and f: R^2 -> R. Verify that dh/dt equals the dot product of ∇f(g(t)) and g′(t).
- Experiment with the step size h in the central difference. At what point do rounding errors start dominating?
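For the R^2 to R^3 exercise, one way to see the answer is to build the Jacobian numerically. The sketch below (function names `map3` and `jacobian3x2` are illustrative) shows that a map from R^2 to R^3 has a 3×2 Jacobian: one row per output, one column per input.

```javascript
// The map (t, s) -> (t·cos(s), t·sin(s), t) from the exercise
function map3(t, s) {
  return [t * Math.cos(s), t * Math.sin(s), t];
}

// Central-difference Jacobian of a map R^2 -> R^3
function jacobian3x2(fn, t, s, h = 1e-6) {
  const plusT = fn(t + h, s), minusT = fn(t - h, s);
  const plusS = fn(t, s + h), minusS = fn(t, s - h);
  return plusT.map((_, i) => [
    (plusT[i] - minusT[i]) / (2 * h), // d(output_i)/dt
    (plusS[i] - minusS[i]) / (2 * h), // d(output_i)/ds
  ]);
}

const J = jacobian3x2(map3, 2, Math.PI / 3);
console.log(J.length, J[0].length); // 3 2
J.forEach((row) => console.log(row.map((v) => v.toFixed(4)).join(" ")));
```

The analytic rows are (cos s, −t·sin s), (sin s, t·cos s), and (1, 0), which the numerical matrix should reproduce to about six digits.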
Quick Quiz
Coding Challenge
Write a function called `chainRuleDerivs` that takes t and s as inputs. Given x(t,s) = t^2 + s, y(t,s) = t*s^2, and f(x,y) = x*y, compute df/dt and df/ds using the chain rule (not direct differentiation). Use numerical central differences with h = 1e-5. Return a string `dfdt,dfds` rounded to 4 decimal places.
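One possible solution sketch (not the only valid approach) follows the same pattern as the worked code example: estimate the gradient of f and the Jacobian of the intermediate variables with central differences, then combine them with the chain rule.

```javascript
// Sketch solution: chain-rule derivatives via central differences.
function chainRuleDerivs(t, s) {
  const h = 1e-5;
  const xFn = (t, s) => t * t + s; // x(t,s) = t^2 + s
  const yFn = (t, s) => t * s * s; // y(t,s) = t*s^2
  const f = (x, y) => x * y;       // f(x,y) = x*y

  const x = xFn(t, s), y = yFn(t, s);
  // Gradient of f at (x, y)
  const dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h);
  const dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h);
  // Jacobian of (x, y) with respect to (t, s)
  const dxdt = (xFn(t + h, s) - xFn(t - h, s)) / (2 * h);
  const dxds = (xFn(t, s + h) - xFn(t, s - h)) / (2 * h);
  const dydt = (yFn(t + h, s) - yFn(t - h, s)) / (2 * h);
  const dyds = (yFn(t, s + h) - yFn(t, s - h)) / (2 * h);
  // Chain rule: each total derivative sums contributions from all paths
  const dfdt = dfdx * dxdt + dfdy * dydt;
  const dfds = dfdx * dxds + dfdy * dyds;
  return `${dfdt.toFixed(4)},${dfds.toFixed(4)}`;
}
```

As a sanity check, at t = 2, s = 3 we have x = 7, y = 18, so df/dt = y·2t + x·s² = 135 and df/ds = y·1 + x·2ts = 102, which the function should return up to rounding.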
Real-World Usage
The multivariable chain rule and Jacobians appear throughout science and engineering:
- Deep learning: Backpropagation is the chain rule applied layer by layer. Each layer computes its local Jacobian, and gradients propagate backward through the entire network.
- Robotics: Forward kinematics maps joint angles to end-effector position. The Jacobian of this mapping relates joint velocities to tip velocities, enabling inverse kinematics.
- Coordinate transforms: Changing from Cartesian to polar coordinates requires the Jacobian to transform integrals and differential operators correctly.
- Control theory: Linearizing a nonlinear system around an operating point uses the Jacobian to approximate the system dynamics.
- Computer graphics: Texture mapping and deformations use Jacobians to correctly transform normals, tangents, and area elements.