Proof of Normal Equations
We will assume that ATA is invertible. This
is a reasonable assumption because there are more equations than unknowns. The
length of the vector x is defined to be
so the length of the
vector squared is
The square of the
length of the vector Ax-b is the sum of the squares of the errors
since each element of Ax - b is the error at a point. We want to
minimize this quantity. We claim that the solution to the normal equations,
which we will denote as x* =
(ATA)-1ATb,
minimizes the sum of the squares of the errors.
We claim, and will prove, that
Since we are minimizing over x, ||Ax - b||2 will be minimized when x = x* so that ||A (x - x*)||2 is zero. This is true because ||A (x - x*)||2 can never be negative.
To prove the claim, we will work with the right side of the equation and
prove that it is equal to the left side. These equations are numbered. There are
short explanations of these steps following the proof.
| = | ||Ax* - b||2 + ||A (x - x*)||2 | (1) | |
| = | (Ax* - b)T (Ax* - b) + (A(x - x*))T A(x - x*) | (2) | |
| = | (A (ATA)-1ATb - b)T (A (ATA)-1ATb - b)T + | ||
| (A(x - (ATA)-1ATb))T A(x - (ATA)-1ATb) | (3) | ||
| = | (A (ATA)-1 ATb - b)T (A (ATA)-1 ATb - b) + | ||
| (Ax - A (ATA)-1 ATb)T (Ax - A (ATA)-1 ATb) | (4) | ||
| = | (bTA (ATA)-1 AT - bT) (A (ATA)-1 ATb - b) + | ||
| (xTAT - bTA (ATA)-1 AT)(Ax - A (ATA)-1 ATb) | (5) | ||
| = | (bTA (ATA)-1 ATA (ATA)-1ATb - 2bTA (ATA)-1ATb + bTb + | ||
| (xTATAx - 2bTA (ATA)-1 ATAx + bTA (ATA)-1 ATA (ATA)-1ATb) | (6) | ||
| = | (bTA (ATA)-1 ATb - 2bTA (ATA)-1 ATb + bTb) + | ||
| (xTATAx - 2bTAx + bTA (ATA)-1 ATb) | (7) | ||
| = | bTb + xTATAx - 2bTAx | (8) |
Now, if we look at the left side of the equation, we can see that it equals
the right side because
This proves that x = x* = (ATA)-1ATb uniquely minimizes the sum of the squares of the error for the equation Ax - b.
The following are comments on the steps of the proof:
| |