There are a number of great techniques for solving systems of linear equations where the number of variables equals the number of equations (non-singular, square matrices). A few are: Gaussian elimination (a direct method), and Jacobi and Gauss-Seidel (iterative methods). For more details on these, read this.

But often we come across problems where no exact solution exists, so we have to settle for an approximation and minimize the difference between $Ax$ and $b$ (the “residual”). This blog post focuses on overdetermined systems $Ax = b$, where $A \in \mathbb{R}^{m \times n}$ and $m > n$. These are commonly known as least squares problems, and we look at a technique for solving them using normal equations and QR decomposition.
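
As a concrete (made-up) example, fitting a line $y = c_0 + c_1 t$ to the three points $(1, 1)$, $(2, 2)$, $(3, 2)$ gives three equations in two unknowns, and no line satisfies all three exactly:

$$
\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}
$$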

Understanding the problem

Generally, it is impossible to find an exact solution to the overdetermined system $Ax = b$, so we want to minimize the L2 norm of the residual $b - Ax$. That is, we want to find $\min_{y \in \mathcal{R}(A)} \|b - y\|_2$, where $\mathcal{R}(A)$ is the range of $A$.
Let $\hat{x}$ be the required minimizer and consider the following image:

Obviously, $b$ is not in $\mathcal{R}(A)$, but note that $b - A\hat{x}$ (the residual) is orthogonal to $\mathcal{R}(A)$, and $A\hat{x}$ is the projection of $b$ onto $\mathcal{R}(A)$. This makes sense because we know from elementary geometry that the closest point on a plane to a point outside it is the perpendicular projection of the latter onto the former.
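
To make the picture concrete, here is a small numpy sketch (using the made-up line-fitting system from above) that computes the minimizer and checks that the residual is orthogonal to the columns of $A$:

```python
import numpy as np

# Made-up overdetermined system: 3 equations, 2 unknowns
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Least squares solution, i.e. the x that minimizes ||b - Ax||_2
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# The residual b - A x_hat should be orthogonal to the range of A,
# which is spanned by the columns of A, so A^T r should be ~0
r = b - A @ x_hat
print(A.T @ r)  # numerically close to [0, 0]
```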

Deriving the normal equation

Let’s derive the normal equation using calculus. We want to minimize the function $\phi(x) = \|b - Ax\|_2^2$, which can be written as $\phi(x) = (b - Ax)^T(b - Ax)$ and expanded to $\phi(x) = b^Tb - 2x^TA^Tb + x^TA^TAx$. Now we have $\nabla\phi(x) = -2A^Tb + 2A^TAx$. We want to minimize $\phi(x)$, so we set $\nabla\phi(x) = 0$.
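
As a quick sanity check on the calculus (not part of the derivation itself), the gradient formula can be compared against central finite differences in numpy; the matrix, vector, and test point below are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))   # random tall matrix, for illustration only
b = rng.normal(size=5)
x = rng.normal(size=3)

phi = lambda v: np.dot(b - A @ v, b - A @ v)     # phi(x) = ||b - Ax||_2^2
grad = 2 * A.T @ A @ x - 2 * A.T @ b             # analytic gradient from above

eps = 1e-6
fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])              # central finite differences

print(np.max(np.abs(grad - fd)))                 # should be tiny (~1e-8 or less)
```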

Then we have $A^TAx = A^Tb$ (the normal equation!), and if $A$ has linearly independent columns, $A^TA$ is non-singular, so $\hat{x} = (A^TA)^{-1}A^Tb$ is the unique solution to the normal equation.
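
In code, the normal equation can be solved directly; here is a minimal numpy sketch using the same made-up system as before, cross-checked against numpy's built-in least squares routine:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Solve the normal equation (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check with numpy's least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_normal)                        # ~[0.667, 0.5]
print(np.allclose(x_normal, x_lstsq))  # True
```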

QR Decomposition

QR decomposition is a way of representing $A$ as $A = QR$, where $Q \in \mathbb{R}^{m \times n}$ has orthonormal columns and $R \in \mathbb{R}^{n \times n}$ is a square, upper triangular, non-singular matrix. A very important property of this decomposition is that $Q^TQ = I$, which means that the columns of $Q$ form an orthonormal basis of $\mathcal{R}(A)$.
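
A quick numpy check of these properties on the same made-up matrix (numpy's qr returns this reduced factorization by default):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

Q, R = np.linalg.qr(A)                   # reduced QR: Q is 3x2, R is 2x2
print(Q.shape, R.shape)                  # (3, 2) (2, 2)
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: columns of Q are orthonormal
print(np.allclose(Q @ R, A))             # True: A = QR
```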

Rewriting the normal equation $A^TAx = A^Tb$ and substituting $A$ with $QR$, we get $R^TQ^TQRx = R^TQ^Tb$. Since $Q^TQ = I$ and $R^T$ is non-singular, this simplifies to $Rx = Q^Tb$, an upper triangular system that can be solved cheaply by back substitution.
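
Putting it together, here is a minimal sketch of the QR-based solve (it assumes scipy is available for the triangular back substitution; plain np.linalg.solve on $R$ would also work):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Least squares via QR: factor A, then solve R x = Q^T b by back substitution
Q, R = np.linalg.qr(A)
x = solve_triangular(R, Q.T @ b, lower=False)

print(x)                                                     # ~[0.667, 0.5]
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```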

Summary
QR factorization is very neat: not only does it help solve least squares problems, but it is also fundamental to the magical eigenvalue-finding algorithm. Least squares problems are very common in multiple fields. Currently, I’m dealing with them in my convex optimization class (spoilers?).

References
MATH 170A at UC San Diego
CS 3220 at Cornell
ECE 133A at UCLA