There are a number of great techniques to solve systems of linear equations where the number of variables are equal to the number of equations (non-singular, square matrices). A few are: Gaussian elimination (direct method), Jacobi, Gauss-Seidel (iterative methods). For more details on these, read this.

But often we come across problems where we have to approximate the solution, and minimize the difference between our approximation and the true solution (the “residual”). This blog post focuses on overdetermined systems where $A \in \Re^{m × n}$ and $m>n$. These are commonly known as least square problems, and we look at a technique to solve them using normal equations and QR decomposition.

#### Understanding the problem

Generally, it is impossible to find an exact solution to the overdetermined system $Ax = b$, so we want to minimize the L2 norm of $b-Ax$. That is, we want to find $\underset{x \in \mathbb{R(A)}}{\operatorname{argmin}} \|Ax-b\|$, where $\mathbb{R(A)}$ is the range of $A$.
Let $\hat{x}$ be the required minimizer and consider the following image: Obviously, $b$ is not in $\mathbb{R(A)}$, but note that $\hat{r}$ (the residual) is orthogonal to $\mathbb{R(A)}$, and $A\hat{x}$ is the projection of $b$ onto $\mathbb{R(A)}$. This makes sense because we know from elementary geometry that the closest point to $b$ on the plane $\mathbb{R(A)}$ will be on the perpendicular projection of the former on the latter.

#### Deriving the normal equation

Let’s derive the normal equation using calculus. We want to minimize the function $f(x) = \|Ax-b\|^2$, which can be written as $\sum_{i=1}^m(\sum_{j=1}^nA_{i,j}x_j - b_i)^2$ and $\frac{\partial f}{\partial x_k} = 2\sum_{i=1}^m A_{i,k}(\sum_{j=1}^nA_{i,j}x_j - b_i)$. Now we have $\nabla f(x) = 2A^T(Ax-b)$. We want to minimize $f(x)$, so we set $\nabla f(x) = 0$.

Then we have $A^TAx = A^TAb$ (the normal equation!), and if A has linearly independent columns, $\hat{x} = (A^TA)^{-1}A^Tb$ is a unique solution to the normal equation.

#### QR Decomposition

QR decomposition is a way of representing $A \in \Re^{m × n}$ as $A = QR$ where $Q$ is an orthogonal matrix and $R$ is a square, upper triangular, non-singular matrix. A very important property of this decomposition is that $\mathbb{R(A)} = \mathbb{R(Q)}$, which means that the columns of $Q$ form an orthonormal basis of $A$.

Rewriting $\hat{x} = (A^TA)^{-1}A^Tb$, substituting $A$ with $QR$ we get $\hat{x} = R^{-1}Q^Tb$.

Summary
QR Factorization is very neat, not only does it help solve least square problems, but is also fundamental to the magical eigenvalue finding algorithm. Least square problems are very common in multiple fields. Currently, I’m dealing with them in my convex optimization class (spoilers?).