# Iterative Methods: Part 3

Let’s look at one more way to solve the equation $A\vec{x} = \vec{b}$. We assume that $A \in \mathbb{R}^{n \times n}$ is nonsingular, and define the $k$-th Krylov subspace as follows:

$\mathscr{K}_{k} \,:=\, \mbox{span}\{\vec{b},\,A\vec{b},\,\cdots,\,A^{k - 1}\vec{b}\}$.
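For concreteness, here is a small sketch of how the Krylov basis vectors are generated (in Python/NumPy rather than the MATLAB-style pseudocode used later; the matrix and vector are arbitrary made-up examples). Note that only matrix–vector products are needed — no matrix power is ever formed:

```python
import numpy as np

def krylov_basis(A, b, k):
    """Return the n-by-k matrix [b, Ab, ..., A^(k-1) b].

    Each column is one matrix-vector product away from the
    previous column; A^j is never formed explicitly.
    """
    vectors = [b]
    for _ in range(k - 1):
        vectors.append(A @ vectors[-1])
    return np.column_stack(vectors)

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # a small SPD example
b = np.array([1.0, 2.0])
K2 = krylov_basis(A, b, 2)      # columns: b and A b
```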

Krylov subspace methods are efficient and popular iterative methods for solving large, sparse linear systems. When $A$ is symmetric, positive definite (SPD), i.e.

$\left\{\begin{array}{l} A^{T} = A \\[12pt] \vec{x}^{T}A\vec{x} > 0,\,\,\,\forall\,\vec{x} \neq \vec{0}, \end{array}\right.$

we can use a Krylov subspace method called Conjugate Gradient (CG).

Today, let’s find out how CG works and use it to solve 2D Poisson’s equation.

## a. Creating iterates

Let $(\cdot,\,\cdot)$ be an inner product for $\mathbb{R}^{n}$, and $||\cdot|| := \sqrt{(\cdot,\,\cdot)}$ the norm associated with the inner product.

How Krylov subspace methods work is simple. We will approximate the exact solution $\vec{x} = A^{-1}\vec{b}$ with a sequence of iterates $\vec{x}_{k}$ that meet two conditions:

$\begin{array}{l} \mbox{1.}\,\,\,\vec{x}_{k} \in \mathscr{K}_{k} \\[12pt] \mbox{2.}\,\,\,||\vec{x}_{k} - \vec{x}|| = \min\limits_{\vec{y}\,\in\,\mathscr{K}_{k}}\,||\vec{y} - \vec{x}||.\end{array}$

In other words, when measured in the norm $||\cdot||$, $\vec{x}_{k}$ is the best approximation to $\vec{x}$ from the $k$-th Krylov subspace.

(For initialization, we will take $\vec{x}_{0} = \vec{0}$ and $\mathscr{K}_{0} = \{\vec{0}\}$.)

The optimality condition above is the same as the orthogonality condition $\vec{x}_{k} - \vec{x} \perp \mathscr{K}_{k}$ under the inner product $(\cdot,\,\cdot)$. Therefore, $\vec{x}_{k}$ satisfies the equation,

$(\vec{x}_{k} - \vec{x},\,\vec{y}) = 0,\,\,\,\forall\,\vec{y} \in \mathscr{K}_{k}$.
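As a quick numerical sanity check (Python/NumPy with the Euclidean inner product and made-up data), the best approximation from $\mathscr{K}_{k}$ is exactly the orthogonal projection of $\vec{x}$ onto that subspace, and its error is orthogonal to the subspace:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
A = rng.standard_normal((n, n)) + n * np.eye(n)  # nonsingular (generically)
b = rng.standard_normal(n)
x = np.linalg.solve(A, b)                        # exact solution

# Orthonormal basis Q for K_3 = span{b, Ab, A^2 b}
K = np.column_stack([b, A @ b, A @ (A @ b)])
Q, _ = np.linalg.qr(K)

# Best approximation in the Euclidean norm = orthogonal projection
xk = Q @ (Q.T @ x)

# The error x_k - x is orthogonal to every basis vector of K_k
assert np.allclose(Q.T @ (xk - x), 0)
```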

### Lemma 1.

The orthogonality condition and the fact that $\mathscr{K}_{k - 1} \subset \mathscr{K}_{k}$ imply two things. At every iteration $k \geq 1$,

$\begin{array}{l} \mbox{1.}\,\,\,\vec{x}_{k} - \vec{x}_{k - 1} \in \mathscr{K}_{k} \\[12pt] \mbox{2.}\,\,\,\vec{x}_{k} - \vec{x}_{k - 1} \perp \mathscr{K}_{k - 1}.\end{array}$

### Proof.

First, note that the vector

$\vec{x}_{k} - \vec{x}_{k - 1} \,\equiv\, (\vec{x}_{k} - \vec{x}) - (\vec{x}_{k - 1} - \vec{x})$

represents the decrease in error over 1 iteration.

Since $\vec{x}_{k} \in \mathscr{K}_{k}$ and $\vec{x}_{k - 1} \in \mathscr{K}_{k - 1} \subset \mathscr{K}_{k}$, we know that $\vec{x}_{k} - \vec{x}_{k - 1} \in \mathscr{K}_{k}$.

Furthermore, for any $\vec{y} \in \mathscr{K}_{k - 1}\,(\subset \mathscr{K}_{k})$,

$(\vec{x}_{k} - \vec{x}_{k - 1},\,\vec{y}) \,=\, \underbrace{(\vec{x}_{k} - \vec{x},\,\vec{y})}_{=\,0} \,-\, \underbrace{(\vec{x}_{k - 1} - \vec{x},\,\vec{y})}_{=\,0} \,=\, 0.$

## b. Creating residuals

Next, we select a vector $\vec{p}_{k} \neq \vec{0}$ that meets these conditions:

$\begin{array}{l} \mbox{1.}\,\,\,\vec{p}_{k} \in \mathscr{K}_{k} \\[12pt] \mbox{2.}\,\,\,\vec{p}_{k} \perp \mathscr{K}_{k - 1}.\end{array}$

(For initialization, we will take $\vec{p}_{1} = \vec{b}$.)

The vector $\vec{x}_{k} - \vec{x}_{k - 1}$ also satisfies these conditions (Lemma 1). Since $\dim\mathscr{K}_{k} \leq \dim\mathscr{K}_{k - 1} + 1$, the subspace of vectors in $\mathscr{K}_{k}$ orthogonal to $\mathscr{K}_{k - 1}$ is at most one-dimensional, so the two vectors are parallel: there is some constant $\alpha_{k} \in \mathbb{R}\,\backslash\,\{0\}$ such that $\vec{x}_{k} - \vec{x}_{k - 1} = \alpha_{k}\vec{p}_{k}$.

We now have a way to compute the best approximation $\vec{x}_{k}$ from the previous one:

$\boxed{\vec{x}_{k} = \vec{x}_{k - 1} + \alpha_{k}\vec{p}_{k},\,\,\,\forall\,k \geq 1}$.

This holds, provided we have a search direction $\vec{p}_{k}$.

### Lemma 2.

The coefficient $\alpha_{k}$ is given by,

$\boxed{\displaystyle\alpha_{k} = -\frac{(\vec{x}_{k - 1} - \vec{x},\,\vec{p}_{k})}{(\vec{p}_{k},\,\vec{p}_{k})} = \frac{(A^{-1}\vec{r}_{k - 1},\,\vec{p}_{k})}{(\vec{p}_{k},\,\vec{p}_{k})},\,\,\,\forall\,k \geq 1}$,

where $\vec{r}_{k} := \vec{b} - A\vec{x}_{k}$ is the residual.

### Proof.

We have the equation,

$\vec{x}_{k} - \vec{x} = (\vec{x}_{k - 1} - \vec{x}) + \alpha_{k}\vec{p}_{k}$.

On each side, take the inner product with $\vec{p}_{k}$:

$(\vec{x}_{k} - \vec{x},\,\vec{p}_{k}) = (\vec{x}_{k - 1} - \vec{x},\,\vec{p}_{k}) + \alpha_{k}(\vec{p}_{k},\,\vec{p}_{k})$.

Since $\vec{x}_{k} - \vec{x} \perp \mathscr{K}_{k}$ and $\vec{p}_{k} \in \mathscr{K}_{k}$, the LHS is equal to 0. Hence,

$\displaystyle\alpha_{k} = -\frac{(\vec{x}_{k - 1} - \vec{x},\,\vec{p}_{k})}{(\vec{p}_{k},\,\vec{p}_{k})}$.

Finally, note that $\vec{r}_{k - 1} = A(\vec{x} - \vec{x}_{k - 1})$, so $\vec{x} - \vec{x}_{k - 1} = A^{-1}\vec{r}_{k - 1}$, which yields the second expression.

### Lemma 3.

The residuals are related in the following manner:

$\boxed{\vec{r}_{k} = \vec{r}_{k - 1} - \alpha_{k}A\vec{p}_{k},\,\,\,\forall\,k \geq 1}$.

### Proof.

Rearrange the update formula from Section 4b as,

$(\vec{x} - \vec{x}_{k}) = (\vec{x} - \vec{x}_{k - 1}) - \alpha_{k}\vec{p}_{k}$,

and apply $A$ to both sides. Since $A(\vec{x} - \vec{x}_{j}) = \vec{r}_{j}$, this gives $\vec{r}_{k} = \vec{r}_{k - 1} - \alpha_{k}A\vec{p}_{k}$.
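This recurrence holds for any update of the form $\vec{x}_{k} = \vec{x}_{k - 1} + \alpha\vec{p}$, not just the CG step. A quick NumPy check with arbitrary made-up values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
A = M.T @ M + n * np.eye(n)        # an SPD example matrix
b = rng.standard_normal(n)

x0 = rng.standard_normal(n)        # any current iterate
p = rng.standard_normal(n)         # any search direction
alpha = 0.7                        # any step size

x1 = x0 + alpha * p
r0 = b - A @ x0
r1 = b - A @ x1

# r_k = r_{k-1} - alpha_k A p_k
assert np.allclose(r1, r0 - alpha * (A @ p))
```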

## c. Creating search directions

We will now construct the search direction $\vec{p}_{k + 1}$ so that we can continue the iteration. Along the way, we required the search directions to satisfy,

$\vec{p}_{i} \in \mathscr{K}_{i} \,\backslash\, \{\vec{0}\} \,\,\,\mbox{and}\,\,\,\vec{p}_{i} \perp \mathscr{K}_{i - 1},\,\,\,\forall\,i = 1,\,\cdots,\,k$.

Thus, if $\vec{p}_{k + 1}$ also meets these conditions, namely,

$\vec{p}_{k + 1} \in \mathscr{K}_{k + 1} \,\backslash\, \{\vec{0}\} \,\,\,\mbox{and}\,\,\,\vec{p}_{k + 1} \perp \mathscr{K}_{k}$,

then $\{\vec{p}_{1},\,\cdots,\,\vec{p}_{k + 1}\}$ will end up forming an orthogonal basis for $\mathscr{K}_{k + 1}$.

We can create such a sequence using the Gram–Schmidt method! Specifically, $\vec{p}_{k + 1}$ will be the sum of a vector in $\mathscr{K}_{k}$ (which can be written as a linear combination of $\vec{p}_{1},\,\cdots,\,\vec{p}_{k}$) and an offshoot $\vec{w}_{k} \in \mathscr{K}_{k + 1} \,\backslash\, \mathscr{K}_{k}$:

$\boxed{\displaystyle\vec{p}_{k + 1} = \vec{w}_{k} + \sum_{i\,=\,1}^{k}\,\gamma_{i}\vec{p}_{i},\,\,\,\forall\,k \geq 1}$.

The coefficients $\gamma_{i}$ are chosen to orthogonalize $\vec{p}_{k + 1}$ against $\vec{p}_{1},\,\cdots,\,\vec{p}_{k}$.

### Lemma 4.

In the $k$-th iteration, the coefficients $\gamma_{i}$ are given by,

$\boxed{\displaystyle\gamma_{i} = -\frac{(\vec{w}_{k},\,\vec{p}_{i})}{(\vec{p}_{i},\,\vec{p}_{i})},\,\,\,\forall\,i = 1,\,\cdots,\,k}$.

### Proof.

Given the equation,

$\displaystyle\vec{p}_{k + 1} = \vec{w}_{k} + \sum_{j\,=\,1}^{k}\,\gamma_{j}\vec{p}_{j}$,

we can take the inner product with $\vec{p}_{i}$:

$\displaystyle(\vec{p}_{k + 1},\,\vec{p}_{i}) = (\vec{w}_{k},\,\vec{p}_{i}) + \sum_{j\,=\,1}^{k}\,\gamma_{j}(\vec{p}_{j},\,\vec{p}_{i})$.

Since $\vec{p}_{k + 1} \perp \mathscr{K}_{k}$ and $\vec{p}_{i} \in \mathscr{K}_{i} \subset \mathscr{K}_{k}$, the LHS is equal to 0. Furthermore, $\{\vec{p}_{1},\,\cdots,\,\vec{p}_{k}\}$ forms an orthogonal basis, so the summation on the RHS reduces to a single term:

$\displaystyle\sum_{j\,=\,1}^{k}\,\gamma_{j}(\vec{p}_{j},\,\vec{p}_{i}) = \gamma_{i}(\vec{p}_{i},\,\vec{p}_{i})$.

We can now solve for $\gamma_{i}$:

$\displaystyle\gamma_{i} = -\frac{(\vec{w}_{k},\,\vec{p}_{i})}{(\vec{p}_{i},\,\vec{p}_{i})}$.

Isn’t it cool to work backwards and see the motivations? We just need to figure out how to choose $\vec{w}_{k} \in \mathscr{K}_{k + 1} \,\backslash\, \mathscr{K}_{k}$.

## d. SPD

So far, we have not used the fact that $A$ is SPD. We do so now to obtain CG.

### Lemma 5.

If $A \in \mathbb{R}^{n \times n}$ is SPD, then

$(\vec{x},\,\vec{y})_{A} := \vec{x}^{T}A\vec{y}$

defines an inner product for $\mathbb{R}^{n}$. Then, $||\vec{x}||_{A} := \sqrt{(\vec{x},\,\vec{x})_{A}}$ gives us a norm, called the $A$-norm.

### Proof.

Let’s show that, for all vectors $\vec{x},\,\vec{y},\,\vec{z} \in \mathbb{R}^{n}$ and scalars $\alpha,\,\beta \in \mathbb{R}$, the $A$-inner product is symmetric, bilinear, and positive-definite:

#### Symmetric:

$\begin{aligned} (\vec{x},\,\vec{y})_{A} &= \vec{x}^{T}A\vec{y} \\[12pt] &= \vec{x}^{T}A^{T}\vec{y} \\[12pt] &= (A\vec{x})^{T}\vec{y} \\[12pt] &= \vec{y}^{T}(A\vec{x}) \\[12pt] &= (\vec{y},\,\vec{x})_{A}. \end{aligned}$

Notice how we used symmetry of the Euclidean inner product.

#### Bilinear:

$\begin{aligned} (\alpha\vec{x} + \beta\vec{y},\,\vec{z})_{A} &= (\alpha\vec{x} + \beta\vec{y})^{T}A\vec{z} \\[12pt] &= \alpha(\vec{x}^{T}A\vec{z}) + \beta(\vec{y}^{T}A\vec{z}) \\[12pt] &= \alpha(\vec{x},\,\vec{z})_{A} + \beta(\vec{y},\,\vec{z})_{A}. \end{aligned}$

Since the $A$-inner product is symmetric, we also have linearity in the second argument.

#### Positive-definite:

Because $A$ is positive definite,

$(\vec{x},\,\vec{x})_{A} = \vec{x}^{T}A\vec{x} > 0,\,\,\,\forall\,\vec{x} \neq \vec{0}$.

In other words, $(\vec{x},\,\vec{x})_{A} \geq 0$, with equality only when $\vec{x} = \vec{0}$.
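A numerical spot-check of Lemma 5 (Python/NumPy; the SPD matrix is built as $M^{T}M + I$, a standard construction, and the vectors and scalars are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
A = M.T @ M + np.eye(n)          # SPD by construction

def inner_A(u, v):
    """The A-inner product (u, v)_A = u^T A v."""
    return u @ (A @ v)

u, v, w = rng.standard_normal((3, n))
a, c = 2.0, -3.0

assert np.isclose(inner_A(u, v), inner_A(v, u))       # symmetric
assert np.isclose(inner_A(a*u + c*v, w),
                  a*inner_A(u, w) + c*inner_A(v, w))  # bilinear
assert inner_A(u, u) > 0                              # positive definite
```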

The key thing to realize is that Sections 4a – 4c are valid for any nonsingular matrix $A$. We can create a Krylov subspace method even if $A$ is not positive definite, or even symmetric. We just need to define an inner product and select $\vec{w}_{k}$ appropriately.

Let’s summarize what we learned so far using the $A$-inner product.

We want to arrive at the solution $\vec{x}$ by a sequence of best approximations $\vec{x}_{k}$. Best approximations satisfy the orthogonality condition:

$(\vec{x}_{k} - \vec{x},\,\vec{y})_{A} = 0,\,\,\,\forall\,\vec{y} \in \mathscr{K}_{k}$.

By sheer luck (or good planning), the residual $\vec{r}_{k}$ is orthogonal to the Krylov subspace $\mathscr{K}_{k}$ in the Euclidean inner product:

$(\vec{r}_{k},\,\vec{y})_{E} = (\vec{x} - \vec{x}_{k},\,\vec{y})_{A} = 0,\,\,\,\forall\,\vec{y} \in \mathscr{K}_{k}$.

In particular, this means the residuals are mutually orthogonal, while the search directions are conjugate, i.e. orthogonal in the $A$-inner product. Since the residuals are the (negative) gradients that get conjugated into search directions, the method is called Conjugate Gradient (“orthogonal residuals” didn’t sound as good).

Given a best approximation $\vec{x}_{k - 1} \in \mathscr{K}_{k - 1}$, we can find the next one $\vec{x}_{k} \in \mathscr{K}_{k}$:

$\vec{x}_{k} = \vec{x}_{k - 1} + \alpha_{k}\vec{p}_{k}$,

where,

$\displaystyle\alpha_{k} = \frac{(A^{-1}\vec{r}_{k - 1},\,\vec{p}_{k})_{A}}{(\vec{p}_{k},\,\vec{p}_{k})_{A}} = \frac{\vec{r}_{k - 1}^{T}\vec{p}_{k}}{\vec{p}_{k}^{T}A\vec{p}_{k}}$.

We can also find the residual:

$\vec{r}_{k} = \vec{r}_{k - 1} - \alpha_{k}A\vec{p}_{k}$.

Notice that $A\vec{p}_{k}$ appears again. In an implementation, we should compute this sparse matrix–vector product once per iteration and store the result.

To continue the iteration, we build the next search direction from some offshoot $\vec{w}_{k}$:

$\displaystyle\vec{p}_{k + 1} = \vec{w}_{k} + \sum_{i\,=\,1}^{k}\,\gamma_{i}\vec{p}_{i}$,

where,

$\displaystyle\gamma_{i} = -\frac{(\vec{w}_{k},\,\vec{p}_{i})_{A}}{(\vec{p}_{i},\,\vec{p}_{i})_{A}} = -\frac{\vec{w}_{k}^{T}A\vec{p}_{i}}{\vec{p}_{i}^{T}A\vec{p}_{i}}$.
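Everything up to this point can be sketched directly, with a full Gram–Schmidt sweep over all previous directions (a naive Python/NumPy sketch on a random SPD system; the offshoot $\vec{w}_{k} = A\vec{p}_{k}$ used here is just one generic element of $\mathscr{K}_{k + 1}\,\backslash\,\mathscr{K}_{k}$ — the clever choice comes in the next section):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
M = rng.standard_normal((n, n))
A = M.T @ M + n * np.eye(n)      # SPD
b = rng.standard_normal(n)
x_exact = np.linalg.solve(A, b)

def inner(u, v):                 # the A-inner product
    return u @ (A @ v)

x = np.zeros(n)
ps = [b]                         # p_1 = b
for k in range(n):
    p = ps[-1]
    r = b - A @ x                          # r_{k-1}
    alpha = (r @ p) / inner(p, p)          # r_{k-1}^T p_k / p_k^T A p_k
    x = x + alpha * p                      # x_k = x_{k-1} + alpha_k p_k
    w = A @ p                              # an offshoot in K_{k+1} \ K_k
    # Full Gram-Schmidt: orthogonalize w against ALL previous directions
    p_next = w - sum(inner(w, q) / inner(q, q) * q for q in ps)
    ps.append(p_next)

# After n iterations, K_n = R^n, so x_n is the exact solution
assert np.allclose(x, x_exact)
```

The expensive part is the sum over all previous directions; the point of CG is that, for the right choice of $\vec{w}_{k}$, this sum collapses to a single term.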

## e. Creating offshoots

There are a few candidates for $\vec{w}_{k} \in \mathscr{K}_{k + 1}\,\backslash\,\mathscr{K}_{k}$. We will select the residual,

$\boxed{\vec{w}_{k} = \vec{r}_{k}}$,

so that all but one of the coefficients $\gamma_{i}$ will be 0!

### Lemma 6.

When $\vec{w}_{k} = \vec{r}_{k}$,

$(\vec{w}_{k},\,\vec{p}_{i})_{A} = 0,\,\,\,\forall\,i = 1,\,\cdots,\,(k - 1)$.

### Proof.

First, let’s check that $\vec{r}_{k} \in \mathscr{K}_{k + 1} \,\backslash\, \mathscr{K}_{k}$.

We know that the residual $\vec{r}_{k}$ resides in $\mathscr{K}_{k + 1}$: we have $\vec{b} \in \mathscr{K}_{k + 1}$ by definition of the Krylov subspace, and $\vec{x}_{k} \in \mathscr{K}_{k}$ implies $A\vec{x}_{k} \in \mathscr{K}_{k + 1}$.

Moreover, $\vec{r}_{k}$ is orthogonal to $\mathscr{K}_{k}$ in the Euclidean inner product, so $\vec{r}_{k} \not\in \mathscr{K}_{k}$ (assuming $\vec{r}_{k} \neq \vec{0}$; otherwise we have already converged).

Now, for $i = 1,\,\cdots,\,(k - 1)$, we have $\vec{p}_{i} \in \mathscr{K}_{i}$, so $A\vec{p}_{i} \in \mathscr{K}_{i + 1} \subset \mathscr{K}_{k}$. The same Euclidean orthogonality then gives,

$(\vec{r}_{k},\,\vec{p}_{i})_{A} = (\vec{r}_{k},\,A\vec{p}_{i})_{E} = 0,\,\,\,\forall\,i = 1,\,\cdots,\,(k - 1).$

### Lemma 7.

When $\vec{w}_{k} = \vec{r}_{k}$, we can simplify the formulas for $\alpha_{k}$, $\gamma_{k}$, and $\vec{p}_{k + 1}$:

$\begin{array}{c} \displaystyle\alpha_{k} = \frac{\vec{r}_{k - 1}^{T}\vec{r}_{k - 1}}{\vec{p}_{k}^{T}A\vec{p}_{k}} \\[20pt] \displaystyle\gamma_{k} = \frac{\vec{r}_{k}^{T}\vec{r}_{k}}{\vec{r}_{k - 1}^{T}\vec{r}_{k - 1}} \\[28pt] \vec{p}_{k + 1} = \vec{r}_{k} + \gamma_{k}\vec{p}_{k}. \end{array}$

### Proof.

The formula for $\vec{p}_{k + 1}$ comes from Lemma 6: all of the coefficients $\gamma_{i}$ vanish except $\gamma_{k}$.

From Section 4d, we know that,

$\displaystyle\alpha_{k} = \frac{\vec{r}_{k - 1}^{T}\vec{p}_{k}}{\vec{p}_{k}^{T}A\vec{p}_{k}}, \,\,\,\,\,\,\gamma_{k} = -\frac{\vec{r}_{k}^{T}A\vec{p}_{k}}{\vec{p}_{k}^{T}A\vec{p}_{k}}, \,\,\,\,\,\,-A\vec{p}_{k} = \frac{\vec{r}_{k} - \vec{r}_{k - 1}}{\alpha_{k}}$.

By orthogonality, the numerator of $\alpha_{k}$ simplifies to:

$\vec{r}_{k - 1}^{T}\vec{p}_{k} \,=\, \vec{r}_{k - 1}^{T}\vec{r}_{k - 1} + \gamma_{k - 1}\underbrace{\vec{r}_{k - 1}^{T}\vec{p}_{k - 1}}_{=\,0} \,=\, \vec{r}_{k - 1}^{T}\vec{r}_{k - 1}.$

As for $\gamma_{k}$, note that,

$\displaystyle-\vec{r}_{k}^{T}A\vec{p}_{k} = \frac{\vec{r}_{k}^{T}\vec{r}_{k} - \vec{r}_{k}^{T}\vec{r}_{k - 1}}{\alpha_{k}} = \frac{\vec{r}_{k}^{T}\vec{r}_{k}}{\alpha_{k}}, \,\,\,\,\,\,\vec{p}_{k}^{T}A\vec{p}_{k} = \frac{\vec{r}_{k - 1}^{T}\vec{r}_{k - 1}}{\alpha_{k}}$,

where $\vec{r}_{k}^{T}\vec{r}_{k - 1} = 0$ because the residuals are orthogonal. Dividing the first expression by the second yields the formula for $\gamma_{k}$.
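A quick NumPy check that the simplified formulas agree with the general ones along an actual CG run (random SPD system; a handful of iterations):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10
M = rng.standard_normal((n, n))
A = M.T @ M + n * np.eye(n)      # SPD
b = rng.standard_normal(n)

x = np.zeros(n)
r = b.copy()                     # r_0 = b
p = b.copy()                     # p_1 = b
for k in range(5):
    Ap = A @ p
    # alpha_k: general numerator r^T p vs. simplified r^T r
    alpha_general = (r @ p) / (p @ Ap)
    alpha_simple = (r @ r) / (p @ Ap)
    assert np.isclose(alpha_general, alpha_simple)

    x = x + alpha_simple * p
    r_new = r - alpha_simple * Ap

    # gamma_k: general (A-inner product) formula vs. simplified ratio
    gamma_general = -(r_new @ Ap) / (p @ Ap)
    gamma_simple = (r_new @ r_new) / (r @ r)
    assert np.isclose(gamma_general, gamma_simple)

    p = r_new + gamma_simple * p
    r = r_new
```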

## f. Summary

The pseudocode for CG looks like this:

```matlab
% Initialize
x0 = zeros(n, 1); r0 = b; p1 = b;

for k = 1 : numIterations
    % Sparse matrix-vector multiply
    Ap1 = A * p1;

    % Find alpha
    alpha1 = (r0' * r0) / (p1' * Ap1);

    % Find the iterate
    x1 = x0 + alpha1 * p1;

    % Find the residual
    r1 = r0 - alpha1 * Ap1;

    % Find gamma
    gamma1 = (r1' * r1) / (r0' * r0);

    % Find the search direction
    p2 = r1 + gamma1 * p1;

    % Update the iterates
    x0 = x1;
    r0 = r1;
    p1 = p2;
end
```


Notice how we only need a sparse matrix–vector multiply, a few scaled vector updates (axpy operations), and a couple of dot products per iteration to solve a matrix equation. This is the power of iterative methods. We can make our code more efficient by reusing variables:

```matlab
% Initialize
x = zeros(n, 1); r = b; p = b;
r_dot = r' * r;

for k = 1 : numIterations
    % Sparse matrix-vector multiply
    Ap = A * p;

    % Find alpha and save the old r'*r
    alpha = r_dot / (p' * Ap);
    gamma = r_dot;

    % Find the iterate
    x = x + alpha * p;

    % Find the residual
    r = r - alpha * Ap;
    r_dot = r' * r;

    % Find gamma (new r'*r over old r'*r)
    gamma = r_dot / gamma;

    % Find the search direction
    p = r + gamma * p;
end
```
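For reference, here is a direct Python/NumPy translation of the pseudocode above (dense $A$ for brevity — in practice $A$ would be stored as a sparse matrix — plus a small convergence tolerance, which is my addition, to allow an early exit once the residual vanishes):

```python
import numpy as np

def conjugate_gradient(A, b, num_iterations, tol=1e-12):
    """CG as in the pseudocode: one mat-vec, two dot products,
    and three vector updates per iteration. The tolerance-based
    early exit is an extra safeguard, not part of the pseudocode."""
    x = np.zeros_like(b)
    r = b.copy()
    p = b.copy()
    r_dot = r @ r
    for _ in range(num_iterations):
        Ap = A @ p                       # (sparse) matrix-vector multiply
        alpha = r_dot / (p @ Ap)         # step size
        x = x + alpha * p                # update the iterate
        r = r - alpha * Ap               # update the residual
        r_dot_new = r @ r
        gamma = r_dot_new / r_dot        # new r'r over old r'r
        p = r + gamma * p                # next search direction
        r_dot = r_dot_new
        if np.sqrt(r_dot) < tol:         # converged early
            break
    return x

# Example on a random SPD system
rng = np.random.default_rng(5)
n = 20
M = rng.standard_normal((n, n))
A = M.T @ M + n * np.eye(n)
b = rng.standard_normal(n)
x = conjugate_gradient(A, b, n)
assert np.allclose(x, np.linalg.solve(A, b))
```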


# 5. Example

Let’s go back to 2D Poisson’s equation on a unit square domain, with homogeneous Dirichlet BC. Our RHS function is given by,

$f(x,\,y) = (a^{2} + b^{2})\pi^{2}\sin(a\pi x)\sin(b\pi y)$,

and we choose $a = 1$, $b = 9$, and $N = 200$ as before.

Stunningly, CG converges in 1 iteration! As far as iterative methods go, you can’t do better than that.

How did that happen? The RHS function I chose turns Poisson’s equation into an eigenfunction problem. Even at the discrete level, the RHS vector $\vec{f}$ happens to be an eigenvector of $K$:

$K\vec{f} = \lambda\vec{f}$.

As a result, in the first iteration,

$\begin{array}{l} \displaystyle\alpha_{1} \,=\, \frac{\vec{f}^{T}\vec{f}}{\vec{f}^{T}K\vec{f}} \,=\, \frac{1}{\lambda} \\[20pt] \displaystyle\vec{r}_{1} \,=\, \vec{f} - \alpha_{1}K\vec{f} \,=\, \vec{f} - \frac{1}{\lambda} \cdot \lambda\vec{f} \,=\, \vec{0}. \end{array}$
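We can reproduce this behavior in a small NumPy experiment — here with the 1D Poisson matrix for brevity (its eigenvectors are discrete sines, so the same argument applies); the grid size and mode number are arbitrary choices:

```python
import numpy as np

# 1D Poisson matrix K = tridiag(-1, 2, -1) / h^2 on (0, 1),
# homogeneous Dirichlet BC; its eigenvectors are discrete sines.
N = 50
h = 1.0 / N
grid = np.arange(1, N) * h       # interior grid points
K = (2.0 * np.eye(N - 1)
     - np.eye(N - 1, k=1)
     - np.eye(N - 1, k=-1)) / h**2

a = 3                            # mode number (arbitrary)
f = np.sin(a * np.pi * grid)     # an exact eigenvector of K

# One CG step from x_0 = 0 with p_1 = f:
alpha1 = (f @ f) / (f @ (K @ f))      # = 1 / lambda
x1 = alpha1 * f
r1 = f - K @ x1

# The residual vanishes after a single iteration
assert np.linalg.norm(r1) < 1e-10 * np.linalg.norm(f)
```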

# Notes

A big thanks goes to Dr. Ye, one of my teachers at Kentucky. He instilled in me a passion for numerical linear algebra and its applications. To learn additional iterative methods, I recommend reading his paper linked below.

You can find the code in its entirety here: