# Hearing Perturbation Theory

In numerical linear algebra, we create ways for a computer to solve a linear system of equations $A\vec{x} = \vec{b}$. In doing so, we analyze how efficiently and accurately we can find the solution $\vec{x}$.

Perturbation theory concerns how much error we incur in the solution $\vec{x}$ when we perturb (spoil) the data $A$ and $\vec{b}$. A classic statement tells us that the amount of error depends on the condition number of the matrix $A$.

I will define and prove the statement, and help you understand it by “hearing” it.

# 1. Problem statement

Let’s consider the equation,

$A\vec{x}_{true} = \vec{b}$,

where $A \in \mathbb{R}^{n \times n}$ is nonsingular and $\vec{b} \neq \vec{0}$ (otherwise, we get the trivial case $\vec{x}_{true} = \vec{0}$). Note that $\vec{x}_{true}$ denotes the true solution. It’s what we hope to get.

Suppose, instead, we solve the perturbed system,

$(A + \delta\!A)\vec{x} = \vec{b} + \vec{\delta b}$.

We seek to find a bound on the error $\vec{x} - \vec{x}_{true}$. We expect the error to be small when $\delta\!A$ and $\vec{\delta b}$ are small changes.

Given an induced matrix norm $||\cdot||$, we can define the condition number of $A$:

$\kappa(A) := ||A|| \cdot ||A^{-1}||$.

Note that we always have $\kappa(A) \geq 1$. You can check why, along with the definition and properties of an induced matrix norm, in Notes.

We will show that, under a mild assumption $\displaystyle\kappa(A) \cdot \frac{||\delta\!A||}{||A||} < 1$,

$\boxed{\displaystyle\frac{||\vec{x} - \vec{x}_{true}||}{||\vec{x}_{true}||} \leq \frac{\kappa(A)}{1 - \kappa(A)\frac{||\delta\!A||}{||A||}} \,\cdot \left(\frac{||\delta\!A||}{||A||} + \frac{||\vec{\delta b}||}{||\vec{b}||}\right)}$.

The RHS is a product of terms, so we can conclude two things. (1) If $\kappa(A)$ is a small number, then small relative errors in $A$ and $\vec{b}$ will result in a small relative error in the solution $\vec{x}$. (2) If $\kappa(A)$ is a large number, however, we may get a large relative error in $\vec{x}$ even when the relative errors in $A$ and $\vec{b}$ are small.

We see that the condition number influences how much change occurs in the output when we change the input of a system. As a result, we say that the matrix $A$ is well-conditioned if $\kappa(A)$ is small, and ill-conditioned if $\kappa(A)$ is large.
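Before the proof, we can hear warning (2) numerically. Here is a minimal NumPy sketch (a toy $2 \times 2$ example of my own, separate from the audio experiment later in the post): a nearly singular matrix amplifies a tiny relative change in $\vec{b}$ by a factor of thousands.

```python
import numpy as np

# A nearly singular 2 x 2 matrix: its rows are almost parallel.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
x_true = np.linalg.solve(A, b)            # exact solution is [1, 1]

# Perturb b by a tiny relative amount.
db = np.array([0.0, 0.0001])
x = np.linalg.solve(A, b + db)

kappa = np.linalg.cond(A)                                # ~4e4: ill-conditioned
rel_in = np.linalg.norm(db) / np.linalg.norm(b)          # ~3.5e-5
rel_out = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
print(rel_out / rel_in)                   # thousands-fold amplification
```

A relative input error of about $10^{-5}$ produces a relative output error of order 1, consistent with $\kappa(A) \approx 4 \times 10^{4}$.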

# 2. Mathematical proof

## a. Step 1

For any matrix $X \in \mathbb{R}^{n \times n}$ with $||X|| < 1$, the following statements hold:

(i) $I - X$ is nonsingular, and its inverse is given by

$\displaystyle(I - X)^{-1} = \sum_{i\,=\,0}^{\infty}\,X^{i} = I + X + X^{2} + \cdots$.

(ii) $\displaystyle||(I - X)^{-1}|| \leq \frac{1}{1 - ||X||}$.

To prove (i), we check that $(I - X)(I - X)^{-1} = I$ and $(I - X)^{-1}(I - X) = I$. We just need to check one of them because, in finite dimensions, injectivity and surjectivity occur together.

We see that,

\begin{aligned} (I - X)(I - X)^{-1} &= (I - X)(I + X + X^{2} + \cdots) \\[12pt] &= (I + X + X^{2} + \cdots) - (X + X^{2} + X^{3} + \cdots) \\[12pt] &= I. \end{aligned}

Note, to be rigorous, we would first show that the infinite series $\sum_{i\,=\,0}^{\infty}\,X^{i}$ converges. The sequence of partial sums, $\left\{\sum_{i\,=\,0}^{k}\,X^{i}\right\}_{k\,=\,0}^{\infty}$, is a Cauchy sequence in the norm $||\cdot||$: for $k < l$, we have $\left|\left|\sum_{i\,=\,k+1}^{l}\,X^{i}\right|\right| \leq \sum_{i\,=\,k+1}^{l}\,||X||^{i} \to 0$ as $k \to \infty$, because $||X|| < 1$. Since $\mathbb{R}^{n \times n}$ is a Banach space, i.e. complete, the sequence converges.

Let’s prove (ii). By triangle inequality and the properties of an induced matrix norm,

\begin{aligned} ||(I - X)^{-1}|| &= ||I + X + X^{2} + \cdots|| \\[12pt] &\leq ||I|| + ||X|| + ||X^{2}|| + \cdots \\[12pt] &\leq ||I|| + ||X|| + ||X||^{2} + \cdots. \end{aligned}

Next, we use the infinite geometric sum formula: For any real number $r \in (-1,\,1)$,

$\displaystyle\sum_{i\,=\,0}^{\infty}\,r^{i} = \frac{1}{1 - r}$.

Since $||X|| \in [0,\,1)$, we conclude that,

$\displaystyle||(I - X)^{-1}|| \leq \frac{1}{1 - ||X||}$.
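Both claims are easy to sanity-check numerically. Here is a NumPy sketch (the $5 \times 5$ size, the scaling $||X||_{2} = 0.5$, and the cutoff of 200 terms are my own arbitrary choices): the partial sums of the Neumann series match $(I - X)^{-1}$ to machine precision, and bound (ii) holds.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 5))
X *= 0.5 / np.linalg.norm(X, 2)           # rescale so that ||X||_2 = 0.5 < 1

# Partial sums of the Neumann series I + X + X^2 + ... approach (I - X)^{-1}.
S = np.eye(5)
term = np.eye(5)
for _ in range(200):
    term = term @ X
    S += term

inv = np.linalg.inv(np.eye(5) - X)
err = np.linalg.norm(S - inv, 2)          # tiny: the series has converged

# Bound (ii): ||(I - X)^{-1}||_2 <= 1 / (1 - ||X||_2) = 2 here.
lhs = np.linalg.norm(inv, 2)
rhs = 1.0 / (1.0 - np.linalg.norm(X, 2))
print(err, lhs <= rhs + 1e-12)
```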

## b. Step 2

Recall that $A$ is nonsingular. Assume further that,

$\displaystyle\kappa(A) \cdot \frac{||\delta\!A||}{||A||} < 1$.

Then, $A + \delta\!A$ is also nonsingular.

Since $A$ is nonsingular, we can write $A + \delta\!A$ as a product of two terms:

$A + \delta\!A = A(I + A^{-1}\delta\!A)$.

Hence, $A + \delta\!A$ is nonsingular, if and only if, $I + A^{-1}\delta\!A$ is nonsingular. Let’s show that the latter is true by applying (i) from Step 1.

We find that,

\displaystyle\begin{aligned} ||A^{-1}\delta\!A|| &\leq ||A^{-1}|| \cdot ||\delta\!A|| \\[12pt] &= \kappa(A)\frac{||\delta\!A||}{||A||} \\[12pt] &< 1. \end{aligned}

Hence, $I + A^{-1}\delta\!A$ is nonsingular.
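A quick numerical check of Step 2, as a NumPy sketch (the $4 \times 4$ matrix and the perturbation size $||\delta\!A||_{2} = 0.01$ are my own illustrative choices): when the hypothesis holds, the smallest singular value of $A + \delta\!A$ stays positive, so the perturbed matrix is nonsingular.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.eye(4) + 0.1 * rng.standard_normal((4, 4))   # nonsingular: close to I
dA = rng.standard_normal((4, 4))
dA *= 0.01 / np.linalg.norm(dA, 2)                  # small perturbation

# The hypothesis of Step 2: kappa(A) * ||dA|| / ||A|| < 1.
ratio = np.linalg.cond(A) * np.linalg.norm(dA, 2) / np.linalg.norm(A, 2)

# Then A + dA is nonsingular: its smallest singular value is positive.
smin = np.linalg.svd(A + dA, compute_uv=False)[-1]
print(ratio < 1, smin > 0)
```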

Before we move on, let’s read the statement in Step 2 again. It tells us that, if a matrix is nonsingular, then there are infinitely many matrices nearby that are nonsingular, too. (And infinitely many around them, and so on.) This supports the fact that there are far more nonsingular matrices than singular ones. Isn’t that a marvel?

## c. Step 3

Show that,

$\displaystyle \frac{||\vec{x} - \vec{x}_{true}||}{||\vec{x}_{true}||} \leq ||(A + \delta\!A)^{-1}|| \,\cdot \left(||\delta\!A|| + \frac{||\vec{\delta b}||}{||\vec{x}_{true}||}\right)$.

We’re almost there! Recall the original problems:

$\begin{array}{rcl} A\vec{x}_{true} & = & \vec{b} \\[12pt] (A + \delta\!A)\vec{x} & = & \vec{b} + \vec{\delta b}. \end{array}$

Subtract the two equations to get,

$\begin{array}{ll} & (A + \delta\!A)\vec{x} - A\vec{x}_{true} = \vec{\delta b} \\[12pt] \Rightarrow & (A + \delta\!A)(\vec{x} - \vec{x}_{true}) = \vec{\delta b} - \delta\!A\vec{x}_{true} \\[12pt] \Rightarrow & \vec{x} - \vec{x}_{true} = (A + \delta\!A)^{-1}(\vec{\delta b} - \delta\!A\vec{x}_{true}). \end{array}$

Take the vector norm of both sides:

\begin{aligned} ||\vec{x} - \vec{x}_{true}|| &= ||(A + \delta\!A)^{-1}(\vec{\delta b} - \delta\!A\vec{x}_{true})|| \\[12pt] &\leq ||(A + \delta\!A)^{-1}|| \cdot ||\vec{\delta b} - \delta\!A\vec{x}_{true}|| \\[12pt] & \leq ||(A + \delta\!A)^{-1}|| \cdot \Bigl(||\vec{\delta b}|| + ||\delta\!A|| \cdot ||\vec{x}_{true}||\Bigr). \end{aligned}

Finally, we divide both sides by $||\vec{x}_{true}||\,(\neq 0)$.

## d. Step 4

Finally, prove that,

$\displaystyle\frac{||\vec{x} - \vec{x}_{true}||}{||\vec{x}_{true}||} \leq \frac{\kappa(A)}{1 - \kappa(A)\frac{||\delta\!A||}{||A||}} \,\cdot \left(\frac{||\delta\!A||}{||A||} + \frac{||\vec{\delta b}||}{||\vec{b}||}\right)$.

Let’s find an upper bound for $||(A + \delta\!A)^{-1}||$ in Step 3. We use (ii) from Step 1:

\begin{aligned} ||(A + \delta\!A)^{-1}|| &\leq ||(I + A^{-1}\delta\!A)^{-1}|| \cdot ||A^{-1}|| \\[12pt] &\leq \frac{||A^{-1}||}{1 - ||A^{-1}\delta\!A||} \\[12pt] &\leq \frac{||A^{-1}||}{1 - \kappa(A)\frac{||\delta\!A||}{||A||}}. \end{aligned}

Hence, using $||\vec{b}|| = ||A\vec{x}_{true}|| \leq ||A|| \cdot ||\vec{x}_{true}||$ in the final step,

\begin{aligned} \frac{||\vec{x} - \vec{x}_{true}||}{||\vec{x}_{true}||} &\leq \frac{||A|| \cdot ||A^{-1}||}{1 - \kappa(A)\frac{||\delta\!A||}{||A||}} \,\cdot \left(\frac{||\delta\!A||}{||A||} + \frac{||\vec{\delta b}||}{||A|| \cdot ||\vec{x}_{true}||}\right) \\[12pt] &\leq \frac{\kappa(A)}{1 - \kappa(A)\frac{||\delta\!A||}{||A||}} \,\cdot \left(\frac{||\delta\!A||}{||A||} + \frac{||\vec{\delta b}||}{||\vec{b}||}\right). \end{aligned}
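We can verify the completed bound numerically. A NumPy sketch (the random test problem below is my own; any nonsingular $A$ satisfying the mild assumption works):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)      # comfortably nonsingular
x_true = rng.standard_normal(n)
b = A @ x_true

dA = 1e-6 * rng.standard_normal((n, n))              # small perturbations
db = 1e-6 * rng.standard_normal(n)
x = np.linalg.solve(A + dA, b + db)

mnorm = lambda M: np.linalg.norm(M, 2)               # induced 2-norm
kappa = mnorm(A) * mnorm(np.linalg.inv(A))
rel_dA = mnorm(dA) / mnorm(A)
rel_db = np.linalg.norm(db) / np.linalg.norm(b)

lhs = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
rhs = kappa / (1 - kappa * rel_dA) * (rel_dA + rel_db)
print(kappa * rel_dA < 1, lhs <= rhs)                # assumption and bound hold
```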

# 3. Application

We can represent an audio as a vector $\vec{x}_{true} \in \mathbb{R}^{n}$ of samples. The provided audio file, williams.wav, gives $n = 189930$. At a sampling rate of 44100 Hz, the audio is 4.3068 seconds long.

For computational efficiency, we will write $\vec{x}_{true}$ as a matrix $X_{true} \in \mathbb{R}^{10 \times 18993}$, by storing the first 10 entries of $\vec{x}_{true}$ as the first column of $X_{true}$, the next 10 entries as the second, and so on.

Next, we apply a nonsingular matrix $A \in \mathbb{R}^{10 \times 10}$ to $X_{true}$ to get the encrypted audio $B \in \mathbb{R}^{10 \times 18993}$. We can always vectorize $B$ into $\vec{b} \in \mathbb{R}^{189930}$ by reversing the storage process described above.
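The storage-and-encryption pipeline can be sketched in NumPy (the post's own code, perturbation_theory.m, is MATLAB; here a short stand-in vector replaces williams.wav):

```python
import numpy as np

rng = np.random.default_rng(3)

# Small stand-in for the audio vector; the real one has n = 189930 samples.
n, rows = 60, 10
x_true = rng.standard_normal(n)

# Column-major reshape: entries 1-10 become column 1, entries 11-20 column 2, ...
X_true = x_true.reshape(rows, n // rows, order="F")

# Encrypt with a nonsingular 10 x 10 matrix, then vectorize B back into b.
A = np.eye(rows) + 0.1 * rng.standard_normal((rows, rows))
B = A @ X_true
b = B.reshape(-1, order="F")

# Decrypting column by column and reversing the storage recovers the audio.
x_rec = np.linalg.solve(A, B).reshape(-1, order="F")
print(np.allclose(x_rec, x_true))
```

Working column by column lets one $10 \times 10$ solve handle ten samples at a time, which is the computational efficiency mentioned above.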

From now on, we assume that we only have $A$ and $B$. We seek to decrypt the audio, i.e. solve the equation $AX_{true} = B$, when there are perturbations in the matrix $A$ and/or in the input $B$:

$(A + \delta\!A)X = (B + \delta\!B)$.

## a. Generating matrices

To generate a well-conditioned $A$, we find the QR factorization of a random $10 \times 10$ matrix and set $A = Q$. We know that the 2-norm of an orthogonal matrix is always 1. Hence,

$\kappa_{2}(Q) = ||Q||_{2} \cdot ||Q^{T}||_{2} = 1$.

For an ill-conditioned $A$, we use the Hilbert matrix $H$:

$H = \left[\begin{array}{ccccc} 1 & \frac{1}{2} & \cdots & \frac{1}{9} & \frac{1}{10} \\[12pt] \frac{1}{2} & \frac{1}{3} & \cdots & \frac{1}{10} & \frac{1}{11} \\[12pt] \vdots & \vdots & \ddots & \vdots & \vdots \\[12pt] \frac{1}{9} & \frac{1}{10} & \cdots & \frac{1}{17} & \frac{1}{18} \\[12pt] \frac{1}{10} & \frac{1}{11} & \cdots & \frac{1}{18} & \frac{1}{19} \end{array}\right]$.

Then, $\kappa_{2}(H) \approx 1.6025 \times 10^{13}$. Note that both $Q$ and $H$ are nonsingular.
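Both constructions are easy to reproduce in NumPy (a sketch; the random seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# Well-conditioned A: the orthogonal factor Q of a random matrix's QR factorization.
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))
kappa_Q = np.linalg.cond(Q)              # = 1 up to roundoff

# Ill-conditioned A: the 10 x 10 Hilbert matrix, H[i, j] = 1 / (i + j + 1)
# with 0-based indices.
i, j = np.indices((10, 10))
H = 1.0 / (i + j + 1)
kappa_H = np.linalg.cond(H)              # ~1.6e13
print(kappa_Q, kappa_H)
```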

## b. Generating perturbations

The entries of the perturbations $\delta\!A$ and $\delta\!B$ are randomly generated from a scaled normal distribution. Because $\delta\!A$ is randomly chosen, $A + \delta\!A$ is almost surely nonsingular, so for convenience we will assume that it is.

We are more interested in ensuring that the values of $||\delta\!A||_{2}$ and $||\vec{\delta b}||_{2}$ stay about the same each time we run the program.
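One way to keep the perturbation norms fixed across runs, as a NumPy sketch (the helper `scaled_perturbation` and the target norm $10^{-3}$ are my own illustrative choices, not the post's MATLAB code):

```python
import numpy as np

def scaled_perturbation(shape, target_norm, rng):
    """Random normal perturbation rescaled to a prescribed 2-norm."""
    P = rng.standard_normal(shape)
    return P * (target_norm / np.linalg.norm(P, 2))

rng = np.random.default_rng(6)
dA = scaled_perturbation((10, 10), 1e-3, rng)
dB = scaled_perturbation((10, 18993), 1e-3, rng)
print(np.linalg.norm(dA, 2), np.linalg.norm(dB, 2))   # both 1e-3 up to roundoff
```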

## c. Simulations

We can run perturbation_theory.m under 6 different cases:

|  | Perturb $A$ only | Perturb $B$ only | Perturb $A$ and $B$ |
| --- | --- | --- | --- |
| Well-conditioned $A$ | (1, 1) | (1, 2) | (1, 3) |
| Ill-conditioned $A$ | (2, 1) | (2, 2) | (2, 3) |

The table lists the input parameters for each case.

First, let’s consider what happens when $A$ is well-conditioned.


These plots show that the obtained solution $\vec{x}$ matches the true solution $\vec{x}_{true}$ well. The obtained audios remain largely faithful to the original, though we do hear some added noise. In the 100 ms sample, the relative error stays around 2% (median) when we perturb only the matrix $A$. When we perturb the encrypted audio $B$, the relative error jumps to about 10%.

Download and listen to the obtained audios:

• audio_obtained11.wav (well-conditioned $A$, perturbation to $A$ only)
• audio_obtained12.wav (well-conditioned $A$, perturbation to $B$ only)
• audio_obtained13.wav (well-conditioned $A$, perturbation to both $A$ and $B$)

Next, let’s consider what happens when $A$ is ill-conditioned.


This time, the obtained solution hardly matches the true solution. You can hear much more noise in the obtained audios, and listening to them becomes almost unbearable. The relative error is orders of magnitude larger. It’s interesting how perturbing $B$ only results in a completely garbled audio, but perturbing $A$ in addition reconstructs some of the original audio. Two wrongs do make a right, it seems.

Download and listen to the obtained audios:

• audio_obtained21.wav (ill-conditioned $A$, perturbation to $A$ only)
• audio_obtained22.wav (ill-conditioned $A$, perturbation to $B$ only)
• audio_obtained23.wav (ill-conditioned $A$, perturbation to both $A$ and $B$)

We can also check that our matrices satisfy the statement that we painstakingly proved. I will leave writing the extra code to you.

# Notes

Given a vector norm $||\cdot||$ for the vector space $\mathbb{R}^{n}$, we can always create a matrix norm $||\cdot||$ for the vector space $\mathbb{R}^{n \times n}$. As a result, we call this an “induced” matrix norm. For simplicity, I use the double bar notation for both types of norms. It is clear from context whether we are looking at the vector norm or the induced matrix norm.

The induced matrix norm of $A \in \mathbb{R}^{n \times n}$ is defined as,

$\displaystyle||A|| := \max\limits_{\vec{x}\,\neq\,\vec{0}}\,\frac{||A\vec{x}||}{||\vec{x}||} = \max\limits_{||\vec{x}||\,=\,1}\,||A\vec{x}||$.

Hence, the induced matrix norm measures how far out we can map a vector relative to its original size. We can also generalize the definition to rectangular matrices.

Because of its definition, the induced matrix norm satisfies these useful properties:

(i) For any $A \in \mathbb{R}^{n \times n}$ and $\vec{x} \in \mathbb{R}^{n}$,

$||A\vec{x}|| \leq ||A|| \cdot ||\vec{x}||$.

(ii) For any $A,\,B \in \mathbb{R}^{n \times n}$,

$||AB|| \leq ||A|| \cdot ||B||$.

In particular, property (ii) implies that the condition number of a nonsingular matrix is always at least 1.

$AA^{-1} = I \,\,\Rightarrow\,\, ||AA^{-1}|| = 1 \,\,\Rightarrow\,\, ||A|| \cdot ||A^{-1}|| \geq 1$.
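All three facts are easy to check numerically in the 2-norm. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 6))
B = rng.standard_normal((6, 6))
x = rng.standard_normal(6)
mnorm = lambda M: np.linalg.norm(M, 2)   # induced (spectral) 2-norm

# (i)  ||A x|| <= ||A|| ||x||
print(np.linalg.norm(A @ x) <= mnorm(A) * np.linalg.norm(x) + 1e-12)

# (ii) ||A B|| <= ||A|| ||B||
print(mnorm(A @ B) <= mnorm(A) * mnorm(B) + 1e-12)

# Hence kappa(A) = ||A|| ||A^{-1}|| >= ||A A^{-1}|| = ||I|| = 1.
print(mnorm(A) * mnorm(np.linalg.inv(A)) >= 1 - 1e-12)
```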

You can find the code and audio files in their entirety here: