In numerical linear algebra, we create ways for a computer to solve a linear system of equations . In doing so, we analyze how efficiently and accurately we can find the solution .
Perturbation theory concerns how much error we incur in the solution when we perturb (spoil) the data and . A classic statement tells us that the amount of error depends on the condition number of the matrix .
I will define and prove the statement, and help you understand it by “hearing” it.
1. Problem statement
Let’s consider the equation,
where is nonsingular and (otherwise, we get the trivial case ). Note that denotes the true solution. It’s what we hope to get.
Suppose, instead, we solve the perturbed system,
We seek to find a bound on the error . We expect the error to be small when and are small changes.
Given an induced matrix norm , we can define the condition number of :
Note that we always have . You can check why, along with the definition and properties of an induced matrix norm, in Notes.
We will show that, under a mild assumption ,
The RHS is a product of terms, so we can conclude two things. (1) If is a small number, then small relative errors in and will result in a small relative error in the solution . (2) If is a large number, however, we may get a large relative error in even when the relative errors in and are small.
We see that the condition number influences how much change occurs in the output when we change the input of a system. As a result, we say that the matrix is well-conditioned if is small, and ill-conditioned if is large.
2. Mathematical proof
a. Step 1
For any matrix with , the following statements hold:
(i) is nonsingular, and its inverse is given by
To prove (i), we check that and . We just need to check one of them because, in finite dimensions, injectivity and surjectivity occur together.
We see that,
Note, to be rigorous, we would first show that the infinite series converges. The sequence of finite sums, , is a Cauchy sequence in the norm . Since is a Banach space, i.e. complete, the sequence converges.
Let’s prove (ii). By triangle inequality and the properties of an induced matrix norm,
Next, we use the infinite geometric sum formula: For any real number ,
Since , we conclude that,
b. Step 2
Recall that is nonsingular. Assume further that,
Then, is also nonsingular.
Since is nonsingular, we can write as a product of two terms:
Hence, is nonsingular, if and only if, is nonsingular. Let’s show that the latter is true by applying (i) from Step 1.
We find that,
Hence, is nonsingular.
Before we move on, let’s read the statement in Step 2 again. It tells us that, if a matrix is nonsingular, then there are infinitely many matrices nearby that are nonsingular, too. (And infinitely many around them, and so on.) This supports the fact that there are far more nonsingular matrices than singular ones. Isn’t that a marvel?
c. Step 3
We’re almost there! Recall the original problems:
Subtract the two equations to get,
Take the vector norm to both sides:
Finally, we divide both sides by .
d. Step 4
Finally, prove that,
Let’s find an upper bound for in Step 3. We use (ii) from Step 1:
We can represent an audio as a vector of frequencies. The provided audio file, williams.wav, lends to . At a sampling rate of 44100 Hz, the audio is 4.3068 seconds long.
For computational efficiency, we will write as a matrix , by storing the first 10 entries of as the first column of , the next 10 entries as the second, and so on.
Next, we apply a nonsingular matrix to to get the encrypted audio . We can always vectorize into by reversing the storage process described above.
From now on, we assume that we only have and . We seek to decrypt the audio, i.e. solve the equation , when there are perturbations in the matrix and/or in the input :
a. Generating matrices
To generate a well-conditioned , we find the QR factorization of a random matrix and set . We know that the 2-norm of an orthogonal matrix is always 1. Hence,
For an ill-conditioned , we use the Hilbert matrix :
Then, . Note that both and are nonsingular.
b. Generating perturbations
The entries of perturbations and are randomly generated from a scaled normal distribution. For convenience, we will assume that is (most likely) nonsingular because is randomly chosen.
We are more interested in ensuring that the values of and stay about the same each time we run the program.
We can run perturbation_theory.m under 6 different cases:
The table lists the input parameters for each case.
First, let’s consider what happens when is well-conditioned.
These plots show that the obtained solution matches the true solution well. The fidelity of the obtained audios remains largely pristine. We do hear some added noise. In the 100 ms sample, the relative error stays around 2% (median) when we perturb only the matrix . When we perturb the encrypted audio , the relative error jumps to about 10%.
Download and listen to the obtained audios:
- audio_obtained11.wav (well-conditioned , perturbation to only)
- audio_obtained12.wav (well-conditioned , perturbation to only)
- audio_obtained13.wav (well-conditioned , perturbation to both and )
Next, let’s consider what happens when is ill-conditioned.
This time, the obtained solution hardly matches the true solution. You can hear much more noise in the obtained audios, and listening to them becomes almost unbearable. The relative error is orders of magnitude larger. It’s interesting how perturbing only results in a completely garbled audio, but perturbing in addition reconstructs some of the original audio. Two wrongs do make a right, it seems.
Download and listen to the obtained audios:
- audio_obtained21.wav (ill-conditioned , perturbation to only)
- audio_obtained22.wav (ill-conditioned , perturbation to only)
- audio_obtained23.wav (ill-conditioned , perturbation to both and )
We can also check that our matrices satisfy the statement that we painstakingly proved. I will leave writing the extra code to you.
Given a vector norm for the vector space , we can always create a matrix norm for the vector space . As a result, we call this an “induced” matrix norm. For simplicity, I use the double bar notation for both types of norms. It is clear from context whether we are looking at the vector norm or the induced matrix norm.
The induced matrix norm of is defined as,
Hence, the induced matrix norm measures how far out we can map a vector relative to its original size. We can also generalize the definition to rectangular matrices.
Because of its definition, the induced matrix norm satisfies these useful properties:
(i) For any and ,
(ii) For any ,
In particular, property (ii) implies that the condition number of a nonsingular matrix is always at least 1.
You can find the code and audio files in their entirety here:
Download from GitHub