Today we finish 4.4 and cover the Markov chains part of Section 4.6. We are not covering Section 4.5 or the rest of 4.6 (which contains many interesting applications!). Read Section 5.1 for Monday. Work through the recommended homework questions; some have been updated!
Tutorials: Quiz 6 is next week. It covers Sections 4.4 and
the Markov Chains part of 4.6.
Office hour: Monday, 1:30-2:30, MC103B.
Help Centers: Monday-Friday 2:30-6:30 in MC 106.
Definition: Let $A$ and $B$ be $n \times n$ matrices. We say that $A$ is similar to $B$ if there is an invertible matrix $P$ such that $P^{-1} A P = B$. When this is the case, we write $A \sim B$.
Theorem 4.22:
Let $A$ and $B$ be similar matrices. Then:
a. $\det A = \det B$
b. $A$ is invertible iff $B$ is invertible.
c. $A$ and $B$ have the same rank.
d. $A$ and $B$ have the same characteristic polynomial.
e. $A$ and $B$ have the same eigenvalues.
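These facts are easy to sanity-check numerically. Below is a small Python/NumPy sketch (the matrices $A$ and $P$ are arbitrary examples of my own, not from the text) that builds $B = P^{-1}AP$ and confirms parts (a), (c), (d) and (e) of Theorem 4.22.

```python
import numpy as np

# A and P are arbitrary small examples chosen for illustration; P is invertible (det = 1).
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
P = np.array([[1.0, 1.0],
              [1.0, 2.0]])

B = np.linalg.inv(P) @ A @ P          # B is similar to A by construction

print(np.isclose(np.linalg.det(A), np.linalg.det(B)))            # (a) same determinant
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B))      # (c) same rank
print(np.allclose(np.poly(A), np.poly(B)))                       # (d) same characteristic polynomial
print(np.allclose(np.sort(np.linalg.eigvals(A)),
                  np.sort(np.linalg.eigvals(B))))                # (e) same eigenvalues
```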
Definition: $A$ is diagonalizable if it is similar to some diagonal matrix.
If $A$ is similar to a diagonal matrix $D$, then $D$ must have the eigenvalues of $A$ on the diagonal. But how to find $P$?
Theorem 4.23: An $n \times n$ matrix $A$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors. More precisely, there exist an invertible matrix $P$ and a diagonal matrix $D$ with $P^{-1}AP = D$ if and only if the columns of $P$ are $n$ linearly independent eigenvectors of $A$ and the diagonal entries of $D$ are the corresponding eigenvalues in the same order.
This theorem is one of the main reasons we want to be able to find eigenvectors of a matrix. Moreover, the more eigenvectors the better, so this motivates allowing complex eigenvectors.
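As a quick numerical illustration (the matrix here is an arbitrary example of my own, and I'm assuming NumPy): `np.linalg.eig` returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors, so when those columns are linearly independent we can take $P$ to be that matrix and $D$ to be the diagonal matrix of eigenvalues.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])           # arbitrary example; its eigenvalues are 5 and 2

eigenvalues, P = np.linalg.eig(A)    # columns of P are eigenvectors of A
D = np.diag(eigenvalues)             # corresponding eigenvalues, in the same order

print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True: P^{-1} A P = D
```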
Theorem 4.24: If $\lambda_1, \ldots, \lambda_k$ are distinct eigenvalues of $A$ and, for each $i$, $\cB_i$ is a basis for the eigenspace $E_{\lambda_i}$, then the union of the $\cB_i$'s is a linearly independent set.
Combining Theorems 4.23 and 4.24 gives the following important consequence:
Theorem: An $n \times n$ matrix is diagonalizable if and only if the sum of the geometric multiplicities of the eigenvalues is $n$.
In particular:
Theorem 4.25: If $A$ is an $n \times n$ matrix with $n$ distinct eigenvalues, then $A$ is diagonalizable.
So it is important to understand the geometric multiplicities better. Here is a helpful result:
Lemma 4.26: If $\lambda_1$ is an eigenvalue of an $n \times n$ matrix $A$, then $$ \text{geometric multiplicity of } \lambda_1 \leqslant \text{algebraic multiplicity of } \lambda_1 $$
We'll prove this in a minute. First, let's look at what it implies:
Let $A$ be an $n \times n$ matrix with distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$. Let their geometric multiplicities be $g_1, g_2, \ldots, g_k$ and their algebraic multiplicities be $a_1, a_2, \ldots, a_k$. We know $$ g_i \leqslant a_i \qtext{for each $i$} $$ and so $$ g_1 + \cdots + g_k \leqslant a_1 + \cdots + a_k \leqslant n , $$ where the last inequality holds because the characteristic polynomial of $A$ has degree $n$. So the only way to have $g_1 + \cdots + g_k = n$ is to have $g_i = a_i$ for each $i$ and $a_1 + \cdots + a_k = n$.
This gives the main theorem of the section:
Theorem 4.27 (The Diagonalization Theorem):
Let $A$ be an $n \times n$ matrix with distinct eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_k$.
Let their geometric multiplicities be $g_1, g_2, \ldots, g_k$
and their algebraic multiplicities be $a_1, a_2, \ldots, a_k$.
Then the following are equivalent:
a. $A$ is diagonalizable.
b. $g_1 + \cdots + g_k = n$.
c. $g_i = a_i$ for each $i$ and $a_1 + \cdots + a_k = n$.
Note: This is stated incorrectly in the text. The condition $a_1 + \cdots + a_k = n$ in (c) must be added unless you are working over $\C$, in which case it is automatic that $a_1 + \cdots + a_k = n$. With the way I have stated it, it is correct over $\R$ or over $\C$.
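For a concrete example of how this can fail, take $A = \bmat{ll} 2 & 1 \\ 0 & 2 \emat$: the eigenvalue $2$ has algebraic multiplicity $2$ but geometric multiplicity $1$, so $A$ is not diagonalizable. Here is a short NumPy check (a sketch; the geometric multiplicity is computed as $n$ minus the rank of $A - \lambda I$, i.e. the number of free variables):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])    # eigenvalue 2 has algebraic multiplicity 2

lam = 2.0
n = A.shape[0]
geom_mult = n - np.linalg.matrix_rank(A - lam * np.eye(n))   # dimension of E_2
print(geom_mult)              # 1, which is less than 2, so A is not diagonalizable
```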
We still need to prove:
Lemma 4.26: If $\lambda_1$ is an eigenvalue of an $n \times n$ matrix $A$, then $$ \text{geometric multiplicity of } \lambda_1 \leqslant \text{algebraic multiplicity of } \lambda_1 $$
Proof (more direct than in text): Suppose that $\lambda_1$ is an eigenvalue of $A$ with geometric multiplicity $g$, and let $\vv_1, \ldots, \vv_g$ be a basis for $E_{\lambda_1}$, so $$A \vv_i = \lambda_1 \vv_i \qtext{for each $i$}$$ Let $Q$ be an invertible matrix whose first $g$ columns are $\vv_1, \ldots, \vv_g$: $$ Q = [ \, \vv_1 \, \cdots \, \vv_g \qtext{other vectors} ] $$ Since $Q^{-1} Q = I$, we know that $Q^{-1} \vv_i = \ve_i$ for $1 \leqslant i \leqslant g$. Also, the first $g$ columns of $A Q$ are $\lambda_1 \vv_1, \ldots, \lambda_1 \vv_g$. So the first $g$ columns of $Q^{-1} A Q$ are $\lambda_1 \ve_1, \ldots, \lambda_1 \ve_g$. It follows that the first $g$ columns of $Q^{-1} A Q - \lambda I$ are $(\lambda_1 - \lambda) \ve_1, \ldots, (\lambda_1 - \lambda) \ve_g$, so expanding $\det(Q^{-1} A Q - \lambda I)$ along these columns shows that $(\lambda_1 - \lambda)^g$ is a factor of the characteristic polynomial of $Q^{-1} A Q$. Therefore the matrix $Q^{-1} A Q$ has $\lambda_1$ as an eigenvalue with algebraic multiplicity at least $g$. But $Q^{-1} A Q$ has the same characteristic polynomial as $A$, so $\lambda_1$ must also have algebraic multiplicity at least $g$ for $A$.$\quad\Box$
Steps for diagonalizing an $n \times n$ matrix $A$ (when possible):
1. Compute the characteristic polynomial $\det(A - \lambda I)$ of $A$.
2. Find the roots of the characteristic polynomial and their algebraic
multiplicities by factoring.
3. If the algebraic multiplicities don't add up to $n$, then $A$ is
not diagonalizable, and you can stop. (If you are working over $\C$,
this can't happen.)
4. For each eigenvalue $\lambda$, compute the dimension of the
eigenspace $E_\lambda$. This is the geometric multiplicity of
$\lambda$, and if it is less than the algebraic multiplicity,
then $A$ is not diagonalizable, and you can stop.
5. Compute a basis for the eigenspace $E_\lambda$.
6. If for each eigenvalue the geometric multiplicity equals the
algebraic multiplicity, then you take the $n$ eigenvectors you
found and put them in the columns of a matrix $P$. Put the
eigenvalues in the same order on the diagonal of a matrix $D$.
7. Check that $AP=PD$.
Note that step 4 only requires you to find the row echelon form of $A - \lambda I$, as the number of free variables here is the geometric multiplicity. In step 5, you solve the system.
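Here is a minimal NumPy sketch of these steps for a small example matrix of my own (an illustration, not an example from the text). Steps 1-3 are replaced by a direct numerical eigenvalue computation, the eigenspace bases in steps 4-6 are found via the singular value decomposition, and step 7 is the final check.

```python
import numpy as np

def eigenspace_basis(A, lam, tol=1e-8):
    """Columns of the returned matrix form a basis for E_lam = null(A - lam*I), via the SVD."""
    n = A.shape[0]
    _, s, Vt = np.linalg.svd(A - lam * np.eye(n))
    rank = int(np.sum(s > tol))
    return Vt[rank:].conj().T          # null-space basis vectors as columns

A = np.array([[ 1.0,  3.0,  3.0],      # arbitrary example matrix (my own choice)
              [-3.0, -5.0, -3.0],
              [ 3.0,  3.0,  1.0]])
n = A.shape[0]

# Steps 1-2: eigenvalues, with repetitions giving the algebraic multiplicities.
# (Numerically we compute them directly rather than factoring by hand;
#  rounding lets us group repeated roots.)
eigenvalues = np.round(np.linalg.eigvals(A), 6)       # here: 1, -2, -2
distinct = np.unique(eigenvalues)

# Steps 4-6: a basis for each eigenspace goes into the columns of P,
# with the matching eigenvalue repeated on the diagonal of D.
columns, diagonal = [], []
for lam in distinct:
    basis = eigenspace_basis(A, lam)
    columns += [basis[:, j] for j in range(basis.shape[1])]
    diagonal += [lam] * basis.shape[1]

if len(columns) < n:
    print("A is not diagonalizable")   # geometric multiplicities don't add up to n
else:
    P = np.column_stack(columns)
    D = np.diag(diagonal)
    print(np.allclose(A @ P, P @ D))   # Step 7: check AP = PD (prints True)
```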
Since $P^{-1}AP = D$ means that $A = P D P^{-1}$, we get $A^2 = P D P^{-1} P D P^{-1} = P D^2 P^{-1}$, and more generally $A^k = P D^k P^{-1}$. Since $D^k$ is obtained by simply raising the diagonal entries of $D$ to the $k$th power, this is clearly an efficient way to compute powers! Note that we need to know $P$, not just $D$, to do this.
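For instance (an arbitrary NumPy example of my own), computing $A^{10} = P D^{10} P^{-1}$ only requires raising the diagonal entries of $D$ to the 10th power:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])              # arbitrary diagonalizable example

eigenvalues, P = np.linalg.eig(A)       # columns of P are eigenvectors of A
D10 = np.diag(eigenvalues ** 10)        # D^10: raise each diagonal entry to the 10th power

A10 = P @ D10 @ np.linalg.inv(P)        # A^10 = P D^10 P^{-1}
print(np.allclose(A10, np.linalg.matrix_power(A, 10)))   # True
```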
See Example 4.29 for a sample calculation. We'll illustrate this result with an example from Markov Chains.
Recall the setup: a Markov chain has $n$ possible states, and $P_{ij}$ denotes the probability of moving from state $j$ to state $i$ in one time step. Since you must transition to some state, $P_{1j} + \cdots + P_{nj} = 1$. That is, the entries in each column sum to 1. Moreover, each entry satisfies $P_{ij} \geqslant 0$. Such a $P$ is called a stochastic matrix.
We can represent the current state of the system with a state vector $\vx \in \R^n$. The $i$th entry of $\vx$ may denote the number of people/objects in state $i$. Or we may divide by the total number, so the $i$th entry of $\vx$ gives the fraction of people/objects in state $i$. In this case, $\vx$ has non-negative entries that sum to 1 and is called a probability vector.
If $\vx_k$ denotes the state after $k$ time steps, then the state after one more time step is given by $$ \vx_{k+1} = P \vx_k . $$ It follows that $\vx_k = P^k \vx_0$. Therefore:
The $ij$ entry $(P^k)_{ij}$ of $P^k$ is the probability of going from state $\red{j}$ to state $\red{i}$ in $k$ steps.
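A quick sketch (with a made-up $2 \times 2$ stochastic matrix of my own) checking that iterating $\vx_{k+1} = P \vx_k$ ten times agrees with computing $\vx_{10} = P^{10} \vx_0$ directly:

```python
import numpy as np

P = np.array([[0.9, 0.5],      # made-up stochastic matrix: each column sums to 1
              [0.1, 0.5]])
x0 = np.array([0.3, 0.7])      # a probability vector: non-negative entries summing to 1

x = x0.copy()
for _ in range(10):            # iterate x_{k+1} = P x_k
    x = P @ x

print(np.allclose(x, np.linalg.matrix_power(P, 10) @ x0))   # True: x_10 = P^10 x_0
```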
A state $\vx$ such that $P \vx = \vx$ is called a steady state vector. This is the same as an eigenvector with eigenvalue 1. In Lecture 22, we proved:
Theorem 4.30: Every stochastic matrix has a steady state vector, i.e. it has $\lambda = 1$ as an eigenvalue.
We proved this using the fact that $P$ and $P^T$ have the same eigenvalues, and then noticing that the vector with all $1$'s is an eigenvector of $P^T$ with eigenvalue 1.
Example: We studied toothpaste usage, and had transition matrix $$ P = \bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat . $$ We noticed experimentally that a given starting state tends to the state $\coll {0.4} {0.6}$ and that $$ \bmat{ll} 0.70 & 0.20 \\ 0.30 & 0.80 \emat \coll {0.4} {0.6} = \coll {0.4} {0.6} . $$ We then found this steady state vector algebraically by solving $(I-P)\vx = \vec 0$. [It is equivalent to solve $(P-I)\vx = \vec 0$.]
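In NumPy, one way to recover this steady state vector (a sketch, assuming only `numpy`) is to pick out an eigenvector for $\lambda = 1$ and rescale it so its entries sum to 1:

```python
import numpy as np

P = np.array([[0.70, 0.20],
              [0.30, 0.80]])

eigenvalues, vectors = np.linalg.eig(P)
i = int(np.argmin(np.abs(eigenvalues - 1)))   # index of the eigenvalue lambda = 1
v = np.real(vectors[:, i])
steady_state = v / v.sum()                    # rescale the eigenvector to a probability vector
print(steady_state)                           # approximately [0.4, 0.6]
```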
With our new tools, we can go further now.
This is a very general phenomenon:
Theorem 4.31: Let $P$ be a stochastic matrix. Then every eigenvalue $\lambda$ of $P$ satisfies $|\lambda| \leqslant 1$. If in addition the entries of $P$ are all positive, then all eigenvalues besides $\lambda = 1$ have $|\lambda| < 1$.
The first part can be proved using $P^T$, as in the proof of Theorem 4.30; the other part of the Theorem is similar.
This theorem helps us understand the long-term behaviour:
Theorem 4.33: Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive. Then as $k \to \infty$, $P^k \to L$, a matrix all of whose columns are equal to the same vector $\vx$ which is a steady state probability vector for $P$.
Proof: We'll assume $P$ is diagonalizable: $Q^{-1} P Q = D$. So $P^k = Q D^k Q^{-1}$. As $k \to \infty$, $D^k$ approaches a matrix $D^*$ with $1$'s and $0$'s on the diagonal (by Theorem 4.31), which means that $P^k$ approaches $L = Q D^* Q^{-1}$.
Now that we know that $P^k$ has some limit $L$, we can deduce something about it. Since $\lim_{k \to \infty} P^k = L$, we have $$ PL = P\lim_{k \to \infty} P^k = \lim_{k \to \infty} P^{k+1} = L $$ This means that the columns of $L$ must be steady-state vectors for $P$. Since the columns of $P^k$ are probability vectors, the same must be true of the columns of $L$. It's not hard to show that $P$ has a unique steady-state probability vector $\vx$, so $L = [\, \vx \, \vx \, \cdots \, \vx \, ]$, as required.$\quad\Box$
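Numerically (using the toothpaste matrix above as the example), $P^k$ for even moderately large $k$ already has both columns equal to the steady state vector $\coll{0.4}{0.6}$ to many decimal places:

```python
import numpy as np

P = np.array([[0.70, 0.20],
              [0.30, 0.80]])

L = np.linalg.matrix_power(P, 50)   # P^k for a large k
print(np.round(L, 6))
# [[0.4 0.4]
#  [0.6 0.6]]   -- every column is the steady state probability vector
```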
Finally, we can deduce that Markov chains tend to their steady states:
Theorem 4.34: Let $P$ be an $n \times n$ stochastic matrix all of whose entries are positive, and let $\vx_0$ be any initial probability vector. Then as $k \to \infty$, $\vx_k \to \vx$, where $\vx$ is the steady state probability vector for $P$.
Proof: Suppose that $\vx_0$ has components $x_1, x_2, \ldots, x_n$. Then $$ \begin{aligned} \lim_{k \to \infty} \vx_k &= \lim_{k \to \infty} P^k \vx_0 = L \vx_0 \\ &= [\, \vx \, \vx \, \cdots \, \vx \, ] \vx_0 \\ &= x_1 \vx + x_2 \vx + \cdots + x_n \vx \\ &= (x_1 + x_2 + \cdots + x_n ) \vx = \vx \quad\Box \end{aligned} $$
This result works both ways: if you compute the eigenvector with eigenvalue 1, that tells you the steady-state vector that other states go to as $k \to \infty$. But it also means that if you don't know the steady-state vector, you can approximate it by starting with any vector $\vx_0$ and computing $P^k \vx_0$ for large $k$!
The latter is what Google does to compute the PageRank eigenvector.
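Here is a minimal power-iteration sketch in that spirit (the stopping tolerance and the reuse of the toothpaste matrix are my own choices; the actual PageRank computation is of course far larger and uses a modified matrix):

```python
import numpy as np

def approximate_steady_state(P, tol=1e-10, max_iter=1000):
    """Approximate the steady state vector of a positive stochastic matrix P by power iteration."""
    n = P.shape[0]
    x = np.full(n, 1.0 / n)          # any initial probability vector works (Theorem 4.34)
    for _ in range(max_iter):
        x_new = P @ x                # one step of the Markov chain
        if np.linalg.norm(x_new - x, 1) < tol:
            return x_new
        x = x_new
    return x

P = np.array([[0.70, 0.20],
              [0.30, 0.80]])
print(approximate_steady_state(P))   # approximately [0.4, 0.6]
```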