Characteristic Polynomials and the Cayley–Hamilton Theorem
We chase down which matrices can actually be diagonalized — starting with characteristic polynomials, hitting complex eigenvalues, and landing on the Cayley–Hamilton theorem.
Picking up right where we left off.
The big question we’ve been chasing: “can every matrix be diagonalized?!?!?!?!”
Right now we’re setting up an nth-degree equation from a determinant to find eigenvalues, and… yeah, common sense says there are gonna be times when we don’t get n eigenvalues out of it!!!
So which matrices get diagonalized, and which don’t?
①
$$\det(A - \lambda I) = 0$$The first failure mode: “what if the nth-degree equation just has no solution?” (i.e., we can’t find eigenvalues at all)
Anyway,
$$\det(A - \lambda I) = 0$$since we’re hunting for solutions to this equation,
let me slap a definition on it and call it the characteristic polynomial of $A$:
$$p(\lambda) = \det(A - \lambda I)$$That’s our definition.
If $\lambda$ is an eigenvalue, then
$$p(\lambda) = 0$$That’s the deal.
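(Not in the original post, but if you want to poke at this definition on a computer, here’s a minimal sketch using sympy; the matrix is just a random pick.)

```python
# Computing p(lambda) = det(A - lambda*I) symbolically with sympy.
import sympy as sp

lam = sp.symbols("lamda")
A = sp.Matrix([[2, 1],
               [1, 2]])             # an arbitrary example matrix

p = (A - lam * sp.eye(2)).det()     # the characteristic polynomial
print(sp.expand(p))                 # lamda**2 - 4*lamda + 3
print(sp.solve(sp.Eq(p, 0), lam))   # its roots = the eigenvalues: [1, 3]
```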
Hold on, let me actually try an example. If
$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$then for the eigenvalues, the equation we get is
$$p(\lambda) = \lambda^2 + 1 = 0$$Yep!!
Whoa — there’s no solution…..
Ah… well, “no solution” is a bit of a stretch. The reason is, there’s no solution in the reals, but if we let ourselves play in the complex numbers, we do get solutions — the imaginary roots.
So in other words: over $\mathbb{R}$, this matrix is not diagonalizable.
Extend to $\mathbb{C}$ though, and
$$\lambda_1 = i, \quad \lambda_2 = -i$$we can fish out the eigenvalues,
and the eigenvectors that go with them are
$$x^{(1)} = \begin{pmatrix} 1 \\ -i \end{pmatrix}, \quad x^{(2)} = \begin{pmatrix} 1 \\ i \end{pmatrix}$$Like so,
and if we form $R$,
$$R = \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}$$then diagonalizing,
$$R^{-1}AR = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}$$works out fine.
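A quick numerical sanity check of this example (my own sketch, not from the original post); numpy happily works over $\mathbb{C}$:

```python
# Checking that the rotation matrix diagonalizes over C with eigenvalues +/- i.
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigvals, R = np.linalg.eig(A)   # numpy returns complex results when needed
print(eigvals)                  # i and -i (order may vary)

D = np.linalg.inv(R) @ A @ R    # R^{-1} A R
print(np.round(D, 10))          # diag(i, -i), up to floating-point noise
```

(numpy normalizes its eigenvector columns, so its $R$ differs from ours by column scaling, but the diagonal result is the same.)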
Heads up!!
Any nonconstant polynomial with complex coefficients can — no exceptions — be factored into linear pieces over $\mathbb{C}$.
That’s the Fundamental Theorem of Algebra.
So as you can see,
the answer flips depending on which field you take as your domain.
Up to now we’ve been pretty cavalier about which field we’re doing linear algebra over —
we just defaulted to $\mathbb{R}$ like it was the obvious choice.
From here on out, we’re gonna have to actually pay attention to which field we’re playing in.
Anyway — sometimes eigenvalues just don’t exist…..
② What if eigenvalues exist but we can’t find their eigenvectors?
Could that ever happen?!?!?!
$$\lambda_1, \lambda_2, \ldots, \lambda_n$$OK, so for each
$$\lambda_i$$we need to dig up the corresponding
$$x^{(i)}$$.
But —
$$(A - \lambda_i I)x^{(i)} = 0$$is there always at least one nonzero vector satisfying this thing?!?!?!?
Answer: Yes!!!!!
$$(A - \lambda_i I)x = 0$$Since this is the equation,
look at the dimension of the column space of
$$(A - \lambda_i I)$$— i.e., the dimension of the image!!
$$\dim(\text{Im}(A - \lambda_i I))$$No matter how big it gets, it’s at most $n-1$. ($\leq n-1$)
Why~~~~~
$$\det(A - \lambda_i I) = 0$$Because the image of this matrix can never hit dimension $n$.
If $\dim(\text{Im}) = n$, that’d mean the matrix is invertible.
But then
$$\det(A - \lambda_i I) = 0$$couldn’t possibly hold, right???
So by the Dimension Theorem (rank–nullity),
$$(A - \lambda_i I)$$the dimension of its kernel must — no exceptions — be at least 1, you see?!?!?!
And what does kernel-dimension $\geq 1$ mean?
$$(A - \lambda_i I)$$It means there’s at least one vector that, when you feed it into this matrix, gets sent to 0.
Therefore!!!~~~
$$(A - \lambda_i I)x^{(i)} = 0$$there’s always at least one $x^{(i)}$ satisfying this!!!!
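Here’s a small numerical illustration of that kernel argument (my sketch, with an arbitrary triangular matrix so the eigenvalues are easy to read off):

```python
# For an eigenvalue lam, A - lam*I is singular: rank <= n-1, kernel dim >= 1.
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 3.0]])           # triangular: eigenvalues 4, 2, 3
lam = 2.0

M = A - lam * np.eye(3)
print(abs(np.linalg.det(M)) < 1e-12)      # True: det(A - lam*I) = 0
print(np.linalg.matrix_rank(M))           # 2, i.e. at most n-1

# A kernel vector via SVD: the right singular vector for the zero singular value.
_, _, Vt = np.linalg.svd(M)
x = Vt[-1]
print(np.allclose(A @ x, lam * x))        # True: x is an eigenvector
```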
OK, and let’s verify one more thing, just for kicks.
$$x^{(1)}, x^{(2)}, \ldots, x^{(n)}$$If we collect these into a matrix $R$,
$$R = \begin{pmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(n)} \end{pmatrix}$$let’s check: does $R$ really always come out invertible?
So set up the assumption.
“Suppose $R$ is not invertible.”
If $R$ is not invertible, then
$$x^{(1)}, x^{(2)}, \ldots, x^{(n)}$$these are linearly dependent!!
Which means
$$c_1 x^{(1)} + c_2 x^{(2)} + \cdots + c_n x^{(n)} = 0$$with $c_1, \ldots, c_n$ not all zero, and
$$c_1, c_2, \ldots, c_n$$among these, take just the nonzero ones —
$$c_1, c_2, \ldots, c_k$$I’ll pick the $k$ nonzero ones (and relabel them, together with their $x$’s and $\lambda$’s, as $1$ through $k$).
Toss the zeros, keep what’s left:
$$c_1 x^{(1)} + c_2 x^{(2)} + \cdots + c_k x^{(k)} = 0$$Now multiply both sides by $A$ on the left.
$$A(c_1 x^{(1)} + c_2 x^{(2)} + \cdots + c_k x^{(k)}) = 0$$Whoa,
$$c_1 \lambda_1 x^{(1)} + c_2 \lambda_2 x^{(2)} + \cdots + c_k \lambda_k x^{(k)} = 0$$we get this,
because the eigenvalue equation $Ax^{(i)} = \lambda_i x^{(i)}$ holds for all $k$ vectors.
Multiply by $A$ again.
$$c_1 \lambda_1^2 x^{(1)} + c_2 \lambda_2^2 x^{(2)} + \cdots + c_k \lambda_k^2 x^{(k)} = 0$$Keep~~~ multiplying.
$$c_1 \lambda_1^{k-1} x^{(1)} + c_2 \lambda_2^{k-1} x^{(2)} + \cdots + c_k \lambda_k^{k-1} x^{(k)} = 0$$So now we’ve cooked up $k$ equations.
Now let’s gather those $k$ $x^{(i)}$’s into matrix form, call it $B$:
$$B = \begin{pmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(k)} \end{pmatrix}$$That’s what I mean.
You see what the right-hand side is doing?!?!…..?
Now express those $k$ equations using $B$:
$$B \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_k \end{pmatrix} = 0, \quad B \begin{pmatrix} c_1 \lambda_1 \\ c_2 \lambda_2 \\ \vdots \\ c_k \lambda_k \end{pmatrix} = 0, \quad \ldots$$Like that, right???????
Below I jotted down a little extra explanation.
$$B\begin{pmatrix} c_1 & c_1\lambda_1 & \cdots & c_1\lambda_1^{k-1} \\ c_2 & c_2\lambda_2 & \cdots & c_2\lambda_2^{k-1} \\ \vdots & \vdots & \ddots & \vdots \\ c_k & c_k\lambda_k & \cdots & c_k\lambda_k^{k-1} \end{pmatrix} = B\begin{pmatrix} c_1 & & \\ & \ddots & \\ & & c_k \end{pmatrix}\begin{pmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{k-1} \\ 1 & \lambda_2 & \cdots & \lambda_2^{k-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_k & \cdots & \lambda_k^{k-1} \end{pmatrix} = 0$$For the Vandermonde matrix, see the earlier #15.
So what’s the punchline?
The $c_i$’s are all nonzero and the $\lambda_i$’s are all distinct, so the diagonal matrix and the Vandermonde matrix are both invertible. Multiply through by their inverses, and we end up with $B = 0$.
But the columns of $B$ are eigenvectors, and eigenvectors are nonzero, so $B$ can’t possibly be $0$ —
so what’s this~~~
This is what we call a contradiction.
Meaning: the assumption itself was wrong.
So the conclusion: if $\lambda_1, \ldots, \lambda_n$ are all distinct, then diagonalization absolutely happens, no exceptions!!!
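If you want to see the Vandermonde step with actual numbers, here’s a sketch (arbitrary distinct eigenvalues; `np.vander` builds the matrix for us):

```python
# Distinct lambdas => the Vandermonde matrix is invertible, which is
# exactly what forced B = 0 in the contradiction above.
import numpy as np

lams = np.array([1.0, 2.0, 5.0])
V = np.vander(lams, increasing=True).T   # rows: (1,1,1), (l1,l2,l3), (l1^2,...)
print(np.linalg.det(V))                  # nonzero (12.0 here)

# The payoff: a matrix with distinct eigenvalues gets an invertible R.
A = np.diag(lams) + np.triu(np.ones((3, 3)), 1)   # eigenvalues 1, 2, 5
_, R = np.linalg.eig(A)                  # columns = eigenvectors
print(np.linalg.matrix_rank(R))          # 3: linearly independent
```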
OK OK OK OK OK,
$$\lambda_1, \lambda_2, \ldots, \lambda_n$$we just learned that if these are all different, diagonalization is automatic.
But now you object. “Hey hey hey hey — what if a couple of them coincide???” you ask.
③ What if the characteristic polynomial has repeated roots?
(Spoiler: might be diagonalizable, might not.)
$$p(\lambda) = (\lambda - \lambda_1)^n = 0$$In this case,
$$\lambda_1$$is the only root we get.
So you’d think we’d only fish out one eigenvector too, right???
Nope!!!!! We might be able to fish out more. Take $n = 2$ and $A = I$:
$$A = I, \quad (A - \lambda_1 I) = 0$$it works out like that —
so literally any nonzero vector works as an eigenvector $x^{(1)}$!?!?!! which means
$$x^{(1)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad x^{(2)} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$we can pull in two linearly independent vectors like this.
$$\lambda_1 = 1$$One eigenvalue doing the work of two.
So,
$$R^{-1}AR = A' = A = I$$we get this.
(What is this — it’s already diagonal, so trying to diagonalize it again just spits the same thing back;;;;)
But!!!! The question is whether this always shakes out so nicely.
Gut feeling says: probably not.
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$Same characteristic polynomial as before?!?!??!?!
$$p(\lambda) = (\lambda - 1)^2 = 0, \quad \lambda = 1$$But this time, the eigenvectors give us only one linearly independent direction…..
$$x^{(1)} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$Conclusion:
$$p(\lambda) = (\lambda - \lambda_1)^{m_1}(\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_k)^{m_k}$$When the characteristic polynomial factors with repeated roots like this,
there are two cases lol — diagonalizable, or not~
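Here’s the contrast in code (my own sketch): same characteristic polynomial, different kernel dimensions for $(A - I)$.

```python
# Both matrices have p(lambda) = (lambda - 1)^2, but the kernel of A - I
# (the eigenspace for lambda = 1) has dimension 2 vs. 1.
import numpy as np

I2 = np.eye(2)
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

print(np.linalg.eigvals(I2))                  # [1. 1.]
print(np.linalg.eigvals(shear))               # [1. 1.]

print(2 - np.linalg.matrix_rank(I2 - I2))     # 2: two eigenvectors, diagonalizable
print(2 - np.linalg.matrix_rank(shear - I2))  # 1: only one, stuck
```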
Let’s generalize the situation above.
The earlier case was “all $n$ eigenvalues coincide”.
Instead of all $n$ collapsing, let’s let each root have its own multiplicity.
Characteristic polynomial of some matrix $A$:
$$p(\lambda) = (\lambda - \lambda_1)^{m_1}(\lambda - \lambda_2)^{m_2} \cdots (\lambda - \lambda_k)^{m_k}$$You get that the $m_i$’s sum to $n$, right?!?!? Because $p(\lambda)$ has to be an $n$th-degree polynomial in $\lambda$!! Right?!
Here
$$m_i$$is called the multiplicity!!!!
It just means “how many times is this root repeated~~?” and
then for the matrix to be diagonalizable, each $\lambda_i$ has to cough up at least as many linearly independent eigenvectors as its own multiplicity.
(If some eigenvalue brings in fewer than its multiplicity, another eigenvalue would have to bring in more than its own to make up the difference.
But that can’t happen. You can’t ever get more linearly independent eigenvectors out of an eigenvalue than its multiplicity. I’ll skip the proof — too long.)
$$x^{(i,1)}, x^{(i,2)}, \ldots, x^{(i,m_i)}$$We need to gather $n$ of these $x$’s total to form an invertible $R$!!
⟨That eigenvectors from different eigenvalues are linearly independent — we already proved that above.⟩
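That criterion is easy to mechanize, by the way. Below is a hedged sketch; `is_diagonalizable` is my own made-up helper, and rounding the eigenvalues to group repeats is a crude numerical assumption:

```python
# Diagonalizable <=> every eigenvalue's kernel dimension (its count of
# independent eigenvectors) matches its multiplicity in p(lambda).
import numpy as np
from collections import Counter

def is_diagonalizable(A, tol=1e-9):
    n = A.shape[0]
    # crude multiplicity count: round eigenvalues to group repeats
    counts = Counter(np.round(np.linalg.eigvals(A), 6))
    for lam, m in counts.items():
        geo = n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)
        if geo < m:            # eigenvalue brings fewer vectors than its m_i
            return False
    return True

print(is_diagonalizable(np.eye(2)))                    # True
print(is_diagonalizable(np.array([[1.0, 1.0],
                                  [0.0, 1.0]])))       # False
```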
OK! What you need to take away here:
$$\lambda_i$$has to bring in its own multiplicity’s worth of
$$x^{(i,j)}$$— that’s the conclusion for now.
I’m only going as far as this conclusion.
To go deeper we’d need the decomposition theorem, and that’s waaaaaay~~~~~~ later!
Don’t worry heh heh heh heh!
This conclusion gets copy-pasted forward when we hit the decomposition theorem!!!!
Yeah yeah yeah.
OK, finally — something we learned in high school…
Well, not technically in the curriculum, but every teacher told us to memorize it anyway —
let me wrap this post by proving the Cayley-Hamilton theorem using the characteristic polynomial.
$$2 \times 2: \quad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad p(\lambda) = \lambda^2 - (a+d)\lambda + (ad-bc)$$$$p(A) = A^2 - (a+d)A + (ad-bc)I = 0$$Whoa, that’s incredibly slick —
$$p(A) = 0$$just falls out that easily?!!?!!
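A quick random spot check (my sketch; `default_rng(0)` is just an arbitrary seed):

```python
# 2x2 Cayley-Hamilton: A^2 - (a+d)A + (ad-bc)I should be exactly 0.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
(a, b), (c, d) = A                 # unpack the entries row by row

pA = A @ A - (a + d) * A + (a * d - b * c) * np.eye(2)
print(np.allclose(pA, 0))          # True, up to floating-point noise
```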
What’s even more wild is that this same thing works for $n \times n$.
Let me handle $n \times n$ case-by-case.
If $A$ is diagonalizable,
$$p(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n)$$it factors into $n$ linear pieces like this (the $\lambda_i$ listed with multiplicity, so they don’t have to be distinct),
and plugging in $A$ gives
$$p(A) = (A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I)$$How is this zero?!?!?!
OK OK OK OK — we said $A$ is diagonalizable, right?!??!
$$A = R \Lambda R^{-1}$$is possible, which means
$$A - \lambda_i I = R(\Lambda - \lambda_i I)R^{-1}$$also holds, so
$$p(A) = (A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I)$$let me play with this:
$$p(A) = R(\Lambda - \lambda_1 I)R^{-1} \cdot R(\Lambda - \lambda_2 I)R^{-1} \cdots R(\Lambda - \lambda_n I)R^{-1}$$The inner $R^{-1}R$’s collapse, so
$$p(A) = R(\Lambda - \lambda_1 I)(\Lambda - \lambda_2 I) \cdots (\Lambda - \lambda_n I)R^{-1}$$Every factor in the middle is diagonal, and the $i$-th factor has a $0$ sitting in its $i$-th diagonal slot, so the product wipes out every single diagonal entry:
$$\therefore p(A) = 0$$Now the general case where $A$ is not diagonalizable. Once we nail this, we can close.
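(Before tackling that, in case the collapse step feels abstract, here’s a tiny numeric sketch of the diagonal product vanishing, with arbitrary eigenvalues:)

```python
# The i-th factor (Lambda - lam_i I) has a 0 in its i-th diagonal slot,
# so multiplying all n factors kills every diagonal entry.
import numpy as np

lams = [1.0, 2.0, 5.0]
L = np.diag(lams)

prod = np.eye(3)
for lam in lams:
    prod = prod @ (L - lam * np.eye(3))

print(prod)   # the 3x3 zero matrix
```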
Different angle this time — I’ll use the adjugate.
We had this:
$$\text{adj}(A - \lambda I) \cdot (A - \lambda I) = \det(A - \lambda I) \cdot I = p(\lambda) \cdot I$$Now,
$$\text{adj}(A - \lambda I) = B_{n-1}\lambda^{n-1} + B_{n-2}\lambda^{n-2} + \cdots + B_1\lambda + B_0$$I wrote it like this!!!!! (Every entry of the adjugate is a cofactor of $A - \lambda I$, i.e. a polynomial in $\lambda$ of degree at most $n-1$, so it always expands this way.)
If the adjugate is hazy, see #15.
Now write $p(\lambda) = a_n\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$ and compare both sides of the equality term by term, by power of $\lambda$:
$$\begin{aligned} \lambda^n &: \quad -B_{n-1} = a_n I \\ \lambda^{n-1} &: \quad B_{n-1}A - B_{n-2} = a_{n-1}I \\ &\vdots \\ \lambda^1 &: \quad B_1 A - B_0 = a_1 I \\ \lambda^0 &: \quad B_0 A = a_0 I \end{aligned}$$You see what I’m trying to write?!?!?!?!
Now multiply the $\lambda^j$ row by $A^j$ on the right, and then add up the left sides and the right sides~!!!~!!
The left side telescopes away to nothing, the right side becomes $a_nA^n + \cdots + a_1A + a_0I = p(A)$, and it comes out to
$$p(A) = 0$$We’ve shown it!!!!!!!!~~~~~
This is the Cayley-Hamilton theorem!!!!!!
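And a check on a matrix that is definitely not diagonalizable (my sketch; note `np.poly` returns the coefficients of the monic $\det(\lambda I - A)$, which differs from our $p(\lambda)$ only by a factor of $(-1)^n$, so the zero test is the same):

```python
# Cayley-Hamilton survives non-diagonalizability: a 3x3 Jordan block.
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])   # only one independent eigenvector

coeffs = np.poly(A)               # [1, -6, 12, -8], i.e. (lambda - 2)^3
pA = sum(c * np.linalg.matrix_power(A, k)
         for k, c in zip(range(len(coeffs) - 1, -1, -1), coeffs))
print(np.allclose(pA, 0))         # True
```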
OK, let me give a quick teaser for the next post and bounce.
Out of the infinite polynomials in the world, is the characteristic polynomial really the only one that becomes $0$ when you sub in $A$?!?!?!?
Setting aside the characteristic polynomial specifically, it kinda feels like if we put our minds to it, we could cook up plenty more?!?!
Hmm!! So I want to define this:
The set of all polynomials that become $0$ when you plug in $A$ — collect them all —
$$I(A) = \{ p(\lambda) \mid p(A) = 0 \}$$That’s the definition I want.
Let me push the definition just a tiny bit further.
$$F[\lambda]$$Remember this notation???? It came up almost at the very beginning — the set of all polynomials in $\lambda$ with coefficients in $F$. So, written out properly:
$$I(A) = \{\, p(\lambda) \in F[\lambda] \mid p(A) = 0 \,\}$$
Anyway, naturally,
$$p(\lambda) = \det(A - \lambda I)$$this polynomial is
$$I(A)$$an element of this set.
But
besides the characteristic polynomial,
$$I(A)$$if we look for a few more elements,
we can find them like so.
If we have
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I$$a matrix like this,
the characteristic polynomial is
$$p(\lambda) = (\lambda - 1)^2$$and so
$$p(A) = (A - I)^2 = 0$$holds.
BUT!!!!!
$$q(A) = A - I = 0$$doesn’t this hold too?!?!?!?!?!?!
(Learned later, apparently — that one’s called the minimal polynomial
$$m(\lambda)$$or so I’m told.)
But why’s this a problem, you ask —
(from where I’m sitting) it’s super weird.
$$I(A)$$this set, supposedly has
$$I(A) = m(\lambda) \cdot F[\lambda]$$this kind of property — every element is just a polynomial multiple of the minimal polynomial.
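For this tiny example you can watch both polynomials (and any multiple of the smaller one) kill $A$; a throwaway sketch:

```python
# For A = I: p(A) = (A - I)^2 = 0, but the smaller q(A) = A - I = 0 already.
import numpy as np

A = np.eye(2)
q_of_A = A - np.eye(2)
print(np.allclose(q_of_A @ q_of_A, 0))   # p(A) = 0
print(np.allclose(q_of_A, 0))            # q(A) = 0 as well

# And any polynomial multiple of q also lands in I(A):
r_of_A = (A @ A + 3.0 * A) @ q_of_A      # r(lambda) = (lambda^2 + 3*lambda) * q(lambda)
print(np.allclose(r_of_A, 0))            # True
```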
Don’t you smell something brewing?!?!?!?!
We set out to study diagonalization, then ended up exploring something even more primitive than diagonalization called block form,
and we trekked all the way out here because we were going to do the decomposition theorem — the one that says “via block form, we get to diagonalization”.
To really get the decomposition theorem, we need to know the characteristic polynomial even better,
and to know the characteristic polynomial better,
$$I(A)$$we need to know this even better,
and to understand
$$I(A)$$better, apparently we need to understand polynomials better.
But to understand polynomials well, we need to know the field better,
and to understand the field better, apparently we need to know the integers even better………………..
So the next post is gonna be a “background knowledge for the decomposition theorem” prep session.
Originally written in Korean on my Naver blog (2016-01). Translated to English for gdpark.blog.